One common question I hear from developers using Git is how to review the history of a single function, method, or class across the project's commits.
In codebases that have evolved over years, a developer often wants to know how one particular semantic scope of code has changed over time, rather than looking at changes on a file or directory basis.
We will start out by revisiting how to scope change diffs per file.
We will use the Ruby language repository to demonstrate the commands in this blog post, so please clone the repository like so:
$ git clone https://github.com/ruby/ruby.git
Cloning into 'ruby'...
remote: Enumerating objects: 91, done.
remote: Counting objects: 100% (91/91), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 488812 (delta 25), reused 63 (delta 7), pack-reused 488721
Receiving objects: 100% (488812/488812), 229.77 MiB | 2.78 MiB/s, done.
Resolving deltas: 100% (375083/375083), done.
Scoping log diffs per file
Sometimes a developer only wants to look at changes in one specific file in the repository. To do this we would use the git-log command:
$ git log -- README.md
commit 459670d47f8528db8f5d4f28aeac191b1af66d81
Author: David Rodríguez <deivid.rodriguez@riseup.net>
Date: Sun Mar 8 10:21:18 2020 +0100
Fix bundled gems installation on a fresh clone
commit adc303131187654d8ce83f3db17eefa3d5bae26c
Author: Kazuhiro NISHIYAMA <zn@mbf.nifty.com>
Date: Sat Feb 1 00:36:58 2020 +0900
README*.md: `defines.h` moved [ci skip]
at 2b592580bf65040373b55ff2ccc3b59a0a231a18
commit 2d61684e7c334ae4c5eb845c782d5fabeffdea67
Author: Nobuyoshi Nakada <nobu@ruby-lang.org>
Date: Sun Jan 19 21:15:23 2020 +0900
README.md: removed the badge for Cygwin [ci skip]
The workflow for Cygwin has been removed at
3344f811074e1e6119eec23684013457dab4f8b0.
commit 1a1862236da60e21e51c66543e89bf577b6ed14a
Author: Kazuhiro NISHIYAMA <zn@mbf.nifty.com>
Date: Wed Jan 1 00:02:01 2020 +0900
Update GitHub Actions Badges
[TRUNCATED]
This will show only the log message and metadata about commits that contain changes in that file.
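To try the path filter without cloning a large repository, here is a minimal sketch that builds a throwaway repository. The file names, commit messages, and the commit helper are all hypothetical demo data, not part of the Ruby repository.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
git -C "$repo" init -q

# Hypothetical helper: stage everything and commit with a fixed identity.
commit() {
    git -C "$repo" add -A
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q -m "$1"
}

echo "first" > "$repo/a.txt"
echo "first" > "$repo/b.txt"
commit "add both files"
echo "second" >> "$repo/a.txt"
commit "edit a"
echo "second" >> "$repo/b.txt"
commit "edit b"

# Only commits that touched a.txt are listed; "edit b" is filtered out.
out=$(git -C "$repo" log --format=%s -- a.txt)
echo "$out"

rm -rf "$repo"
```

The --format=%s option just reduces each entry to its subject line so that the filtering is easy to see.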
Scoping diffs in a line range of a file
In many projects each source file has a predefined documentation header. Suppose we only want to find the change that introduced an inconsistency in the documentation header of a particular file.
To find this we might do the following in our ruby repository:
$ git log -L 1,9:vm.c
commit 79df14c04b452411b9d17e26a398e491bca1a811
Author: Koichi Sasada <ko1@atdot.net>
Date: Tue Mar 10 02:22:11 2020 +0900
Introduce Ractor mechanism for parallel execution
This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with `Ractor.new`.
I hope this feature can help programmers from thread-safety issues.
diff --git a/vm.c b/vm.c
--- a/vm.c
+++ b/vm.c
@@ -1,9 +1,9 @@
/**********************************************************************
- vm.c -
+ Vm.c -
$Author$
Copyright (C) 2004-2007 Koichi Sasada
**********************************************************************/
commit 6cdef2dc7e8a4098727de5befff8b2496fa71430
Author: akr <akr@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Sun Jan 6 15:49:38 2008 +0000
* $Date$ keyword removed to avoid inclusion of locale dependent
string.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
diff --git a/vm.c b/vm.c
--- a/vm.c
+++ b/vm.c
@@ -1,10 +1,9 @@
/**********************************************************************
vm.c -
$Author$
- $Date$
Copyright (C) 2004-2007 Koichi Sasada
[TRUNCATED]
This will show all commits containing changes in lines 1 through 9 inclusive in the file vm.c, along with patch diff output for that part of the file.
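The same behaviour can be reproduced on a small scale. The sketch below uses a hypothetical notes.txt with a two-line header and made-up commit messages; it only assumes a Git version that supports -L for git log.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
git -C "$repo" init -q

# Hypothetical helper: stage everything and commit with a fixed identity.
commit() {
    git -C "$repo" add -A
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q -m "$1"
}

# Hypothetical helper: $1 = first header line, $2 = last body line.
write_notes() {
    printf '%s\n' "$1" '# author: nobody' '' 'body line 1' "$2" > "$repo/notes.txt"
}

write_notes '# header' 'body line 2'
commit "add notes"
write_notes '# header' 'body line 2 (edited)'
commit "change body"
write_notes '# Header' 'body line 2 (edited)'
commit "fix header"

# Follow only lines 1-2 (the header). The commit that edited line 5 is skipped.
out=$(git -C "$repo" log --format='commit: %s' -L 1,2:notes.txt)
echo "$out"

rm -rf "$repo"
```

The output lists "fix header" and "add notes" with their patches, while "change body" never appears because it did not touch the tracked range.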
Scoping diffs by named block in a file
In large files or when blocks (such as functions, methods, or classes) of code have been moved around the file, we might want to limit change log noise especially when that file is updated regularly. A typical example in a Ruby on Rails application might be an action method in a controller.
Let's consult the man page for git-log like so:
$ man git-log
We eventually come across a part like the following:
-L <start>,<end>:<file>, -L :<funcname>:<file>
Trace the evolution of the line range given by "<start>,<end>" (or the
function name regex <funcname>) within the <file>. You may not give any
pathspec limiters. This is currently limited to a walk starting from a
single revision, i.e., you may only give zero or one positive revision
arguments, and <start> and <end> (or <funcname>) must exist in the starting
revision. You can specify this option more than once. Implies --patch. Patch
output can be suppressed using --no-patch, but other diff formats (namely
--raw, --numstat, --shortstat, --dirstat, --summary, --name-only,
--name-status, --check) are not currently implemented.
<start> and <end> can take one of these forms:
• number
If <start> or <end> is a number, it specifies an absolute line number
(lines count from 1).
• /regex/
This form will use the first line matching the given POSIX regex. If
<start> is a regex, it will search from the end of the
previous -L range, if any, otherwise from the start of file. If <start>
is “^/regex/”, it will search from the start of file. If
<end> is a regex, it will search starting at the line given by <start>.
• +offset or -offset
This is only valid for <end> and will specify a number of lines before
or after the line given by <start>.
If “:<funcname>” is given in place of <start> and <end>, it is a regular
expression that denotes the range from the first funcname line that matches
<funcname>, up to the next funcname line. “:<funcname>” searches from the
end of the previous -L range, if any, otherwise from the start of file.
“^:<funcname>” searches from the start of file.
Ok, we have already seen how to list the relevant log entries with patches for a line range in a file (in the section above) and now want to take advantage of the form -L :<funcname>:<file>.
To look at all changes in the main function of the ext/nkf/nkf-utf8/nkf.c file in the ruby repository we would issue the following command:
$ git log -L :main:ext/nkf/nkf-utf8/nkf.c
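For a feel of what this command filters out, here is a small sketch on a hypothetical prog.c with two functions. It relies on Git's default heuristic, which, as far as I can tell, treats lines starting in the first column with a letter, underscore, or dollar sign as function headers. File contents and commit messages are invented for the demo.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
git -C "$repo" init -q

# Hypothetical helper: stage everything and commit with a fixed identity.
commit() {
    git -C "$repo" add -A
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q -m "$1"
}

# Hypothetical helper: $1 = helper's return value, $2 = main's return expression.
write_prog() {
    printf '%s\n' 'int helper(void)' '{' "    return $1;" '}' '' \
        'int main(void)' '{' "    return $2;" '}' > "$repo/prog.c"
}

write_prog 1 'helper()'
commit "add prog"
write_prog 2 'helper()'
commit "change helper"
write_prog 2 'helper() + 1'
commit "change main"

# Track only the block that starts at the line matching "main".
out=$(git -C "$repo" log --format='commit: %s' -L :main:prog.c)
echo "$out"

rm -rf "$repo"
```

Only "change main" and the initial "add prog" commit show up; the commit that touched helper alone is left out.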
Cool, then armed with this new power we should be able to look at commits and relevant patches within a Ruby function too, right? Let's give that a try:
$ git log -L :request_uri:lib/uri/http.rb
commit 107ba65fba13bdf791e5dae0305c5768e6f7d122
Author: hsbt <hsbt@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Fri Sep 30 10:06:24 2016 +0000
* lib/uri/http.rb: Documentation and code style imrovements.
* test/uri/test_http.rb: Added test for coverage.
[fix GH-1427][ruby-core:77255][Misc #12756]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56298 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
diff --git a/lib/uri/http.rb b/lib/uri/http.rb
--- a/lib/uri/http.rb
+++ b/lib/uri/http.rb
@@ -98,12 +102,11 @@
def request_uri
- return nil unless @path
- if @path.start_with?(?/.freeze)
- @query ? "#@path?#@query" : @path.dup
- else
- @query ? "/#@path?#@query" : "/#@path"
- end
+ return unless @path
+
+ url = @query ? "#@path?#@query" : @path.dup
+ url.start_with?(?/.freeze) ? url : ?/ + url
end
end
@@schemes['HTTP'] = HTTP
+
end
commit a5c923f6c1ab0ddd68c4debb7c68623ff0cf4e6a
Author: naruse <naruse@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Tue Aug 5 19:09:01 2014 +0000
* lib/uri/http.rb (URI::HTTP#request_uri): optimized.
decrease object allocation, and ensure always create at least one new
object for return value.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
diff --git a/lib/uri/http.rb b/lib/uri/http.rb
--- a/lib/uri/http.rb
+++ b/lib/uri/http.rb
@@ -95,12 +95,12 @@
def request_uri
- r = path_query
- if r && r[0] != ?/
- r = '/' + r
+ return nil unless @path
+ if @path.start_with?(?/.freeze)
+ @query ? "#@path?#@query" : @path.dup
+ else
+ @query ? "/#@path?#@query" : "/#@path"
end
-
[TRUNCATED]
This works but you will notice some of the patches show changed lines outside of the method block.
How does this work?
One key observation is that in the root of the ruby repository is a file named .gitattributes. This can do many things but for the purposes of block-based git logs and patch review, the important line that made the above command mostly work is the following:
*.rb diff=ruby
This tells Git to treat all files ending in an rb extension as being of type ruby. For diffing purposes, Git uses a regex associated with that type to determine the block boundaries in ruby files.
This regex identifies named class, module, function, or method definitions as being named blocks. The start of the regular expression looks for spaces or tabs preceding either a class, module, or def keyword, followed by a space or tab again.
The -L :funcname:file argument to the git log subcommand selects everything from the named marker matching that regex up to the next named marker. This is why we see more than just the changes within the request_uri method definition in the example in the previous section.
For most purposes this is good enough for quick and dirty filtering of noise from git logs.
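The boundary behaviour is easy to reproduce. In this sketch a hypothetical greeter.rb has a constant sitting between two methods. Since the tracked block runs from def hello up to the next def, a change to the constant is reported as part of hello. The class, constant, and commit messages are invented for the demo.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
git -C "$repo" init -q

# Hypothetical helper: stage everything and commit with a fixed identity.
commit() {
    git -C "$repo" add -A
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q -m "$1"
}

# Same attribute line as in the ruby repository.
printf '*.rb diff=ruby\n' > "$repo/.gitattributes"

# Hypothetical helper: $1 = value of the constant between the two methods.
write_greeter() {
    printf '%s\n' 'class Greeter' '  def hello' '    "hello"' '  end' '' \
        "  LIMIT = $1" '' '  def bye' '    "bye"' '  end' 'end' > "$repo/greeter.rb"
}

write_greeter 1
commit "add greeter"
write_greeter 2
commit "change constant"

# Confirm which diff driver Git applies to the file.
attr=$(git -C "$repo" check-attr diff -- greeter.rb)
echo "$attr"

# The constant lies between "def hello" and "def bye", so its change is included.
out=$(git -C "$repo" log --format='commit: %s' -L :hello:greeter.rb)
echo "$out"

rm -rf "$repo"
```

git check-attr is a handy way to verify that a .gitattributes line is actually being picked up for a given path.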
Tracking changes in markdown document sections
Now let us say we want to see a log of all commits that changed the section 'Features of Ruby' in the README.md file at the root of the ruby repository.
Let us give that a try:
$ git log -L :Features\ of\ Ruby:README.md
This gives me a rather nasty error like so:
fatal: -L parameter 'Features of Ruby' starting at line 1: no match
Not the best error message, but based on the last subsection ('How does this work?') I have a hunch. Let's check whether .gitattributes specifies that README.md is a markdown file:
$ grep markdown .gitattributes
It shows me nothing. We need to tell Git to assume that all *.md files are of type markdown, which we can do by adding the following line to .gitattributes:
*.md diff=markdown
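A quick way to confirm that Git has picked up the new attribute is git check-attr. The sketch below runs it in a throwaway repository before and after adding the line. Note that older Git releases may not ship a built-in markdown driver, in which case the attribute is set but the section matching will still fail.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
git -C "$repo" init -q

# Before: no diff driver is assigned to markdown files.
before=$(git -C "$repo" check-attr diff -- README.md)
echo "$before"

# Add the same line as above, then ask again.
printf '*.md diff=markdown\n' >> "$repo/.gitattributes"
after=$(git -C "$repo" check-attr diff -- README.md)
echo "$after"

rm -rf "$repo"
```

The first call reports the diff attribute as unspecified and the second reports markdown.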
Retrying the git log command above will now show us only commits and their patches that contain changes to that section of the markdown file README.md, as expected:
$ git log -L :Features\ of\ Ruby:README.md
commit dbe834ab5ac4f90df5db9fc314b45890726cca3b
Author: Takashi Kokubun <takashikkbn@gmail.com>
Date: Mon Jul 1 01:04:40 2019 +0900
Prefer master rather than trunk in README [ci skip]
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -13,15 +13,15 @@
## Features of Ruby
* Simple Syntax
* **Normal** Object-oriented Features (e.g. class, method calls)
* **Advanced** Object-oriented Features (e.g. mix-in, singleton-method)
* Operator Overloading
* Exception Handling
* Iterators and Closures
* Garbage Collection
* Dynamic Loading of Object Files (on some architectures)
* Highly Portable (works on many Unix-like/POSIX compatible platforms as
well as Windows, macOS, Haiku, etc.) cf.
- https://github.com/ruby/ruby/blob/trunk/doc/contributing.rdoc#platform-maintainers
+ https://github.com/ruby/ruby/blob/master/doc/contributing.rdoc#platform-maintainers
commit 4fb5888a4dbc10b6f6d3f847f680baae60b9f757
Author: kazu <kazu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Fri Jun 15 00:19:05 2018 +0000
Update obsoleted URLs of supported platforms [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -11,15 +11,15 @@
## Features of Ruby
* Simple Syntax
* **Normal** Object-oriented Features (e.g. class, method calls)
* **Advanced** Object-oriented Features (e.g. mix-in, singleton-method)
* Operator Overloading
* Exception Handling
* Iterators and Closures
* Garbage Collection
* Dynamic Loading of Object Files (on some architectures)
* Highly Portable (works on many Unix-like/POSIX compatible platforms as
well as Windows, macOS, Haiku, etc.) cf.
- https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/SupportedPlatforms
+ https://github.com/ruby/ruby/blob/trunk/doc/contributing.rdoc#platform-maintainers
commit f4ae225b04ae0cde3aa2781c82875074da49086b
[TRUNCATED]
Defining new named blocks for new formats and file types
Now what happens if I wanted to write my documentation in orgmode format instead of markdown like all good emacsers?
Let us try the following:
- We will add an entry to the .gitattributes file to tell Git to treat files matching the pattern *.org as org files.
- Write orgmode files over multiple commits, changing parts of different sections.
- Try the git log -L :<funcname>:<filename> command like above.
Unfortunately this alone will not work. What we must also do is open up our user ~/.gitconfig and add the following [diff "org"] section:
[diff "org"]
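# The backslash before * is doubled: inside double quotes Git's config parser rejects unknown escapes such as \*, and reads \t as a tab.
xfuncname = "^ *\\*{1,6}[ \t].*"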
Now if we try it we will see what we are looking for.
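Here is a self-contained sketch of the whole experiment, under a few assumptions. It sets the pattern in the repository-local config with git config instead of editing ~/.gitconfig, and it uses [[:space:]] rather than [ \t] so that the value can be passed on the command line without worrying about escapes. The notes.org file and commit messages are made up.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
git -C "$repo" init -q

# Hypothetical helper: stage everything and commit with a fixed identity.
commit() {
    git -C "$repo" add -A
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q -m "$1"
}

# Treat *.org files as type "org" and teach Git what a heading looks like.
printf '*.org diff=org\n' > "$repo/.gitattributes"
git -C "$repo" config diff.org.xfuncname '^ *\*{1,6}[[:space:]].*'

# Hypothetical helper: $1 = intro text, $2 = usage text.
write_notes() {
    printf '%s\n' '* Intro' "$1" '' '* Usage' "$2" > "$repo/notes.org"
}

write_notes 'Some intro.' 'Run it.'
commit "add notes"
write_notes 'Better intro.' 'Run it.'
commit "tweak intro"
write_notes 'Better intro.' 'Run it twice.'
commit "expand usage"

# Follow only the "Usage" section; the intro-only commit is skipped.
out=$(git -C "$repo" log --format='commit: %s' -L ':Usage:notes.org')
echo "$out"

rm -rf "$repo"
```

Using git config to write the value has the side benefit that Git takes care of quoting and escaping in the config file.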
As an exercise, you could try building a regular expression for a file format whose named blocks Git does not recognize out of the box, and adding it as the xfuncname setting under the relevant diff section of your Git config file.
Limitations
One big limitation of this last approach is that it is based on the name of the block given by the regular expression in xfuncname in the relevant diff config section. This means that if the name of the block changed over time, that will not be included in the output.
Two related options for git-log include:
- -S <TERM>: which searches for the specified string in the patch
- -G <REGEX>: which searches for the regular expression in the patch
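The two options differ in a subtle way that a small experiment makes visible. As I understand the documentation, -S reports commits that change the number of occurrences of the string, while -G reports commits whose added or removed lines match the regex. In this sketch (hypothetical config.txt and commit messages) the second commit edits a line containing foo without adding or removing an occurrence.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
git -C "$repo" init -q

# Hypothetical helper: stage everything and commit with a fixed identity.
commit() {
    git -C "$repo" add -A
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q -m "$1"
}

echo 'name = "foo"' > "$repo/config.txt"
commit "add config"
echo 'name = "foo"  # default' > "$repo/config.txt"
commit "annotate config"

# -S: only the commit that changed how many times "foo" occurs.
s_out=$(git -C "$repo" log --format=%s -S foo)
echo "-S: $s_out"

# -G: every commit whose patch has an added or removed line matching "foo".
g_out=$(git -C "$repo" log --format=%s -G foo)
echo "-G:" $g_out

rm -rf "$repo"
```

So -S is the better fit for "when was this introduced or removed?", and -G for "which commits touched lines that look like this?".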
I have the following git aliases defined for each:
[alias]
# ... truncated
search = "log --all --pretty=oneline -S"
egrep = "log --all --pretty=oneline -G"
Then I can use git egrep "^\s*module\s+" to search for all commits that contain something that resembles a module declaration in Ruby.
Again note that this is just a quick-n-dirty way to eliminate noise and for many use cases this is enough, but we should dream about a more semantic world.