It seems one never stops learning new things with git, no matter how long you’ve been using it (in my case, I’m a proud git user since 2008). Today I added a new trick to my toolbox that has already proved quite useful: “grepping” files in a git repository, as you would with git grep, but using a commit-id to limit the search to a specific snapshot of your project.
In other words, it’s possible to search your repository as it was, say, some commits ago.
This is the “magical” command:
git grep <search-params> <tree-id>
This is what I get if I try to search for updateBackingStore() in my local clone of WebKit, as if my current branch was “50 commits older” than what it actually is:
$ git grep updateBackingStore HEAD~50
0ae236137d560da6ca889a826a8f3d023364a669:AccessibilityObject.cpp:void AccessibilityObject::updateBackingStore()
0ae236137d560da6ca889a826a8f3d023364a669:AccessibilityObject.h: void updateBackingStore();
0ae236137d560da6ca889a826a8f3d023364a669:AccessibilityObject.h:inline void AccessibilityObject::updateBackingStore() { }
0ae236137d560da6ca889a826a8f3d023364a669:atk/WebKitAccessibleUtil.h: coreObject->updateBackingStore(); \
0ae236137d560da6ca889a826a8f3d023364a669:atk/WebKitAccessibleUtil.h: coreObject->updateBackingStore(); \
0ae236137d560da6ca889a826a8f3d023364a669:atk/WebKitAccessibleWrapperAtk.cpp: coreObject->updateBackingStore();
0ae236137d560da6ca889a826a8f3d023364a669:ios/WebAccessibilityObjectWrapperIOS.mm: m_object->updateBackingStore();
0ae236137d560da6ca889a826a8f3d023364a669:ios/WebAccessibilityObjectWrapperIOS.mm: m_object->updateBackingStore();
0ae236137d560da6ca889a826a8f3d023364a669:mac/WebAccessibilityObjectWrapperBase.mm: // Calling updateBackingStore() can invalidate this element so self must be retained.
0ae236137d560da6ca889a826a8f3d023364a669:mac/WebAccessibilityObjectWrapperBase.mm: m_object->updateBackingStore();
I don’t know about you, but I find this quite useful for me to answer questions such as “Where was this function being used in commit X?”, and things like that.
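If you want to try this without a big repository at hand, here is a minimal sketch in a throwaway repo. The file name widget.c, the commit messages and the demo identity are all made up for illustration; only the git grep <pattern> <commit> form comes from the recipe above.

```shell
set -e
# Throwaway repository, so the example is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Identity is passed inline so the sketch does not depend on global git config.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo 'call updateBackingStore();' > widget.c
git add widget.c
commit 'first: uses updateBackingStore'

echo 'call refresh();' > widget.c
git add widget.c
commit 'second: call replaced'

# The current files no longer mention the function (git grep exits non-zero on no match)...
now=$(git grep updateBackingStore || true)
# ...but the snapshot one commit back still does.
before=$(git grep updateBackingStore HEAD~1)

echo "now:    [$now]"
echo "before: $before"
```

Each match from the older snapshot is prefixed with the revision you passed, followed by the path and the matching line.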
Anyway, you might have noticed that I mentioned <tree-id> in the recipe instead of <commit-id>, yet I used HEAD~50 in the example, which is actually a commit-id. And it still works.
The short explanation, without going into all the different kinds of objects that git keeps internally for every repository (mainly commits, trees and blobs), is that git is smart enough to find the right tree-id for a given commit-id: every commit points to the tree-id of the top directory of that snapshot, and git grep takes the current path inside the repository into account when searching it.
But how can I find that tree-id myself, in case I want to? Easy: just pretty-print the full commit object you’re interested in, instead of the abbreviated version you usually see with git show or git log:
$ git cat-file -p HEAD~50
tree 0ae236137d560da6ca889a826a8f3d023364a669
parent bdb7a7949a29988da3fe50a65d6c694d5084d379
author [...]
See that tree thing in the first line? That’s the tree-id that git needs for grepping, and as you can see it can be easily extracted from a particular commit. From this point you could also easily get the tree-id of any subdirectory by using the git ls-tree command:
$ git ls-tree 0ae236137d560da6ca889a826a8f3d023364a669
100644 blob 3fe2340c9614e893f0dfeb720f23773bbf1ea076 .dir-locals.el
100644 blob 741c4d53b5a0338cf36900a283e89408d0f9d457 .gitattributes
100644 blob f45a975762be9a429aa971c18da01b433c559553 .gitignore
100644 blob d571aa28ea86c14c7880533bf3ba68e9ef4b3c81 .qmake.conf
100644 blob 10f85055ae9f3823f0d20808599f644c18af7921 CMakeLists.txt
100644 blob 5eb66e7bcbc7543eb3a4dbf183a9043545776659 ChangeLog
100644 blob 7dbe9d2e0029bab47b8b2b22065a1032ecfe4434 ChangeLog-2012-05-22
040000 tree d42a0b3121ed7993cfd250426d20472769760f87 Examples
100644 blob 78d89e5c70ad56c38b0c25e7705d42fa380c4ee0 GNUmakefile.am
040000 tree 4a9e87fc1f35efa1349a18b1df694530c981c57e LayoutTests
100644 blob 14e33157011157797dac62c494bac0bf254d7c2f Makefile
100644 blob ee723d830dea51d1ce9e2d1ad8c985eeca2d4f3f Makefile.shared
040000 tree 20c763d6a4e8749ad9e041e8372e9f47dc722f45 ManualTests
040000 tree 660d88b926cf618ab9e1612b8e2a3e97b15dbcbe PerformanceTests
040000 tree fbf9703d3e9a9e4cf2ff10817c99ba3a5de87410 Source
040000 tree 346110c441a674334f5f56ef42b9dd40def89c76 Tools
040000 tree 262cb11d9b491be35daee570f9b825bce5715579 WebKit.xcworkspace
040000 tree b9e48a7a24b4973b253ee14053808b40d67c94aa WebKitLibraries
040000 tree adce37b690957abdd21d2dd8ff77302c5a5a9071 Websites
100755 blob befd429487fc5ac9bb3494800f4eeaef1e607663 autogen.sh
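To convince yourself that the commit-id and its tree-id really lead to the same search, here is a small sketch, again in a throwaway repo with made-up file names. It pulls the tree-id out of the git cat-file -p output with awk, and also uses git rev-parse 'HEAD^{tree}', which is not part of the recipe above but is, as far as I know, the shorter way to ask git for the same thing.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'needle here' > a.txt
git add a.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add a.txt'

# The pretty-printed commit object contains a "tree <id>" line; take the id from it.
from_cat_file=$(git cat-file -p HEAD | awk '$1 == "tree" { print $2; exit }')
# Shortcut (an addition, not from the text above): peel the commit to its tree.
from_rev_parse=$(git rev-parse 'HEAD^{tree}')

# Same search, once via the commit and once via the extracted tree-id.
# sed strips the leading "<rev>:" prefix so only "path:line" is compared.
via_commit=$(git grep needle HEAD | sed 's/^[^:]*://')
via_tree=$(git grep needle "$from_cat_file" | sed 's/^[^:]*://')

echo "tree-id:    $from_cat_file"
echo "via commit: $via_commit"
echo "via tree:   $via_tree"
```

The only difference between the two searches is the prefix git prints in front of each match: the name you typed in one case, the raw tree-id in the other.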
And of course, by “navigating” with more calls to git ls-tree you could also get the tree-id for a specific subdirectory, in case you wanted to constrain the search to that specific path of your repo.
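A rough sketch of that navigation, assuming a toy repo with hypothetical src and docs directories: list the top-level tree, pick the id of the src entry, and grep only that tree. The git rev-parse HEAD:src line is an extra shortcut of mine that should resolve to the same id.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir src docs
echo 'needle in code' > src/main.c
echo 'needle in docs' > docs/notes.txt
git add src docs
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'initial'

# Each ls-tree line is "<mode> <type> <id><TAB><name>"; pick the id of the src entry.
src_tree=$(git ls-tree HEAD | awk '$4 == "src" { print $3 }')
# Shortcut (an addition): "<rev>:<path>" names the object stored at that path.
src_tree_short=$(git rev-parse HEAD:src)

# Grepping the subdirectory's tree only sees files below src/,
# and the reported paths are relative to that tree.
matches=$(git grep -l needle "$src_tree")

echo "src tree: $src_tree"
echo "matches:  $matches"
```

Note that docs/notes.txt also contains the word, but it never shows up because it lives outside the tree being searched.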
However, considering that git is so good at translating a commit-id into a tree-id, my personal recommendation is that, instead, you first cd into the path you want to focus the search on, and then let git do its “magic” by just running git grep <search-params> <commit-id> from there.
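In practice that looks something like the sketch below (toy repo, hypothetical src and docs directories). I also include a variant that stays at the top of the repo and passes a pathspec after --, which is not mentioned above but is a standard git grep feature.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir src docs
echo 'needle in code' > src/main.c
echo 'needle in docs' > docs/notes.txt
git add src docs
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'initial'

# Recommended approach: cd into the directory and pass the commit-id as-is.
# The search is limited to that directory and paths are shown relative to it.
from_subdir=$(cd src && git grep -l needle HEAD)

# Variant (an addition): stay at the top and limit the search with a pathspec.
by_pathspec=$(git grep -l needle HEAD -- src)

echo "from src/:     $from_subdir"
echo "with pathspec: $by_pathspec"
```

Either way, no tree-id has to be looked up by hand.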
So that’s it. Hope you find this useful, and please do not hesitate to share any comment or suggestion you might have with regard to this or any other “git trick” you might know.
I honestly love using git so much that sometimes I wonder if coding is not just a poor excuse to use git. Probably not, but the thing is that I cannot imagine my life without it anymore. That’s a fact.