Git/Large File Hunt


Latest revision as of 18:16, 3 July 2025

Kill Large Files

Find large objects:

git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

Find commit of large objects:

git log --all --find-object=<object-id>

Find branches containing said commit:

git branch -a --contains <commit-id>

Find Large Files

git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | cut -c 1-12,41- | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
  • git rev-list --objects --all: Lists all objects reachable from any reference.
  • git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)': Provides details about each object, including type, name, size, and the rest of the line.
  • sed -n 's/^blob //p': Filters the output to include only blob objects and removes the "blob " prefix.
  • sort --numeric-sort --key=2: Sorts the output numerically based on the second field (object size).
  • cut -c 1-12,41-: Keeps the first 12 characters (a short object ID) and everything from the 41st character onwards (the size and the file name). This relies on 40-character SHA-1 object IDs.
  • $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest: Formats the file size to a human-readable format.
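
The stages above can be wrapped in two small shell functions, which makes the pipeline easier to reuse and to try out on sample input. This is only a sketch: the names format_blob_sizes and largest_blobs are made up for illustration, and the cut step still assumes 40-character SHA-1 object IDs.

```shell
# Turn `git cat-file --batch-check` lines (type, id, size, path) into
# "<short-id> <size> <path>", smallest blob first.
# Hypothetical helper name; assumes 40-character SHA-1 object IDs.
format_blob_sizes() {
  sed -n 's/^blob //p' |
    sort --numeric-sort --key=2 |
    cut -c 1-12,41- |
    "$(command -v gnumfmt || echo numfmt)" --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
}

# Print the N largest blobs reachable from any ref (N defaults to 10).
# Hypothetical helper name; run it inside a Git repository.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    format_blob_sizes |
    tail -n "${1:-10}"
}
```

Running largest_blobs 20 inside a repository lists the 20 biggest blobs, largest last. format_blob_sizes can also be fed saved cat-file output on stdin.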

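ref:
[1] https://www.pixelite.co.nz/article/finding-and-deleting-large-files-in-a-git-repo/#:~:text=To%20find%20the%20largest%20files%20in%20a,fetching%20operations%2C%20and%20make%20developers%20less%20efficient.
[2] https://stackoverflow.com/questions/64397278/understanding-git-rev-list#:~:text=If%20you%20are%20using%20Git%20in%20the,all%20commits%20are%20reachable%20from%20all%20references.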

--- Example

Sample output, showing junk files that should not have been committed:

...
b90d01c3dded   73MiB reserve-app/client/node_modules/.cache/default-development/9.pack
ee46b2ee4ad9   73MiB reserve-app/client/node_modules/.cache/default-development/4.pack
293ee8349dbd   74MiB reserve-app/client/node_modules/.cache/default-development/3.pack
02dbad83c8e6   74MiB reserve-app/client/node_modules/.cache/default-development/13.pack
60cbbaa4850f  100MiB mongo_local/logs/journal/TigerLog.0000000005
aa52a216f4fc  100MiB mongo_local/logs/journal/TigerPreplog.0000000001

---


Show full ID:

git rev-list --all --objects | grep <blob-id>
git rev-list --all --objects | grep 60cbbaa4850f
# or rerun the pipeline without the cut step to print the full 40-character IDs
git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
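
As an alternative to grepping the object list, git rev-parse can expand a short ID directly. A minimal sketch, assuming the short ID is unique in the repository; the function name full_blob_id is hypothetical.

```shell
# Expand an abbreviated blob ID to its full object ID.
# The ^{blob} suffix makes rev-parse fail unless the ID names a blob.
# Hypothetical helper name; run it inside a Git repository.
full_blob_id() {
  git rev-parse --verify "$1^{blob}"
}
```

For example, full_blob_id 60cbbaa4850f prints the full ID of that blob, or an error if the prefix is ambiguous or unknown.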


Find the commit that has these objects

git log --find-object=<blob-id> --all

Example:

git log --all --find-object=60cbbaa4850f

Find which branches contain the commit

git branch -a --contains <commit-id>
# reflog output prints abbreviated commit IDs, so grep for a short prefix of the ID
git reflog show --all | grep <abbreviated-commit-id>
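
The last two steps can be chained so that one blob ID leads straight to the affected branches. This is a sketch rather than a polished tool: the function name blob_branches is hypothetical, and it assumes a Git version whose git branch supports --format.

```shell
# For one blob ID, print every commit that adds or removes that blob,
# followed by the local and remote branches containing the commit.
# Hypothetical helper name; run it inside a Git repository.
blob_branches() {
  git log --all --find-object="$1" --format='%H' |
    while read -r commit; do
      echo "commit $commit"
      git branch -a --contains "$commit" --format='  %(refname:short)'
    done
}
```

Calling blob_branches 60cbbaa4850f prints each matching commit with an indented list of branches below it.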

Rewrite history removing object

Keywords