Git/Large File Hunt: Difference between revisions

From Omnia

Revision as of 03:31, 29 June 2025

Kill Large Files

Find Large Files

git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | cut -c 1-12,41- | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
  • git rev-list --objects --all: Lists every object reachable from any reference, together with the path it was found at.
  • git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)': Prints each object's type, full ID, size in bytes, and the rest of the input line (the path).
  • sed -n 's/^blob //p': Keeps only blob objects (file contents) and strips the "blob " prefix.
  • sort --numeric-sort --key=2: Sorts ascending by the second field (object size), so the largest blobs appear last.
  • cut -c 1-12,41-: Keeps the first 12 characters (abbreviated object ID) and everything from character 41 onward (size and file path), dropping the rest of the 40-character ID.
  • $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest: Converts the size in bytes to a human-readable value in binary units (KiB, MiB, ...). gnumfmt is tried first because that is the name GNU numfmt typically has on macOS when installed via Homebrew coreutils.
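The steps above can be wrapped in a small shell function when only the biggest few blobs matter. This is a minimal sketch, not part of the original page: the function name largest_blobs and its default of 10 are assumptions, and it stops before the cut and numfmt stages, so it prints the full ID and the raw size in bytes.

```shell
# Hypothetical helper: list the N largest blobs in the current repository.
# Usage: largest_blobs [N]   (N defaults to 10)
# Output columns: full object ID, size in bytes, path (largest last).
largest_blobs() {
    n="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort --numeric-sort --key=2 |
        tail -n "$n"
}
```

Run it from inside the repository, for example `largest_blobs 20`.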

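ref:
[1] https://www.pixelite.co.nz/article/finding-and-deleting-large-files-in-a-git-repo/
[2] https://stackoverflow.com/questions/64397278/understanding-git-rev-list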

--- Example

Junk files that should not have been committed:

...
b90d01c3dded   73MiB reserve-app/client/node_modules/.cache/default-development/9.pack
ee46b2ee4ad9   73MiB reserve-app/client/node_modules/.cache/default-development/4.pack
293ee8349dbd   74MiB reserve-app/client/node_modules/.cache/default-development/3.pack
02dbad83c8e6   74MiB reserve-app/client/node_modules/.cache/default-development/13.pack
60cbbaa4850f  100MiB mongo_local/logs/journal/TigerLog.0000000005
aa52a216f4fc  100MiB mongo_local/logs/journal/TigerPreplog.0000000001

---
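When many offenders sit under the same directory, as with the node_modules cache packs in the listing above, it can help to total blob sizes per top-level directory. The following is a rough sketch under stated assumptions: the function name blob_bytes_by_top_dir is hypothetical, and each blob is counted once, under the path that git rev-list reports for it.

```shell
# Hypothetical helper: total blob bytes per top-level path component.
# Output columns: total bytes, top-level directory or file (largest last).
blob_bytes_by_top_dir() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" {
                 path = $0
                 sub(/^[^ ]+ [^ ]+ /, "", path)   # drop type and size, keep the path
                 split(path, parts, "/")          # parts[1] is the top-level component
                 total[parts[1]] += $2
             }
             END { for (d in total) print total[d], d }' |
        sort -n
}
```

The totals are raw byte counts of uncompressed blobs, so they indicate relative weight rather than exact on-disk usage.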


Show full ID:

git rev-list --all --objects | grep <blob-id>
git rev-list --all --objects | grep 60cbbaa4850f
# or rerun the pipeline without cut to keep the full 40-character IDs
git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
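As an alternative to grepping the whole object list, git rev-parse can expand an unambiguous abbreviated ID directly. A minimal sketch, assuming a hypothetical helper name describe_object; it also reports the object type and size via git cat-file.

```shell
# Hypothetical helper: print "<full-id> <type> <size-in-bytes>" for an
# abbreviated object ID. Returns non-zero if the ID cannot be resolved.
describe_object() {
    full="$(git rev-parse --verify "$1")" || return 1
    printf '%s %s %s\n' "$full" "$(git cat-file -t "$full")" "$(git cat-file -s "$full")"
}

# Example with the blob from the listing above:
#   describe_object 60cbbaa4850f
```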


Find the commit that has these objects

git log --find-object=<blob-id> --all

Example:

git log --find-object=60cbbaa4850f --all
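For a compact view, the same search can print one line per matching commit. This is a sketch only: the function name commits_touching_blob and the chosen format string are assumptions. --find-object selects commits that change the number of occurrences of the object, which in practice means the commits that added or removed the file.

```shell
# Hypothetical helper: one line per commit that added or removed the given blob.
# Output columns: abbreviated commit hash, author date (YYYY-MM-DD), subject.
commits_touching_blob() {
    git log --all --find-object="$1" --format='%h %ad %s' --date=short
}

# Example with the blob from the listing above:
#   commits_touching_blob 60cbbaa4850f
```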