且构网

Removing large files from Git history

Updated: 2023-12-04 20:27:22

Commit history in Git is nothing but commits.

No commit can ever be changed. So any tool that removes a big file from an existing commit, whether it's The BFG, git filter-branch, git filter-repo, or something else, has to extract the "bad" commit, make some changes (e.g., remove the big file), and make a new and improved substitute commit.
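To make "no commit can ever be changed" concrete, here is a throwaway-repository sketch. It assumes git is on PATH; big.bin and the commit messages are hypothetical. Even git commit --amend does not edit the bad commit: it writes a substitute with a new hash ID, and the original, big file and all, is still in the object database.

```shell
#!/bin/sh
# Sketch: "removing" a file from a commit really builds a new commit.
# Assumes git is on PATH; big.bin and the messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo hello > small.txt
dd if=/dev/zero of=big.bin bs=1024 count=1024 2>/dev/null   # 1 MiB stand-in
git add small.txt big.bin
git commit -q -m "bad: includes big.bin"
bad=$(git rev-parse HEAD)

# Drop big.bin from the index and "amend": Git writes a brand-new commit.
git rm -q --cached big.bin
git commit -q --amend -m "improved: big.bin removed"
good=$(git rev-parse HEAD)

echo "bad commit: $bad"
echo "substitute: $good"
git ls-tree --name-only "$bad"    # big.bin and small.txt: the old commit is untouched
git ls-tree --name-only "$good"   # small.txt only
```

Nothing here rewrote the bad commit; the branch name simply moved to the substitute.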

The terrible part of this is that every later commit depends, in an unchangeable way, on the raw hash ID of the bad commit. The immediate children of the bad commit store it as their parent hash. So you (or the tool) must copy those children to new-and-improved commits as well. What's improved about them is that they lack the big file and refer back to the replacement just made for the original bad commit.

Of course, their children encode their hash IDs as parent hash IDs, so now the tool must copy those commits. This repeats all the way up to the last commit in each branch, as identified by the branch name:

...--o--o--x--o--o--o   [old, bad version of branch]
         \
          ●--●--●--●   <-- branch

where x is the bad commit: x had to be copied to the first new-and-improved ● commit, and then every subsequent commit had to be copied too.
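The parent-hash chain can be seen directly with git cat-file, and git commit-tree can play the part of the rewriting tool by hand. In this sketch (scratch repository, hypothetical file names and messages, timestamps pinned through GIT_AUTHOR_DATE and GIT_COMMITTER_DATE so they cannot explain any difference), the copied child has the same snapshot, message and dates as the original child, yet it still gets a new hash ID, because its parent line names the substitute for x.

```shell
#!/bin/sh
# Sketch of why descendants of a replaced commit must be copied too.
# Assumes git is on PATH; file names and messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Pin timestamps so the parent line is the only thing that differs below.
export GIT_AUTHOR_DATE='946684800 +0000' GIT_COMMITTER_DATE='946684800 +0000'

cd "$(mktemp -d)"
git init -q
echo one > a.txt && git add a.txt && git commit -q -m "x"
x=$(git rev-parse HEAD)
echo two > a.txt && git commit -q -am "child of x"
child=$(git rev-parse HEAD)

# The child's raw commit object stores x's hash on its "parent" line.
git cat-file -p "$child"

# Pretend x was repaired: build a substitute commit (different message).
x_new=$(git commit-tree "$x^{tree}" -m "x (repaired)")

# Copy the child onto the substitute: same tree, same message, same dates,
# only the parent differs, and that alone yields a new hash ID.
child_new=$(git commit-tree "$child^{tree}" -p "$x_new" -m "child of x")
echo "old child: $child"
echo "new child: $child_new"
```

A real tool repeats that last step for every descendant, up to each branch tip, which is exactly the cascade in the diagram.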

The copies, being different commits, have different hash IDs. Every clone must now abandon the "bad" commits—the x one and all its descendants—in favor of the new-and-improved ones.

All these repository-editing tools should strive to make minimal changes. The BFG is probably the fastest and most convenient to use, but git filter-branch can be told to copy only the bad commit and its descendants, and to use --index-filter, which is its fastest (though still slow!) filter. To do this, use:

git filter-branch --index-filter <command> -- <hash>..branch1 <hash>..branch2 ...

where <command> is an appropriate "git rm --cached --ignore-unmatch" command (be sure to quote the whole thing), and the <hash> and branch names specify which commits to copy. Remember that A..B means "commits reachable from B, excluding commit A and everything reachable from A", so if commit x is, say, deadbeefbadf00d..., you'll want to use the hash of its parent as the limiter:

git filter-branch --index-filter "..." -- deadbeefbadf00d^..master

for instance (filling in the "..." part with the correct removal command).
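Here is a minimal end-to-end sketch of that range-limited run in a throwaway repository. Assumptions not in the original: a reasonably recent git on PATH, a branch named main, and big.bin standing in for the large file. FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses filter-branch's warning banner and pause. The script checks that the x^..main range covers exactly x and its descendants, then runs the filter with a filled-in git rm command.

```shell
#!/bin/sh
# Sketch: strip a hypothetical big.bin from commit x and its descendants only.
# Assumes a reasonably recent git on PATH; the branch name "main" is an assumption.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning banner and pause

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

step() {  # step <content> <message>: change a.txt and commit everything staged
  echo "$1" > a.txt
  git add a.txt
  git commit -q -m "$2"
}

step one "o1"
step two "o2"
before=$(git rev-parse HEAD)                        # x's parent: must keep its hash
dd if=/dev/zero of=big.bin bs=1024 count=1024 2>/dev/null
git add big.bin
step three "x: adds big.bin"                        # the bad commit
x=$(git rev-parse HEAD)
step four "o4"
step five "o5"
step six "o6"
old_tip=$(git rev-parse main)

# x^..main selects x plus its three descendants, and nothing older.
git rev-list --count "$x^..main"                    # prints 4

# Copy only those four commits, dropping big.bin from each copy's index.
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch big.bin' \
  -- "$x^..main"

new_tip=$(git rev-parse main)
git rev-parse 'main~3^'                             # same hash as $before
```

Because the range starts at x's parent, o1 and o2 are never copied and keep their hash IDs; only x and its descendants get new ones. filter-branch also leaves the old tip under refs/original/refs/heads/main, so the old commits are still reachable locally after the run.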

(Note: I have not actually used The BFG, but if it re-copies commits unnecessarily, that's really bad, and I bet it does not.)