且构网

Removing large files from Git history

Updated: 2023-12-04 20:18:34

Commit history in Git is nothing but the commits themselves.

No commit can ever be changed. So for anything to remove a big file from some existing commit, that thing (whether it's BFG, or git filter-branch, or git filter-repo, or whatever) is going to have to extract a "bad" commit, make some changes (e.g., remove the big file), and make a new and improved substitute commit.

The terrible part of this is that each subsequent commit encodes, in an unchangeable way, the raw hash ID of the bad commit. The immediate children of the bad commit encode it as their parent hash. So you, or the tool, must copy those commits to new-and-improved ones. What's improved about them is that they lack the big file and refer back to the replacement just made for the initial bad commit.
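To see the parent link concretely, here is a minimal sketch that builds a throwaway repository and prints the parent line stored inside the second commit. The directory, file name, and identity are made up for illustration.

```shell
#!/bin/sh
set -e
# Scratch repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -a -m "second"

# The raw commit object records the full hash ID of its parent.
git cat-file -p HEAD | grep '^parent '
parent_of_second=$(git rev-parse HEAD^)
```

The hash printed on the parent line is the hash of the first commit. Because that line is part of the second commit's content, replacing the first commit forces a new second commit as well.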

Of course, their children encode their hash IDs as parent hash IDs, so now the tool must copy those commits too. This repeats all the way up to the last commit in each branch, as identified by the branch name:

...--o--o--x--o--o--o   [old, bad version of branch]
         \
          ●--●--●--●   <-- branch

where x is the bad commit: x had to be copied to the first new-and-improved ●, but then all subsequent commits had to be copied too.

The copies, being different commits, have different hash IDs. Every clone must now abandon the "bad" commits (the x one and all its descendants) in favor of the new-and-improved ones.
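As a rough sketch of what "abandoning" looks like in a clone, the script below sets up a scratch upstream and a clone, rewrites the upstream tip, and then moves the clone onto the rewritten history. It uses git commit --amend as a stand-in for a real filtering run, and the repository names are hypothetical. Note that git reset --hard discards any local work in the clone.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# A scratch "upstream" repository with one commit (names are hypothetical).
git init -q upstream
git -C upstream config user.name "Example"
git -C upstream config user.email "example@example.invalid"
echo data > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "original commit"

git clone -q upstream clone
old=$(git -C clone rev-parse HEAD)

# Stand-in for a history rewrite: amending replaces the tip commit
# with a copy that has a different hash ID.
git -C upstream commit -q --amend -m "rewritten commit"
new=$(git -C upstream rev-parse HEAD)

# In the clone: fetch the rewritten history, then move the local branch
# onto it, abandoning the old commit.
git -C clone fetch -q origin
branch=$(git -C clone symbolic-ref --short HEAD)
git -C clone reset -q --hard "origin/$branch"
now=$(git -C clone rev-parse HEAD)
```

In a real project, every collaborator has to do something along these lines, which is why rewriting published history needs coordination.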

All these repository-editing tools should strive to make minimal changes. The BFG is probably the fastest and most convenient to use, but git filter-branch can be told to copy only the bad commit and its descendants, and to use --index-filter, which is its fastest (still slow!) filter. To do this, use:

git filter-branch --index-filter <command> -- <hash>..branch1 <hash>..branch2 ...

where <command> is an appropriate "git rm --cached --ignore-unmatch" command (be sure to quote the whole thing), and the <hash> and branch names specify which commits to copy. Remember that A..B syntax means "don't look at commit A or earlier, but do look at commit B and earlier". So if commit x is, say, deadbeefbadf00d..., you'll want to use the hash of its parent as the limiter:

git filter-branch --index-filter "..." -- deadbeefbadf00d^..master

for instance (fill in the ... part with the right removal command).
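Here is a hedged end-to-end sketch of that command on a scratch repository. The file name big.bin, the commit messages, and the identity are assumptions made for illustration. Setting FILTER_BRANCH_SQUELCH_WARNING only silences the warning that recent Git versions print before filter-branch runs.

```shell
#!/bin/sh
set -e
# Recent Git versions warn and pause before filter-branch unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "keep me" > readme.txt
git add readme.txt
git commit -q -m "good commit"
good=$(git rev-parse HEAD)

# The "bad" commit x adds a large file (hypothetical name big.bin).
head -c 1048576 /dev/zero > big.bin
git add big.bin
git commit -q -m "bad commit: adds big.bin"
bad=$(git rev-parse HEAD)

echo "more" >> readme.txt
git commit -q -a -m "later commit"

# bad^..master selects x and its descendants, and nothing earlier.
to_copy=$(git rev-list --count "$bad^..master")

# Copy only those commits, dropping big.bin from the index of each one.
git filter-branch --index-filter \
  "git rm --cached --ignore-unmatch big.bin" \
  -- "$bad^..master" >/dev/null 2>&1

# After the rewrite: no commit on master touches big.bin, and the
# commit before x was left alone, so it keeps its original hash ID.
touching=$(git rev-list --count master -- big.bin)
base_after=$(git rev-parse master~2)
```

Only two commits are copied here (x and its one descendant). The earlier "good commit" is shared between the old and new history, which is the minimal-change behavior described above.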

(Note: I have not actually used The BFG, but if it re-copies commits unnecessarily, that's really bad, and I bet it does not.)