Updated: 2023-02-02 22:46:11
You can identify duplicate files using an awk one-liner.
Let's create some files, some of which are duplicates.
[jaypal~/Temp]$ cat a.txt
jaypal
[jaypal~/Temp]$ cat b.txt
singh
[jaypal~/Temp]$ cat c.txt
jaypal
[jaypal~/Temp]$ cat d.txt
ayaplj
From the output shown above, we know that files a.txt and c.txt are exact duplicates. File d.txt, even though it contains my name rearranged, cannot be categorized as a duplicate.
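To reproduce the demo, the four files can be created as below. This is a minimal sketch assuming a POSIX shell; running it in a fresh directory from `mktemp -d` is my own choice so that nothing existing gets overwritten. `cmp -s` confirms the byte-for-byte comparison described above.

```shell
# Work in a fresh scratch directory (mktemp -d is an assumption of this sketch).
cd "$(mktemp -d)" || exit 1

# printf '%s\n' writes the word plus a trailing newline, like the files above.
printf '%s\n' jaypal > a.txt
printf '%s\n' singh  > b.txt
printf '%s\n' jaypal > c.txt   # same bytes as a.txt
printf '%s\n' ayaplj > d.txt   # same letters as a.txt, different order

# cmp -s prints nothing and only sets the exit status: 0 means identical bytes.
cmp -s a.txt c.txt && echo "a.txt and c.txt are identical"
cmp -s a.txt d.txt || echo "a.txt and d.txt differ"
```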
We will run the cksum utility on each file and capture the output in a separate file.
[jaypal~/Temp]$ cksum a.txt b.txt c.txt d.txt > cksum.txt
[jaypal~/Temp]$ cat cksum.txt
3007025847 7 a.txt
1281385283 6 b.txt
3007025847 7 c.txt
750690976 7 d.txt
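Each cksum line has three fields: the CRC checksum, the size in bytes, and the file name. The size alone would not be enough here, since a.txt, c.txt and d.txt are all 7 bytes, but only a.txt and c.txt share a checksum. A small sketch that checks this, recreating three of the demo files in a scratch directory (`mktemp -d` and the helper function `field` are my own, hypothetical additions):

```shell
# Recreate three of the demo files in a scratch directory (assumption: mktemp -d exists).
cd "$(mktemp -d)" || exit 1
printf '%s\n' jaypal > a.txt
printf '%s\n' jaypal > c.txt
printf '%s\n' ayaplj > d.txt
cksum a.txt c.txt d.txt > cksum.txt

# Hypothetical helper: print field N of the cksum line whose file name ($3) is F.
field() { awk -v f="$1" -v n="$2" '$3 == f { print $n }' cksum.txt; }

echo "sizes: $(field a.txt 2) $(field c.txt 2) $(field d.txt 2)"
if [ "$(field a.txt 1)" = "$(field c.txt 1)" ]; then echo "a.txt and c.txt: same CRC"; fi
if [ "$(field a.txt 1)" != "$(field d.txt 1)" ]; then echo "a.txt and d.txt: different CRC"; fi
```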
Note: I used the above method since there were only 4 files in this demo. If you have hundreds of files to check for duplicates, use a simple for loop.
[jaypal~/Temp]$ for i in ./*.txt; do cksum "$i" >> cksum1.txt; done
[jaypal~/Temp]$ cat cksum1.txt
3007025847 7 ./a.txt
1281385283 6 ./b.txt
3007025847 7 ./c.txt
750690976 7 ./d.txt
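With hundreds of files you can also skip the loop and let find hand the names to cksum in batches. A sketch, assuming the files are *.txt files at or below the current directory; the scratch layout is my own. Writing the list one level up keeps the checksum file itself out of the scan. Unlike the loop, find does not guarantee the order of the lines.

```shell
# Scratch setup: a parent directory holding a files/ subdirectory (assumption for the demo).
top=$(mktemp -d)
mkdir "$top/files"
cd "$top/files" || exit 1
printf '%s\n' jaypal > a.txt
printf '%s\n' singh  > b.txt
printf '%s\n' jaypal > c.txt
printf '%s\n' ayaplj > d.txt

# "-exec cksum {} +" passes many names to each cksum run instead of one per file.
# The output goes to the parent directory so cksum1.txt is not itself picked up by find.
find . -type f -name '*.txt' -exec cksum {} + > ../cksum1.txt
cat ../cksum1.txt
```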
Now that we have the cksum.txt file, we can feed it to our awk one-liner to identify duplicates. Note that the file is passed to awk twice.
[jaypal~/Temp]$ awk 'NR==FNR && a[$1]++ { b[$1]; next } $1 in b' cksum.txt cksum.txt
3007025847 7 a.txt
3007025847 7 c.txt
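The same one-liner spread out with comments, as a sketch of how it works. It relies on the common awk idiom that NR==FNR is only true while awk reads its first file argument, so passing cksum.txt twice gives a counting pass followed by a printing pass. The input is the cksum.txt listing shown earlier, written to a scratch directory (`mktemp -d` is an assumption).

```shell
# Sample input: the cksum.txt contents shown above, in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > cksum.txt <<'EOF'
3007025847 7 a.txt
1281385283 6 b.txt
3007025847 7 c.txt
750690976 7 d.txt
EOF

# cksum.txt is passed twice; NR == FNR only holds while awk reads the first copy.
dups=$(awk '
  # Pass 1: a[$1]++ is 0 (false) the first time a checksum appears and
  # non-zero afterwards, so b[] collects only checksums seen 2+ times.
  NR == FNR && a[$1]++ { b[$1]; next }
  # Pass 2: print each line whose checksum landed in b[].
  # First-occurrence lines from pass 1 also reach this test, but their
  # checksum cannot be in b[] yet, so nothing is printed early.
  $1 in b
' cksum.txt cksum.txt)
printf '%s\n' "$dups"
```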
This will list all the files that have more than one copy in your directory. Please note: delete all but one copy of each file, not every copy. :) You can always pipe the output to sort to get them in order.
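For instance, sorting on the checksum first and the file name second keeps every set of copies next to each other. The sort keys are my own choice, not part of the original answer; the input is the cksum.txt listing from above, written to a scratch directory.

```shell
cd "$(mktemp -d)" || exit 1
cat > cksum.txt <<'EOF'
3007025847 7 a.txt
1281385283 6 b.txt
3007025847 7 c.txt
750690976 7 d.txt
EOF

# -k1,1n: numeric sort on the checksum field only; -k3,3: then by file name.
awk 'NR==FNR && a[$1]++ { b[$1]; next } $1 in b' cksum.txt cksum.txt |
  sort -k1,1n -k3,3 > dups_sorted.txt
cat dups_sorted.txt
```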
Alternatively, you can do the following to get only the extra copies instead of every copy. The reason I am not too fond of this one is that it doesn't show which file each one is a duplicate of.
[jaypal~/Temp]$ awk '{ x[$1]++; if (x[$1]>1) print $0}' cksum.txt
3007025847 7 c.txt
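To address that drawback while keeping a single pass, one option (a variation I am suggesting, not code from the original answer) is to remember the first file name seen for each checksum and print it next to every later copy. The array name `first` is arbitrary, and the input is again the cksum.txt listing from above.

```shell
cd "$(mktemp -d)" || exit 1
cat > cksum.txt <<'EOF'
3007025847 7 a.txt
1281385283 6 b.txt
3007025847 7 c.txt
750690976 7 d.txt
EOF

# first[crc] holds the name of the first file seen with that checksum.
# A later line with a known checksum is reported together with that name.
awk '$1 in first { print $3, "duplicates", first[$1]; next }
     { first[$1] = $3 }' cksum.txt > report.txt
cat report.txt
# prints: c.txt duplicates a.txt
```

Like the original one-liners, this takes the file name from field 3, so it assumes names without spaces.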