且构网


Most efficient way to copy a file in Linux

Updated: 2023-11-25 15:41:22

Unfortunately, you cannot rely on sendfile() here: before Linux 2.6.33 the destination had to be a socket, and a regular file is accepted as the destination only from 2.6.33 on. (The name sendfile() comes from send() + "file".)

For zero-copy, you can use splice() as suggested by @Dave. (Except it will not be zero-copy; it will be "one copy" from the source file's page cache to the destination file's page cache.)
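
For reference, here is a rough sketch of what a splice()-based copy might look like. One end of every splice() call must be a pipe, so a file-to-file copy has to go through an intermediate pipe. The function name splice_copy and the 64 KiB chunk size are illustrative choices, not something from the question or the answer.

```c
#define _GNU_SOURCE            /* splice() is a Linux extension */
#include <fcntl.h>
#include <unistd.h>

/* Copy in_fd to out_fd by splicing through an intermediate pipe.
 * Returns 0 on success, -1 on error (errno set by the failing call).
 * splice_copy is a hypothetical helper name used for illustration. */
int splice_copy(int in_fd, int out_fd)
{
    int pipefd[2];
    int rc = 0;

    if (pipe(pipefd) < 0)
        return -1;

    for (;;) {
        /* Move up to 64 KiB from the source file into the pipe. */
        ssize_t n = splice(in_fd, NULL, pipefd[1], NULL, 65536, 0);
        if (n == 0)
            break;                          /* end of file */
        if (n < 0) {
            rc = -1;
            break;
        }

        /* Drain the pipe into the destination; each call may move
         * fewer bytes than requested, so loop until n is used up. */
        while (n > 0) {
            ssize_t m = splice(pipefd[0], NULL, out_fd, NULL, (size_t)n, 0);
            if (m <= 0) {
                rc = -1;
                break;
            }
            n -= m;
        }
        if (rc < 0)
            break;
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

Note that out_fd must not be opened with O_APPEND, since splice() refuses such a destination.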

However... (a) splice() is Linux-specific; and (b) you can almost certainly do just as well using portable interfaces, provided you use them correctly.

In short, use open() + read() + write() with a small temporary buffer. I suggest 8K. So your code would look something like this:

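#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int in_fd = open("source", O_RDONLY);
    assert(in_fd >= 0);
    int out_fd = open("dest", O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* create "dest" if missing */
    assert(out_fd >= 0);
    char buf[8192];

    while (1) {
        ssize_t result = read(in_fd, buf, sizeof(buf));
        if (!result) break;                     /* end of file */
        assert(result > 0);
        ssize_t written = write(out_fd, buf, (size_t)result);  /* kept outside assert() so it runs under NDEBUG */
        assert(written == result);
    }
    close(in_fd);
    close(out_fd);
    return 0;
}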

With this loop, you will be copying 8K from the in_fd page cache into the CPU L1 cache, then writing it from the L1 cache into the out_fd page cache. Then you will overwrite that part of the L1 cache with the next 8K chunk from the file, and so on. The net result is that the data in buf will never actually be stored in main memory at all (except maybe once at the end); from the system RAM's point of view, this is just as good as using "zero-copy" splice(). Plus it is perfectly portable to any POSIX system.
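
The loop above uses assert() for brevity. As a hedged sketch of how the same idea might be written for production use, the version below keeps the 8K buffer but retries on EINTR and handles short writes. The function name copy_fd is an illustrative choice, not part of the original answer.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through a small stack buffer.
 * Returns 0 on success, -1 on error (errno is set by read() or write()).
 * copy_fd is a hypothetical helper name used for illustration. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[8192];

    for (;;) {
        ssize_t nread = read(in_fd, buf, sizeof buf);
        if (nread == 0)
            return 0;                   /* end of file */
        if (nread < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }

        /* write() may accept fewer bytes than requested, so keep
         * writing until the whole chunk has been consumed. */
        char *p = buf;
        while (nread > 0) {
            ssize_t nwritten = write(out_fd, p, (size_t)nread);
            if (nwritten < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            p += nwritten;
            nread -= nwritten;
        }
    }
}
```

Because it only relies on read() and write(), this works on any pair of descriptors, including pipes and sockets.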

Note that the small buffer is key here. Typical modern CPUs have 32K or so for the L1 data cache, so if you make the buffer too big, this approach will be slower. Possibly much, much slower. So keep the buffer in the "few kilobytes" range.

Of course, unless your disk subsystem is very, very fast, memory bandwidth is probably not your limiting factor. So I would recommend posix_fadvise() to let the kernel know what you are up to:

posix_fadvise(in_fd, 0, 0, POSIX_FADV_SEQUENTIAL);

This will give a hint to the Linux kernel that its read-ahead machinery should be very aggressive.

I would also suggest using posix_fallocate() to preallocate the storage for the destination file. This will tell you ahead of time whether you will run out of disk space. And for a modern kernel with a modern file system (like XFS), it will help to reduce fragmentation in the destination file.
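
A minimal sketch of the preallocation step, assuming the source is a regular file whose size fstat() can report. The helper name preallocate_for_copy is hypothetical. Like posix_fadvise(), posix_fallocate() returns an error number (for example ENOSPC) instead of setting errno.

```c
#include <fcntl.h>
#include <sys/stat.h>

/* Reserve space in out_fd for a copy of in_fd.
 * Returns 0 on success, a positive error number from posix_fallocate()
 * (such as ENOSPC), or -1 if the source size cannot be determined.
 * preallocate_for_copy is a hypothetical helper name used for illustration. */
int preallocate_for_copy(int in_fd, int out_fd)
{
    struct stat st;

    if (fstat(in_fd, &st) != 0)
        return -1;
    if (st.st_size == 0)
        return 0;               /* posix_fallocate() rejects a zero length */
    return posix_fallocate(out_fd, 0, st.st_size);
}
```

Calling this right after opening the destination, before the copy loop starts, surfaces an out-of-space condition before any data has been written.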

mmap is the last thing I would recommend; that is, it is the approach I recommend least. It is usually the slowest of all, thanks to TLB thrashing. (Very recent kernels with "transparent hugepages" might mitigate this; I have not tried recently, but it certainly used to be very bad. So I would only bother testing mmap if you have lots of time to benchmark and a very recent kernel.)
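
If you do want to benchmark mmap against the read()/write() loop, a sketch along the following lines could serve as the mmap side of the comparison. It maps the whole source file read-only and writes from the mapping. The function name mmap_copy is an illustrative choice, and retrying on EINTR is omitted for brevity.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src_path to dst_path by mapping the source and writing from the mapping.
 * Returns 0 on success, -1 on error.
 * mmap_copy is a hypothetical helper name used for illustration. */
int mmap_copy(const char *src_path, const char *dst_path)
{
    struct stat st;
    size_t len, done = 0;
    char *src;
    int rc = -1;

    int in_fd = open(src_path, O_RDONLY);
    if (in_fd < 0)
        return -1;
    int out_fd = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }

    if (fstat(in_fd, &st) != 0)
        goto out;
    if (st.st_size == 0) {      /* mmap() rejects a zero length */
        rc = 0;
        goto out;
    }

    len = (size_t)st.st_size;
    src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
    if (src == MAP_FAILED)
        goto out;

    /* write() may be short, so advance through the mapping until done. */
    while (done < len) {
        ssize_t n = write(out_fd, src + done, len - done);
        if (n < 0)
            break;
        done += (size_t)n;
    }
    if (done == len)
        rc = 0;
    munmap(src, len);

out:
    close(in_fd);
    close(out_fd);
    return rc;
}
```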

[Update]

There is some question in the comments about whether splice from one file to another is zero-copy. The Linux kernel developers call this "page stealing". Both the man page for splice and the comments in the kernel source say that the SPLICE_F_MOVE flag should provide this functionality.

Unfortunately, the support for SPLICE_F_MOVE was yanked in 2.6.21 (back in 2007) and never replaced. (The comments in the kernel sources never got updated.) If you search the kernel sources, you will find SPLICE_F_MOVE is not actually referenced anywhere. The last message I can find (from 2008) says it is "waiting for a replacement".

The bottom line is that splice from one file to another calls memcpy to move the data; it is not zero-copy. This is not much better than you can do in userspace using read/write with small buffers, so you might as well stick to the standard, portable interfaces.

If "page stealing" is ever added back into the Linux kernel, then the benefits of splice would be much greater. (And even today, when the destination is a socket, you get true zero-copy, making splice more attractive.) But for the purpose of this question, splice does not buy you very much.