且构网

How can we determine the number of lines in a text file?

Updated: 2023-02-20 11:01:03

The benefit of the estimation algorithm you've got is that it is very fast: one stat(2) call and then some division. It takes the same amount of time and memory no matter how large or small the file is. But it is also badly wrong on a huge number of inputs, namely any file whose real line lengths differ from whatever the division assumes.
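
The estimator in question is not shown, so the following is only a guess at its general shape: a minimal Python sketch that divides the file size by an assumed average line length. The function name `estimate_line_count` and the default of 80 bytes per line are hypothetical, chosen purely for illustration.

```python
import os


def estimate_line_count(path, assumed_avg_line_bytes=80):
    """Estimate the line count from the file size alone (hypothetical helper).

    Cost is constant: one stat call and one integer division,
    regardless of how large the file is.
    """
    size = os.stat(path).st_size  # wraps stat(2); the file is never read
    return size // assumed_avg_line_bytes
```

A file made of 1000 two-byte lines is 2000 bytes long, so with the 80-byte assumption this sketch reports 25 lines. The answer is only as good as the assumed average.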

Probably the best way to get the exact number is to actually read through the entire file looking for '\n' characters. If you read the file in large binary blocks (think 16384 bytes or a larger power of two) and look for the one byte you're interested in, the scan can run at something approaching the disk I/O bandwidth.
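
The block-reading approach could look roughly like this in Python. It is a sketch under stated assumptions, not a tuned implementation: the name `count_newlines` is hypothetical, and the 16384-byte default simply mirrors the block size suggested above.

```python
def count_newlines(path, block_size=16384):
    """Count '\\n' bytes by reading the file in fixed-size binary blocks."""
    count = 0
    # Binary mode avoids text decoding; buffering=0 lets us control block size.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:  # an empty bytes object signals end of file
                break
            count += block.count(b"\n")
    return count
```

Memory use stays bounded by `block_size` no matter how large the file is. Note that this counts newline bytes, so a final line with no trailing '\n' is not included in the total.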