且构网


Error thrown when reading a large file

Updated: 2022-11-30 15:12:44

Why are you making everything so complicated?
You create a new List of strings and add a single string to it: the entire text content of the file, after memory-mapping it and creating two streams from the mapped file.
That is the equivalent of writing:
List<string> rowArray = new List<string>();
rowArray.Add(File.ReadAllText(filePath));

Try that, and see if your memory problem goes away...

"I need to read one line by one line and show it in ListView"

That isn't what your code is doing, not even slightly.
And that very definitely isn't what you want to do at all, and is exactly why you are getting "out of memory" errors - never try to display that much information in one go. How many lines of text do you expect a file of 100MB (or worse 300MB) to contain? If the average is 80 characters per line - and it'll normally be a lot less - that's 1,250,000 lines for the 100MB file, and 3,750,000 for the bigger file. How do you expect your user to cope with that, never mind the system? Each of those lines is a separate control in a ListView! And that's why you run out of memory!

Don't do it. Page it, filter it, search it. But just throwing that much at a user and expecting them to cope is lazy, and will only ever result in your user returning the software and demanding their money back...
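
As a rough illustration of the "page it" advice, here is a minimal sketch that returns one page of lines at a time. The names LinePager, ReadPage, pageIndex and pageSize are made up for this example and are not from the original code; it assumes a plain text file that File.ReadLines can enumerate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LinePager
{
    // Hypothetical helper: returns one page of lines without loading the whole file.
    // File.ReadLines enumerates lazily, so only the lines of the requested page
    // are kept in memory (the skipped lines are read and discarded).
    public static List<string> ReadPage(string filePath, int pageIndex, int pageSize)
    {
        return File.ReadLines(filePath)
                   .Skip(pageIndex * pageSize)
                   .Take(pageSize)
                   .ToList();
    }
}
```

You would then show only the returned page in the ListView, for example `LinePager.ReadPage(filePath, 0, 500)` for the first 500 lines. Note that every call scans the file from the start, so pages deep into a large file get slower to fetch.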


The memory a single list can use is limited to 2 GB, because the .NET runtime caps any single object at that size by default. With .NET 4.5 or later and a 64-bit build, this limit can be raised with the <gcAllowVeryLargeObjects> configuration element.
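
For reference, the element goes into the runtime section of the application configuration file. This is only a sketch of the fragment; where the file lives (app.config, or YourApp.exe.config after the build) depends on your project, so that part is an assumption.

```xml
<configuration>
  <runtime>
    <!-- Only takes effect in 64-bit processes on .NET 4.5 or later -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```

It raises the size limit for arrays; it does not make the memory use of the strings themselves any smaller.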

You may ask now why you run into this limit with files of about 300 MB size.

You have a string list. Strings use UTF-16 encoding internally (two bytes per character). If your input file contains only ASCII characters, the memory required to store the strings is twice the file size.

The list must also store the references to all strings. So you have 4 bytes (or 8 with 64-bit builds) per line.
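
To put numbers on the two points above, here is a small sketch of the estimate. MemoryEstimate and EstimateBytes are hypothetical names, and the result is a lower bound: it ignores per-string object headers and spare list capacity, and it counts the line terminators in the file as if they were stored.

```csharp
using System;

static class MemoryEstimate
{
    // Hypothetical helper: rough lower bound for a List<string> holding an ASCII file.
    public static long EstimateBytes(long asciiFileBytes, long lineCount, bool is64Bit)
    {
        long characterBytes = asciiFileBytes * 2;            // UTF-16: two bytes per ASCII character
        long referenceBytes = lineCount * (is64Bit ? 8 : 4); // one reference per line in the list
        return characterBytes + referenceBytes;
    }
}
```

For a 300,000,000 byte file with 3,750,000 lines in a 64-bit build, `MemoryEstimate.EstimateBytes(300000000, 3750000, true)` gives 630,000,000 bytes before any overhead is counted.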

The list will grow during reading, and that requires re-allocation. On each re-allocation, new memory is allocated and the current content is copied before the old memory is released, so for a moment both copies exist. It is probably during such an operation that you get the out of memory exception. This can be avoided by setting the list's capacity up front, which requires determining the number of lines first. Doing so avoids re-allocation, and also avoids allocating memory for more lines than are actually required.
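
A minimal sketch of setting the capacity up front, assuming it is acceptable to read the file twice. PresizedReader and ReadAllLinesPresized are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class PresizedReader
{
    // Hypothetical helper: counts the lines first, then fills a list of exactly that capacity.
    public static List<string> ReadAllLinesPresized(string filePath)
    {
        // First pass: count lines lazily; no line is kept in memory.
        int lineCount = File.ReadLines(filePath).Count();

        // Second pass: the capacity is already sufficient, so the list never re-allocates.
        var lines = new List<string>(lineCount);
        foreach (string line in File.ReadLines(filePath))
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

This only removes the re-allocation spikes. All the strings are still held in memory, so it does not help if the text itself is too large.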

A solution for your problem might be a list containing the line offsets into your memory-mapped file. This list would only use 4 or 8 bytes per line. Because you are using a virtual list view, you can extract just the strings that are in view by using the offsets (the distance to the next offset is the line length).
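
Here is one way the offset list could look, as a sketch only. LineIndex, GetLine and Count are hypothetical names. The code assumes a non-empty file in ASCII or UTF-8 without a byte order mark, lines ending in "\n" or "\r\n", and a process with enough address space to map the whole file in one view. It stores 8-byte offsets.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Hypothetical class: keeps only the start offset of each line, not the text.
sealed class LineIndex : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;
    private readonly List<long> _offsets = new List<long>();
    private readonly long _length;

    public LineIndex(string filePath)
    {
        _length = new FileInfo(filePath).Length; // assumed to be greater than zero
        _file = MemoryMappedFile.CreateFromFile(
            filePath, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        _view = _file.CreateViewAccessor(0, _length, MemoryMappedFileAccess.Read);

        // Scan the file once in chunks and record where each line starts.
        _offsets.Add(0);
        var buffer = new byte[64 * 1024];
        long position = 0;
        while (position < _length)
        {
            int count = (int)Math.Min(buffer.Length, _length - position);
            _view.ReadArray(position, buffer, 0, count);
            for (int i = 0; i < count; i++)
            {
                long next = position + i + 1;
                // A newline at the very end of the file does not start a new line.
                if (buffer[i] == (byte)'\n' && next < _length)
                {
                    _offsets.Add(next);
                }
            }
            position += count;
        }
    }

    public int Count
    {
        get { return _offsets.Count; }
    }

    // Decodes a single line on demand; the distance to the next offset is its length.
    public string GetLine(int index)
    {
        long start = _offsets[index];
        long end = index + 1 < _offsets.Count ? _offsets[index + 1] : _length;
        var bytes = new byte[end - start];
        _view.ReadArray(start, bytes, 0, bytes.Length);
        return Encoding.UTF8.GetString(bytes).TrimEnd('\r', '\n');
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

If the list view is a WinForms ListView with VirtualMode enabled (an assumption, since the question does not show that code), you would set VirtualListSize to Count and call GetLine(e.ItemIndex) in the RetrieveVirtualItem handler, so only the visible rows are ever turned into strings.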