且构网


Faster way to search for the lines of one text file in another text file

Updated: 2023-11-27 22:08:40

EDIT: Note that I'm assuming it's reasonable to read at least one file into memory. You may want to swap the two files in the queries below so that the "big" file is the one that gets streamed rather than loaded. But even 86,000 lines at (say) 1K characters per line is only about 86 million characters, roughly 170MB as .NET strings (two bytes per character). That's well under 2GB, and relatively little memory for a job like this.

You're reading the "inner" file each time. There's no need for that. Load both files into memory and go from there. Heck, for exact matches you can do the whole thing in LINQ easily:

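// Needs: using System.IO; using System.Linq;
// Exact matches: Join loads the second ("inner") file into a hash lookup and streams the first
var query = from line1 in File.ReadLines(newDataPath + "HolidayList1.txt")
            join line2 in File.ReadLines(dbFilePath + "newdbcontents.txt")
            on line1 equals line2
            select line1;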

var commonLines = query.ToList();
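Why the join version is quick: in LINQ to Objects, Join reads the second ("inner") sequence once into a hash-based lookup and then streams the first, so each line costs roughly one hash lookup instead of a scan of the whole other file. Below is a minimal sketch of the same idea with an explicit HashSet<string>, a swapped-in technique rather than part of the answer above; the two arrays are hypothetical stand-ins for the files. One behavioural difference: the join yields a line once per matching line in the other file, whereas this yields it once per occurrence in the streamed file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for HolidayList1.txt and newdbcontents.txt.
string[] holidayLines = { "2023-12-25 Christmas", "2023-01-01 New Year", "2023-07-04 Independence Day" };
string[] dbLines = { "2023-01-01 New Year", "2023-12-25 Christmas", "2023-11-23 Thanksgiving" };

// Build the set once: O(m) work up front, then O(1) average time per lookup.
// StringComparer.Ordinal gives exact, case-sensitive matching, like the default
// string equality the join's "equals" uses.
var dbSet = new HashSet<string>(dbLines, StringComparer.Ordinal);

// Stream the other sequence and keep the lines that appear exactly in the set.
List<string> commonLines = holidayLines.Where(line => dbSet.Contains(line)).ToList();

foreach (var line in commonLines)
{
    Console.WriteLine(line);
}
// Prints:
// 2023-12-25 Christmas
// 2023-01-01 New Year
```

To apply this to the real files, you would feed the set from File.ReadAllLines(...) on one file and stream the other with File.ReadLines(...).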

But when the match isn't a simple equality test (so it can't be written as a join), it's still simple: just read one file completely first (explicitly), then stream the other:

// Eagerly read the "inner" file once, into memory
var lines2 = File.ReadAllLines(dbFilePath + "newdbcontents.txt");
// Stream the "outer" file; each of its lines is checked against every line in lines2
var query = from line1 in File.ReadLines(newDataPath + "HolidayList1.txt")
            from line2 in lines2
            where line2.Contains(line1)
            select line1;

var commonLines = query.ToList();
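This query yields line1 once for every line2 that contains it, so the same line can appear in commonLines several times. The sketch below, using hypothetical in-memory data in place of the files, shows that behaviour next to a variant using Any, which assumes you only want each line reported once. Both still perform n × m substring checks, which is the "nothing clever" cost discussed next.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for HolidayList1.txt and newdbcontents.txt.
string[] holidayLines = { "Christmas", "Easter", "Diwali" };
string[] lines2 =
{
    "2023-12-25 Christmas Day",
    "2023-12-24 Christmas Eve",
    "2023-04-09 Easter Sunday",
};

// Same shape as the query above: one result per (line1, line2) pair that matches.
var withDuplicates = (from line1 in holidayLines
                      from line2 in lines2
                      where line2.Contains(line1)
                      select line1).ToList();

// Variant: Any stops at the first matching line2, so each line1 appears at most once.
var onceEach = holidayLines
    .Where(line1 => lines2.Any(line2 => line2.Contains(line1)))
    .ToList();

Console.WriteLine(string.Join(", ", withDuplicates)); // Christmas, Christmas, Easter
Console.WriteLine(string.Join(", ", onceEach));       // Christmas, Easter
```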

There's nothing clever here - it's just a really simple way of writing code to read all the lines in one file, then iterate over the lines in the other file and for each line check against all the lines in the first file. But even without anything clever, I strongly suspect it would perform well enough for you. Concentrate on simplicity, eliminate unnecessary IO, and see whether that's good enough before trying to do anything fancier.

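Note that in your original code, you should be wrapping your StreamReader variables in using statements, to ensure they get disposed properly. The code above doesn't even need them, though: File.ReadAllLines opens and closes the file itself, and File.ReadLines closes its file once the query has been fully enumerated (as ToList does).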
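The question's original code isn't reproduced here, so this is only a sketch of the pattern: a StreamReader declared in a using statement is disposed, closing the file, when the block exits, even if an exception is thrown. The helper name ReadAllLinesManually and the temporary file are hypothetical, there only to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reads every line from a file with an explicit StreamReader.
static List<string> ReadAllLinesManually(string path)
{
    var lines = new List<string>();
    // The using statement guarantees reader.Dispose() runs when the block exits,
    // even if ReadLine throws, so the file handle is released promptly.
    using (var reader = new StreamReader(path))
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
    }
    return lines;
}

// Write a small temporary file so the example is self-contained.
string tempPath = Path.GetTempFileName();
File.WriteAllLines(tempPath, new[] { "first", "second" });

List<string> readBack = ReadAllLinesManually(tempPath);
Console.WriteLine(readBack.Count); // 2

// Deleting works because the reader has been disposed; on Windows,
// a still-open reader would make this call fail.
File.Delete(tempPath);
```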