且构网

Sharing all things programming and development

C# File Management

Updated: 2023-02-05 17:10:06

Here's a simple solution that just reads both files and compares the data. It should be no slower than the hash method, since both methods have to read the entire file. EDIT: As others have noted, this implementation is actually somewhat slower than the hash method because of its simplicity: it reads one byte per call. See below for a faster method.

// requires: using System.IO;
static bool FilesAreEqual( string f1, string f2 )
{
	// get file length and make sure lengths are identical
	long length = new FileInfo( f1 ).Length;
	if( length != new FileInfo( f2 ).Length )
		return false;

	// open both for reading
	using( FileStream stream1 = File.OpenRead( f1 ) )
	using( FileStream stream2 = File.OpenRead( f2 ) )
	{
		// compare content for equality
		int b1, b2;
		while( length-- > 0 )
		{
			b1 = stream1.ReadByte();
			b2 = stream2.ReadByte();
			if( b1 != b2 )
				return false;
		}
	}

	return true;
}

You could modify it to read more than one byte at a time, but FileStream already buffers data internally, so even this simple code should be relatively fast.
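To make the buffering remark concrete, here is a minimal sketch of the same byte-by-byte loop with the stream buffer size set explicitly. The names FileCompareSketch and FilesAreEqualBuffered and the 64 KB default are assumptions chosen for illustration, not part of the original answer; whether a larger buffer actually helps is something to measure on your own files.

```csharp
using System.IO;

static class FileCompareSketch
{
	// Hypothetical variant of FilesAreEqual: same byte-by-byte loop, but the
	// FileStream buffer size is passed explicitly instead of left at its default.
	public static bool FilesAreEqualBuffered( string f1, string f2, int bufferSize = 64 * 1024 )
	{
		// files of different length can never be equal
		long length = new FileInfo( f1 ).Length;
		if( length != new FileInfo( f2 ).Length )
			return false;

		using( var stream1 = new FileStream( f1, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize ) )
		using( var stream2 = new FileStream( f2, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize ) )
		{
			while( length-- > 0 )
			{
				// ReadByte is served from the stream's internal buffer,
				// so most calls do not touch the disk
				if( stream1.ReadByte() != stream2.ReadByte() )
					return false;
			}
		}

		return true;
	}
}
```

Each ReadByte call still carries per-call overhead, which is why reading whole chunks into your own arrays can be faster even though the stream is buffered.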

EDIT: Thanks for the feedback on speed. I still maintain that the compare-all-bytes method can be just as fast as the MD5 method, since both have to read the entire file. I suspect (but don't know for sure) that once the files have been read, comparing all bytes requires less actual computation than hashing. In any case, I reproduced your performance observations with my initial implementation, but once I added some simple buffering, the compare-all-bytes method was just as fast. The buffered implementation is below; feel free to comment further!

EDIT: Jon B makes another good point: when the files actually differ, this method can stop as soon as it finds the first differing byte, whereas the hash method has to read both files in their entirety in every case.

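// requires: using System.IO;
static bool FilesAreEqualFaster( string f1, string f2 )
{
	// get file length and make sure lengths are identical
	long length = new FileInfo( f1 ).Length;
	if( length != new FileInfo( f2 ).Length )
		return false;

	byte[] buf1 = new byte[4096];
	byte[] buf2 = new byte[4096];

	// open both for reading
	using( FileStream stream1 = File.OpenRead( f1 ) )
	using( FileStream stream2 = File.OpenRead( f2 ) )
	{
		// compare content for equality
		while( length > 0 )
		{
			// figure out how much to read
			int toRead = buf1.Length;
			if( toRead > length )
				toRead = (int)length;
			length -= toRead;

			// read a chunk from each; Read may return fewer bytes than
			// requested, so fill both buffers completely before comparing
			if( !ReadFully( stream1, buf1, toRead ) || !ReadFully( stream2, buf2, toRead ) )
				return false; // a file shrank after its length was checked

			for( int i = 0; i < toRead; ++i )
				if( buf1[i] != buf2[i] )
					return false;
		}
	}

	return true;
}

// keeps calling Read until count bytes are in buf; returns false if the stream ends early
static bool ReadFully( Stream stream, byte[] buf, int count )
{
	int total = 0;
	while( total < count )
	{
		int n = stream.Read( buf, total, count - total );
		if( n == 0 )
			return false;
		total += n;
	}
	return true;
}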
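
For comparison, the hash method discussed above is not shown in this answer. A minimal sketch of what it might look like, assuming MD5 from System.Security.Cryptography, follows; the names HashCompareSketch, FilesAreEqualByHash, and HashFile are hypothetical.

```csharp
using System.IO;
using System.Security.Cryptography;

static class HashCompareSketch
{
	// Hypothetical hash-based comparison: computes the MD5 digest of each
	// file and compares the two digests byte by byte.
	public static bool FilesAreEqualByHash( string f1, string f2 )
	{
		byte[] h1 = HashFile( f1 );
		byte[] h2 = HashFile( f2 );

		for( int i = 0; i < h1.Length; ++i )
			if( h1[i] != h2[i] )
				return false;

		return true;
	}

	// reads the whole file through the hash algorithm and returns its digest
	static byte[] HashFile( string path )
	{
		using( MD5 md5 = MD5.Create() )
		using( FileStream stream = File.OpenRead( path ) )
			return md5.ComputeHash( stream );
	}
}
```

Unlike the byte-by-byte versions, this sketch always reads both files to the end, even when they differ in the first byte. Matching digests also do not strictly prove the contents are identical, since distinct inputs can collide.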