且构网

File Type/Extension Using File Signatures

Updated: 2023-02-08 18:52:58

First thing to understand: from the file system's standpoint, there are no extensions (surprise?).
Old systems used them, but that was a long time ago. Extensions survive only at the Shell level, as a peculiarity of the "file type", which is an association between an "extension" and a software application. An extension here is just a part of the file name (as in Unix).

By the way, you should also support POSIX in your search and take into account hard links and soft links (reparse points). Not taking soft links into account may lead to an infinite circular search (a POSIX file system with soft links is not a tree anymore, surprise!).
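To illustrate the circular-search risk, here is a minimal sketch of a directory walk that follows directory links but refuses to enter the same physical directory twice. It is written in Python purely for illustration; the function name `walk_no_cycles` is hypothetical, and resolving each directory with `os.path.realpath` is just one possible way to detect a revisit.

```python
import os

def walk_no_cycles(root):
    """Yield file paths under root, following directory links but never
    entering the same physical directory twice."""
    seen = set()      # resolved paths of directories already visited
    stack = [root]
    while stack:
        directory = stack.pop()
        real = os.path.realpath(directory)
        if real in seen:
            continue  # reached again through a link: skip to avoid a loop
        seen.add(real)
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue  # unreadable directory
        for entry in entries:
            if entry.is_dir(follow_symlinks=True):
                stack.append(entry.path)
            else:
                yield entry.path
```

Note that this sketch only guards against cycles. A file with several hard links is still reported once per name, which may or may not be what a search tool wants.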

As file search has nothing to do with the Shell, you should never use the notion of an extension. You should use general-purpose masks instead. For example, you should be able to use the mask "my*.jp*". For a good example of the right functionality, please see Total Commander: http://en.wikipedia.org/wiki/Total_Commander.

Here is a big warning though: be extra careful! System.IO.Directory.GetFiles does not work as you would expect! For an explanation of the problem, see "Directory.Get.Files search pattern problem".

Finally, you can use file signatures in your software, but there is no commonly accepted file signature, nor are signatures recognized by the file system. You can only store a signature separately or use it as an additional search criterion. Also, see how it works in Total Commander (there is no search by signature, though). With .NET you can use the available hash functions (http://en.wikipedia.org/wiki/Cryptographic_hash_function): MD5 (not recommended for security purposes, by the way, see http://en.wikipedia.org/wiki/MD5) or the SHA family (http://en.wikipedia.org/wiki/SHA-2). See System.Security.Cryptography.MD5, System.Security.Cryptography.SHA1 and System.Security.Cryptography.SHA256.
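A minimal sketch of computing such a content hash, written in Python purely for illustration (its `hashlib` module plays the same role as the .NET classes named above). The function name `file_digest`, the default algorithm and the chunk size are assumptions made for this example.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of a file's content, read in chunks so that
    large files do not have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Two files with equal digests almost certainly have identical content, regardless of their names, so a digest stored separately can serve as an extra search criterion.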


Remember, you should support not just the "*" wildcard but also "?".
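As a rough sketch of mask-based search with both wildcards, the following Python example matches whole file names against a mask and never looks at an "extension". The function name `find_by_mask` is hypothetical, and the case-insensitive comparison is an assumption. Python's `fnmatch` also understands `[seq]` character sets, which goes beyond the two wildcards discussed here.

```python
import fnmatch
import os

def find_by_mask(root, mask):
    """Return paths under root whose file name matches mask.
    '*' matches any run of characters, '?' matches exactly one."""
    pattern = mask.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Comparing lowered strings with fnmatchcase gives the same
            # case-insensitive behavior on every platform.
            if fnmatch.fnmatchcase(name.lower(), pattern):
                matches.append(os.path.join(dirpath, name))
    return matches
```

With the mask "my*.jp*", a file named "my photo.jpg" matches and "my.png" does not; the mask "my?scan.jp?g" requires exactly one character in each "?" position.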


On the follow-up Question:

If you mean recognizing some signature to detect the type of a file, it cannot be used for that. The file system is designed to treat all files with all valid names as equal. If you try to classify them based on the name or on part of the content, you will always get false positives and false negatives.

For example, there is a signature for executables: "MZ" or, rarely, "ZM". A Unicode text file can contain a BOM (http://unicode.org/faq/utf_bom.html). There is also a variety of signatures for all those media containers (sound, image and video), which are already much less reliable. None of these provides a 100% reliable signature, by definition. You say you want "a more accurate result from the search". An attempt to do any classification based on file content can only reduce your accuracy.
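The false-positive problem is easy to reproduce. The sketch below, in Python for illustration only, checks the first bytes of a file against a tiny, deliberately incomplete table; the table contents and the name `guess_by_signature` are assumptions made for this example, not a recommended detector.

```python
# (leading bytes, label) pairs; a deliberately tiny, incomplete table
SIGNATURES = [
    (b"\xef\xbb\xbf", "UTF-8 BOM"),
    (b"\xff\xfe", "UTF-16 LE BOM"),   # FF FE also begins the UTF-32 LE BOM
    (b"\xfe\xff", "UTF-16 BE BOM"),
    (b"MZ", "MZ executable header"),
]

def guess_by_signature(path):
    """Return a label for the first matching leading-byte signature,
    or None. This is a guess, never proof of the file's type."""
    with open(path, "rb") as f:
        head = f.read(4)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None
```

A plain text file whose first line is "MZ is a fine pair of initials" is reported as an executable by this check, and a text file without a BOM is not recognized at all, which is exactly the kind of false positive and false negative described above.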

See also my comments below.

—SA


I saw a discussion about that on MSDN:
Detect file type

But I'm not sure that this is necessary in search tools. Even if you somehow write this functionality for your tool, who will use it? I mean, what great ability will I gain from this functionality?
Sorry if I don't understand your idea...