Detecting the encoding of a text file wrongly encoded as UTF-8

Updated: 2023-02-20 10:56:40

I eventually figured it out. Using CharsetNormalizerMatches from the charset_normalizer package seems to work and detects the encoding properly. Anyway, this is how I implemented it (path is the path to the file), and it works like a charm, correctly detecting gb18030 encoding for the file in question:

from charset_normalizer import CharsetNormalizerMatches as CnM
encoding = CnM.from_path(path).best().first().encoding
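
As far as I can tell, the CharsetNormalizerMatches alias and the .first() method belong to older charset_normalizer releases and are no longer available in 3.x. Below is a minimal sketch of the same idea using the module-level from_path function, assuming charset_normalizer 2.x or later. The detect_encoding helper, its fallbacks parameter, and the trial-decoding fallback are my own illustration and not part of the original answer:

```python
try:
    # Third-party package; from_path is its module-level entry point.
    from charset_normalizer import from_path
except ImportError:
    # Assumption for this sketch: degrade to trial decoding if not installed.
    from_path = None


def detect_encoding(path, fallbacks=("utf-8", "gb18030")):
    """Return a codec name that decodes the file at `path`, or None.

    Hypothetical helper: prefers charset_normalizer's best guess and
    otherwise tries each codec in `fallbacks` with strict decoding.
    """
    if from_path is not None:
        best = from_path(path).best()  # a CharsetMatch, or None if nothing fits
        if best is not None:
            return best.encoding

    # Fallback: the first candidate that decodes the raw bytes without error.
    with open(path, "rb") as fh:
        raw = fh.read()
    for name in fallbacks:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Note that best() can return None when no candidate fits, so check the result before reading .encoding. The detected name can then be passed straight to open(path, encoding=...).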

Note: The answer was hinted to me by someone who suggested using CharsetNormalizerMatches but later deleted their post here. Too bad, I'd love to give them the credit.
