Updated: 2023-01-18 13:59:38
I use a 3rd party tool that outputs a file in Unicode format. However, I prefer it to be in ASCII. The tool does not have settings to change the file format.
What is the best way to convert the entire file format using Python?
You can convert the file easily enough using just the unicode function (a Python 2 builtin), but you'll run into problems with Unicode characters that have no direct ASCII equivalent.
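As a rough Python 3 sketch of that direct approach (the snippets in this answer target Python 2; the helper name to_ascii_lossy is made up for illustration), encoding with errors="ignore" simply drops every character that has no ASCII code point:

```python
def to_ascii_lossy(text):
    # Hypothetical helper: encode to ASCII and silently drop anything
    # that cannot be represented, then decode back to a str.
    return text.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    # Accented letters vanish entirely instead of losing only the accent.
    print(to_ascii_lossy("Klüft skräms"))
```

Dropping characters this way keeps the file valid ASCII, but words lose letters, which is the problem described next.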
This blog recommends the unicodedata module, which seems to take care of roughly converting characters that have no directly corresponding ASCII value, e.g.
>>> title = u"Klüft skräms inför på fédéral électoral große"
is typically converted to
Klft skrms infr p fdral lectoral groe
which is pretty wrong. However, using the unicodedata module, the result can be much closer to the original text:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', title).encode('ascii','ignore')
'Kluft skrams infor pa federal electoral groe'
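To apply the same idea to a whole file, something like the following Python 3 sketch might work. The function names to_ascii and convert_file are hypothetical, and the utf-16 default is only an assumption: tools that say "Unicode format" often mean UTF-16, but check what your tool actually writes. Note that in Python 3 encode returns bytes, so the sketch decodes back to str, and that characters with no decomposition (such as ß) are still dropped.

```python
import unicodedata


def to_ascii(text):
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding with errors="ignore" then drops the marks, along with any
    # character that has no ASCII decomposition at all (for example ß).
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


def convert_file(src_path, dst_path, src_encoding="utf-16"):
    # src_encoding is an assumption; pass the encoding your tool really uses.
    # newline="" on both files leaves the original line endings untouched.
    with open(src_path, encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            dst.write(to_ascii(line))
```

A call such as convert_file("report.txt", "report_ascii.txt") would then write an ASCII copy next to the original; both file names here are placeholders.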