且构网

How do I convert a file's format from Unicode to ASCII using Python?

Updated: 2023-01-18 13:59:38

I use a third-party tool that outputs a file in Unicode format. However, I would prefer the file to be in ASCII. The tool has no setting to change the output format.

What is the best way to convert the entire file format using Python?

You can convert the file easily enough using just the unicode function (a Python 2 built-in), but you will run into problems with any Unicode characters that have no direct ASCII equivalent.
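In Python 3, where str is already Unicode text, the same problem shows up at the point of encoding to ASCII. The following is only a minimal sketch of that behaviour: the helper name naive_ascii and the sample string are made up for illustration and are not part of the original answer.

```python
def naive_ascii(text, errors="strict"):
    """Encode text to ASCII with the given error handler, then decode back to str."""
    return text.encode("ascii", errors).decode("ascii")


# Hypothetical sample text: "café naïve" written with escapes.
sample = "caf\u00e9 na\u00efve"

try:
    naive_ascii(sample)  # "strict" refuses any non-ASCII character
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)

print(naive_ascii(sample, "ignore"))   # non-ASCII characters are silently dropped
print(naive_ascii(sample, "replace"))  # non-ASCII characters become "?"
```

Neither "ignore" nor "replace" keeps the base letter of an accented character, which is why a plain conversion tends to mangle the text.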

This blog recommends the unicodedata module, which seems to take care of roughly converting characters without direct corresponding ASCII values, e.g.

>>> title = u"Klüft skräms inför på fédéral électoral große"

is typically converted to

Klft skrms infr p fdral lectoral groe

which is pretty wrong. However, using the unicodedata module, the result can be much closer to the original text:

>>> import unicodedata
>>> unicodedata.normalize('NFKD', title).encode('ascii','ignore')
'Kluft skrams infor pa federal electoral groe'
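To apply the same idea to an entire file, as the question asks, something along these lines might work in Python 3. It is a sketch under assumptions: the function names to_ascii and convert_file are hypothetical, and the source encoding defaults to UTF-8 only as a guess, since the question does not say which Unicode encoding the tool writes (UTF-16 is another common possibility).

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters (NFKD), then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def convert_file(src_path, dst_path, src_encoding="utf-8"):
    """Rewrite src_path as an ASCII-only file at dst_path, one line at a time.

    src_encoding is an assumption; pass the encoding the tool really uses.
    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            dst.write(to_ascii(line))
```

For example, convert_file("report.txt", "report_ascii.txt", src_encoding="utf-16") would read a UTF-16 file (both file names are placeholders). Characters with no decomposition, such as ß, are still dropped, which is why the output above ends in groe rather than grosse; handling those would need an explicit replacement table.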