How do I handle undecodable filenames in Python?

Updated: 2022-10-14 18:10:25

I'd really like to have my Python application deal exclusively with Unicode strings internally. This has been going well for me lately, but I've run into an issue with handling paths. The POSIX API for filesystems isn't Unicode, so it's possible (and actually somewhat common) for files to have "undecodable" names: filenames that aren't encoded in the filesystem's stated encoding.

In Python, this manifests as a mixture of unicode and str objects being returned from os.listdir().

>>> os.listdir(u'/path/to/foo')
[u'bar', 'b\xe1z']

In that example, the character '\xe1' is encoded in Latin-1 or somesuch, even when the (hypothetical) filesystem reports sys.getfilesystemencoding() == 'UTF-8' (in UTF-8, that character would be the two bytes '\xc3\xa1'). For this reason, you'll get UnicodeErrors all over the place if you try to use, for example, os.path.join() with Unicode paths, because the filename can't be decoded.
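The byte-level difference described above can be checked directly. This is a minimal sketch written for Python 3 (the session in the question is Python 2); the helper name try_decode is hypothetical and only used for illustration.

```python
# The same character, encoded two different ways.
latin1_bytes = "á".encode("latin-1")   # a single byte: 0xE1
utf8_bytes = "á".encode("utf-8")       # two bytes: 0xC3 0xA1

# The undecodable name from the listing above, as raw bytes.
raw_name = b"b\xe1z"


def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Strict UTF-8 decoding fails, because 0xE1 followed by 'z' is not valid UTF-8.
strict_result = try_decode(raw_name)

# Decoding as Latin-1 succeeds, but only because we guessed the real encoding.
latin1_result = try_decode(raw_name, "latin-1")
```

Here strict_result is None while latin1_result is the readable name, which is why a program that assumes UTF-8 everywhere trips over this file.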

The Python Unicode HOWTO offers this advice about unicode pathnames:

Note that in most occasions, the Unicode APIs should be used. The bytes APIs should only be used on systems where undecodable file names can be present, i.e. Unix systems.

Because I mainly care about Unix systems, does this mean I should restructure my program to deal only with bytestrings for paths? (If so, how can I maintain Windows compatibility?) Or are there other, better ways of dealing with undecodable filenames? Are they rare enough "in the wild" that I should just ask users to rename their damn files?

(If it is best to just deal with bytestrings internally, I have a followup question: How do I store bytestrings in SQLite for one column while keeping the rest of the data as friendly Unicode strings?)

Python does have a solution to the problem, if you're willing to switch to Python 3.1 or later:

PEP 383 - Non-decodable Bytes in System Character Interfaces.
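As a rough sketch of what that looks like in practice: PEP 383 adds the "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate code point so the name can live in a str and still be turned back into the original bytes. On POSIX systems, Python 3 applies this handler itself in os.listdir() and friends, and os.fsdecode()/os.fsencode() (added in 3.2) expose it. The example below spells the encoding out as UTF-8 so that it behaves the same everywhere. The table and column names are hypothetical, and the SQLite part is just one possible answer to the follow-up question, not something the PEP prescribes.

```python
import os
import sqlite3

raw_name = b"b\xe1z"  # the undecodable name from the question

# Decode with the PEP 383 handler: the bad byte 0xE1 becomes U+DCE1.
text_name = raw_name.decode("utf-8", "surrogateescape")

# The str can be used like any other path component...
joined = os.path.join("/path/to/foo", text_name)

# ...and encoding it with the same handler gives back the original bytes.
round_tripped = text_name.encode("utf-8", "surrogateescape")

# Lone surrogates cannot be encoded strictly, so build a separate
# lossy version for display or logging.
display_name = round_tripped.decode("utf-8", "replace")

# For storage, keep the raw bytes in a BLOB column next to ordinary
# TEXT columns (hypothetical schema). sqlite3 maps bytes to BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path BLOB, title TEXT)")
conn.execute(
    "INSERT INTO files (path, title) VALUES (?, ?)",
    (round_tripped, "friendly title"),
)
stored_path, stored_title = conn.execute(
    "SELECT path, title FROM files"
).fetchone()
conn.close()
```

The design choice here is to treat the surrogate-escaped str as the in-memory form and the bytes as the on-disk and in-database form, since a str containing lone surrogates cannot be written to a TEXT column or a strict UTF-8 stream.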
