且构网

Sharing what programmers build...

How do I decode a unicode string read from a file in Python?

Updated: 2023-11-15 16:31:10

It looks like the file has been created by writing bytes literals to it, something like this:

some_bytes = b'Hello world'
with open('myfile.txt', 'w') as f:
    f.write(str(some_bytes))

This gets around the fact that attempting to write bytes to a file opened in text mode raises a TypeError, but at the cost that the file now contains "b'Hello world'" (note the 'b' inside the quotes).
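
If the file has already been written this way and cannot be regenerated, the original bytes can usually be recovered by parsing the text as a Python literal. The following is a minimal sketch, not part of the original answer: `recover_bytes` is a hypothetical helper name, and it assumes the file content is exactly the `str()` form of a bytes object.

```python
import ast

def recover_bytes(text: str) -> bytes:
    """Parse text such as "b'Hello world'" back into a bytes object.

    Hypothetical helper: assumes the text is the str() form of a bytes object.
    """
    value = ast.literal_eval(text.strip())  # safely evaluates literals only
    if not isinstance(value, bytes):
        raise ValueError("text is not a bytes literal")
    return value

damaged = str(b'Hello world')        # what ends up in the file
recovered = recover_bytes(damaged)   # the original bytes again
print(recovered.decode('utf-8'))     # decode with the real encoding of the data
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a file.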

The solution is to decode the bytes to str before writing:

some_bytes = b'Hello world'
# Decode with whatever encoding the bytes really use; these are plain ASCII, so 'utf-8' works
my_str = some_bytes.decode('utf-8')
with open('myfile.txt', 'w', encoding='utf-8') as f:
    f.write(my_str)

Alternatively, open the file in binary mode and write the bytes directly:

some_bytes = b'Hello world'
with open('myfile.txt', 'wb') as f:
    f.write(some_bytes)

Note that you will need to provide the correct encoding when opening the file in text mode:

with open('myfile.txt', encoding='utf-8') as f:  # must match the encoding the file was written with, e.g. 'utf-16'
    my_str = f.read()
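
To illustrate why the encoding has to match, here is a small round-trip sketch. It writes to a temporary directory rather than `myfile.txt`, and the choice of UTF-16 is only an example:

```python
import os
import tempfile

text = 'Hello world'
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')  # throwaway location

# Write in text mode with an explicit encoding.
with open(path, 'w', encoding='utf-16') as f:
    f.write(text)

# Reading with the same encoding gives the original string back.
with open(path, encoding='utf-16') as f:
    same_encoding = f.read()

# The raw bytes on disk start with a byte order mark and are not plain ASCII.
with open(path, 'rb') as f:
    raw = f.read()

print(same_encoding)
print(raw[:2])
```

Reading the same file with a different encoding would either raise a `UnicodeDecodeError` or silently produce garbled text, depending on the bytes involved.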

Consider running Python with the -b or -bb flag, which raise a warning or an exception respectively, to detect attempts to stringify bytes.
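
As a rough sketch of what those flags do, the snippet below launches a child interpreter with each flag and stringifies a bytes object in it. The helper name `run_with_flag` is made up for this example, and the exact warning text may vary between Python versions:

```python
import subprocess
import sys

snippet = "str(b'Hello world')"

def run_with_flag(flag):
    """Run the snippet in a child interpreter started with the given flag."""
    return subprocess.run(
        [sys.executable, flag, '-c', snippet],
        capture_output=True,
        text=True,
    )

warned = run_with_flag('-b')    # emits a BytesWarning on stderr, keeps running
failed = run_with_flag('-bb')   # turns the BytesWarning into an exception

print(warned.returncode, warned.stderr.strip())
print(failed.returncode, failed.stderr.strip().splitlines()[-1])
```

In day-to-day use the flag is simply passed on the command line, for example `python -bb myscript.py`.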