且构网

Sharing the things programmers build...

How do I convert a string to a unicode / byte string in Python 3?

Updated: 2022-12-22 20:33:42

If I understand correctly, the file contains the literal text \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728 (so it's plain ASCII, with backslash escapes that describe the Unicode ordinals the same way you would write them in a Python str literal). If so, there are two ways to handle this:


  1. Read the file in binary mode, then call mystr = mybytes.decode('unicode-escape') to convert the bytes to a str, interpreting the escapes.
  2. Read the file in text mode and use the codecs module for the "text -> text" conversion, e.g. decodedstr = codecs.decode(encodedstr, 'unicode-escape'). In Python 3, bytes-to-bytes and text-to-text codecs are supported only by the codecs module functions; bytes.decode is purely for bytes to text and str.encode is purely for text to bytes. In Py2, calling str.encode or unicode.decode was usually a mistake, and removing those dangerous methods makes it easier to understand which direction a conversion is supposed to go.
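
The two options above can be compared side by side in a minimal sketch. It assumes the file is pure ASCII, as described; the temporary file and the names ESCAPED, path, from_bytes and from_text are made up for illustration and are not part of the original question.

```python
import codecs
import os
import tempfile

# The literal ASCII text from the question: backslash-u escapes, not real CJK characters.
ESCAPED = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Write it to a throwaway file (hypothetical stand-in for the real one)
# so that both approaches read exactly the same content.
with tempfile.NamedTemporaryFile("w", encoding="ascii", suffix=".txt", delete=False) as f:
    f.write(ESCAPED)
    path = f.name

try:
    # Option 1: binary mode, then bytes -> str with the unicode-escape codec.
    with open(path, "rb") as f:
        mybytes = f.read()
    from_bytes = mybytes.decode("unicode-escape")

    # Option 2: text mode, then str -> str through codecs.decode.
    with open(path, "r", encoding="ascii") as f:
        encodedstr = f.read()
    from_text = codecs.decode(encodedstr, "unicode-escape")
finally:
    os.remove(path)

# Both approaches should agree; each escape becomes one character.
print(from_bytes == from_text, len(from_bytes))
```

One caveat: the unicode-escape codec treats any byte that is not part of an escape as Latin-1, so if the file also contains real non-ASCII text (for example UTF-8 encoded characters), neither option will round-trip it correctly without extra handling.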