且构网


How do I convert a C string (char array) into a Python string when the string contains non-ASCII characters?

Updated: 2022-06-24 22:39:00

PyString_Decode does this:

PyObject *PyString_Decode(const char *s,
                          Py_ssize_t size,
                          const char *encoding,
                          const char *errors)
{
    PyObject *v, *str;

    str = PyString_FromStringAndSize(s, size);
    if (str == NULL)
        return NULL;
    v = PyString_AsDecodedString(str, encoding, errors);
    Py_DECREF(str);
    return v;
}

In other words, it does basically what you're doing in your second example: it converts the C buffer to a string object, then decodes that string. The problem here arises from PyString_AsDecodedString, rather than PyString_AsDecodedObject. PyString_AsDecodedString calls PyString_AsDecodedObject, but then tries to convert the resulting unicode object back into a string object using the default encoding (for you, that looks like ASCII). That's where it fails, because a non-ASCII character cannot be encoded as ASCII.
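
The two-step behaviour described above can be mimicked at the Python level. What follows is only a minimal sketch, written for Python 3 (where bytes and text are separate types) rather than for the Python 2 C API this answer uses. The helper name decode_then_reencode is hypothetical, and the "ascii" default is an assumption that matches what the asker's default encoding appears to be.

```python
def decode_then_reencode(raw, encoding, default_encoding="ascii"):
    """Mimic the two steps attributed to PyString_AsDecodedString above.

    Hypothetical helper for illustration only; not part of any Python API.
    """
    # Step 1: bytes -> text (the part PyString_AsDecodedObject performs).
    text = raw.decode(encoding)
    # Step 2: text -> bytes using the default encoding (the extra step).
    return text.encode(default_encoding)


raw = b"\x93"  # a left double quotation mark in Windows-1252

# Decoding on its own succeeds.
decoded = raw.decode("windows-1252")
print(repr(decoded))

# Decoding followed by an ASCII re-encode fails for non-ASCII characters.
try:
    decode_then_reencode(raw, "windows-1252")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("re-encode failed:", exc)
```

Pure-ASCII input passes through both steps unchanged, which is presumably why the problem only shows up once the C string contains a byte such as 0x93.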

I believe you'll need to make two calls, but you can use PyString_AsDecodedObject rather than calling the Python "decode" method. Something like:

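#include <Python.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
     /* 0x93 is a left double quotation mark in Windows-1252. */
     char c_string[] = { (char)0x93, 0 };
     PyObject *py_string, *py_unicode;

     Py_Initialize();

     /* Step 1: wrap the raw bytes in a Python string object. */
     py_string = PyString_FromStringAndSize(c_string, 1);
     if (!py_string) {
          PyErr_Print();
          return 1;
     }

     /* Step 2: decode to a unicode object. Unlike PyString_Decode, the
        result is not converted back using the default encoding. */
     py_unicode = PyString_AsDecodedObject(py_string, "windows_1252", "replace");
     Py_DECREF(py_string);
     if (!py_unicode) {
          PyErr_Print();
          return 1;
     }

     /* ... use py_unicode here, then release it ... */
     Py_DECREF(py_unicode);

     Py_Finalize();
     return 0;
}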



I'm not entirely sure why PyString_Decode works this way. A very old thread on python-dev seems to indicate that it has something to do with chaining the output, but since the Python methods don't do the same, I'm not sure if that's still relevant.