How to get string objects instead of Unicode from JSON?

Updated: 2023-11-01 14:09:52

A solution with object_hook

[edit]: Updated for Python 2.7 and 3.x compatibility.

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts=False):
    # a native str needs no conversion (byte string on python 2, text on python 3)
    if isinstance(data, str):
        return data

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.items() # changed to .items() for python 2.7/3
        }

    # python 3 compatible duck-typing
    # if this is a unicode string, return its string representation
    if str(type(data)) == "<type 'unicode'>":
        return data.encode('utf-8')

    # if it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

How does this work and why would I use it?

Mark Amery's function is shorter and clearer than these ones, so what's the point of them? Why would you want to use them?

Purely for performance. Mark's answer decodes the JSON text fully first with unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:

  • A copy of the entire decoded structure gets created in memory
  • If your JSON object is really deeply nested (500 levels or more) then you'll hit Python's maximum recursion depth

This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders
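
To make the call order concrete, here is a minimal sketch that records every dict the decoder hands to the hook. The names record and seen are made up for this illustration and are not part of the answer's code.

```python
import json

seen = []  # keys of every dict passed to the hook, in call order

def record(d):
    # hypothetical hook: remember the dict's keys, return the dict unchanged
    seen.append(sorted(d.keys()))
    return d

json.loads('{"a": {"b": {"c": 1}}, "d": 2}', object_hook=record)

# innermost objects are finished first, so they reach the hook first
print(seen)  # [['c'], ['b'], ['a', 'd']]
```

Each nested dict arrives at the hook as soon as it has been fully decoded, before the dict that contains it.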

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they're decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.

Mark's answer isn't suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which is passed as True on every call except when object_hook hands it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts, since they have already been byteified.
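
A rough way to see the cost of recursing inside the hook is to count how many dicts each variant walks. This is only a sketch: naive_hook, flagged_hook and visits are hypothetical names, and the string conversion is left out so that only the traversal is measured.

```python
import json

visits = {"naive": 0, "flagged": 0}

def naive_hook(data):
    # recurses into nested dicts, so dicts that an earlier hook call
    # already processed get walked again at every enclosing level
    if isinstance(data, dict):
        visits["naive"] += 1
        return {k: naive_hook(v) for k, v in data.items()}
    if isinstance(data, list):
        return [naive_hook(item) for item in data]
    return data

def flagged_hook(data, ignore_dicts=False):
    # same traversal, but nested dicts are skipped via the flag
    if isinstance(data, list):
        return [flagged_hook(item, ignore_dicts=True) for item in data]
    if isinstance(data, dict) and not ignore_dicts:
        visits["flagged"] += 1
        return {k: flagged_hook(v, ignore_dicts=True) for k, v in data.items()}
    return data

text = '{"a": {"b": {"c": {"d": 1}}}}'
json.loads(text, object_hook=naive_hook)
json.loads(text, object_hook=flagged_hook)

print(visits)  # {'naive': 10, 'flagged': 4}
```

With four nested dicts the naive hook walks 4 + 3 + 2 + 1 = 10 dicts, while the flagged one walks each dict exactly once.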

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads to handle the case where the JSON text being decoded doesn't have a dict at the top level.
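
The reason for that final pass can be checked with a small sketch: object_hook never fires for values that are not inside a dict. The names hook and calls are assumptions made for this example only.

```python
import json

calls = []  # every dict the decoder passes to the hook

def hook(d):
    # hypothetical hook that only records what it receives
    calls.append(d)
    return d

json.loads('"just a string"', object_hook=hook)    # no dict: hook not called
json.loads('["x", ["y"]]', object_hook=hook)       # lists only: hook not called
json.loads('["x", {"k": "v"}]', object_hook=hook)  # called once, for the dict

print(calls)  # [{'k': 'v'}]
```

A top-level string or list, and any strings inside it that are not enclosed in a dict, would therefore stay unconverted without the extra _byteify call on the final result.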