Updated: 2023-11-01 14:09:52
Solution with object_hook
[edit]: Updated for Python 2.7 and 3.x compatibility.
import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts=False):
    # byte strings (Python 2) and text strings (Python 3) are returned as-is
    if isinstance(data, str):
        return data

    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [_byteify(item, ignore_dicts=True) for item in data]

    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.items()  # changed to .items() for python 2.7/3
        }

    # python 3 compatible duck-typing
    # if this is a unicode string, return its string representation
    if str(type(data)) == "<type 'unicode'>":
        return data.encode('utf-8')

    # if it's anything else, return it in its original form
    return data
Example usage:
>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}
Mark Amery's function is shorter and clearer than these ones, so what's the point of them? Why would you want to use them?
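Purely for performance. Mark's answer first decodes the JSON text fully into unicode strings, then recurses through the entire decoded value to convert every string to a byte string. This has a couple of undesirable effects: a copy of the entire decoded structure gets created in memory, and a JSON value that is nested very deeply can hit Python's maximum recursion depth.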
This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:
object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders.
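To make the quoted behaviour concrete, here is a small sketch that logs every dict the decoder hands to object_hook. The names record_hook and seen are made up for this illustration and are not part of the answer's code. It suggests that nested objects reach the hook before the objects that contain them.

```python
import json

seen = []  # hypothetical log of every dict passed to the hook

def record_hook(d):
    # Called once per decoded JSON object; whatever is returned
    # takes the place of the dict in the final result.
    seen.append(d)
    return d

result = json.loads(
    '{"outer": {"inner": {"leaf": 1}}, "n": [2, 3]}',
    object_hook=record_hook,
)

for d in seen:
    print(d)
# {'leaf': 1}
# {'inner': {'leaf': 1}}
# {'outer': {'inner': {'leaf': 1}}, 'n': [2, 3]}
```

By the time an outer dict reaches the hook, its nested dicts have already been through the hook themselves.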
Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they're decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.
Mark's answer isn't suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which gets passed to it at all times except when object_hook passes it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts since they have already been byteified.
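The effect of the flag can be illustrated with a rough sketch that counts how often each kind of hook walks a dict. Everything here is an assumption made for the demonstration: naive, guarded and visits are hypothetical names, and upper-casing stands in for the byte-string conversion so that the example also does something visible on Python 3.

```python
import json

visits = {"naive": 0, "guarded": 0}  # hypothetical counters for this demo

def naive(data):
    # Always recurses into dicts, so nested dicts are walked again
    # every time an enclosing dict reaches the hook.
    if isinstance(data, str):
        return data.upper()
    if isinstance(data, list):
        return [naive(item) for item in data]
    if isinstance(data, dict):
        visits["naive"] += 1
        return {naive(k): naive(v) for k, v in data.items()}
    return data

def guarded(data, ignore_dicts=False):
    # Same conversion, but nested dicts are skipped because the hook
    # has already converted them.
    if isinstance(data, str):
        return data.upper()
    if isinstance(data, list):
        return [guarded(item, ignore_dicts=True) for item in data]
    if isinstance(data, dict) and not ignore_dicts:
        visits["guarded"] += 1
        return {
            guarded(k, ignore_dicts=True): guarded(v, ignore_dicts=True)
            for k, v in data.items()
        }
    return data

text = '{"a": {"b": {"c": "x"}}}'
naive_result = json.loads(text, object_hook=naive)
guarded_result = json.loads(text, object_hook=guarded)

print(naive_result == guarded_result)  # True
print(visits)  # {'naive': 6, 'guarded': 3}
```

Both hooks produce the same value for this input, but the unguarded one walks a dict six times for three levels of nesting, while the guarded one walks each dict exactly once.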
Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads to handle the case where the JSON text being decoded doesn't have a dict at the top level.
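A short sketch of why that final call is needed: when the JSON text contains no objects at all, object_hook is never invoked, so nothing would be converted without it. The names hook and calls are invented for this illustration.

```python
import json

calls = []  # hypothetical log of hook invocations

def hook(d):
    calls.append(d)
    return d

# None of these inputs contains a JSON object, only a list, a string and a number.
json.loads('["a", "b"]', object_hook=hook)
json.loads('"I am a top-level string"', object_hook=hook)
json.loads('7', object_hook=hook)

print(calls)  # [] because the hook was never called
```

In these cases the outer _byteify call is the only thing that touches the decoded value.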