Updated: 2023-01-17 16:05:18
I have thousands of text files containing multiple JSON objects, but unfortunately there is no delimiter between the objects. Objects are stored as dictionaries and some of their fields are themselves objects. Each object might have a variable number of nested objects. Concretely, an object might look like this:
{field1: {}, field2: "some value", field3: {}, ...}
and hundreds of such objects are concatenated without a delimiter in a text file. This means that I can use neither json.load() nor json.loads().
Any suggestions on how I can solve this problem? Is there a known parser to do this?
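To make the failure concrete, here is a minimal sketch (Python 3, with quoted keys so that each object is valid JSON on its own; the sample string is made up for illustration). json.loads() parses the first object and then rejects whatever follows it:

```python
import json

# Two valid JSON objects back to back, no delimiter (hypothetical sample data).
concatenated = '{"field1": {}, "field2": "some value"}{"field1": {}}'

try:
    json.loads(concatenated)
    error = None
except json.JSONDecodeError as exc:
    # The decoder stops after the first object and complains about the rest.
    error = exc

print(error.msg)  # short description of the problem
print(error.pos)  # index at which the unparsed remainder starts
```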
This decodes your "list" of JSON Objects from a string:

    from json import JSONDecoder

    def loads_invalid_obj_list(s):
        decoder = JSONDecoder()
        s_len = len(s)
        objs = []
        end = 0
        while end != s_len:
            obj, end = decoder.raw_decode(s, idx=end)
            objs.append(obj)
        return objs

The bonus here is that you play nice with the parser. Hence it keeps telling you exactly where it found an error.
Examples

    >>> loads_invalid_obj_list('{}{}')
    [{}, {}]
    >>> loads_invalid_obj_list('{}{\n}{')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "decode.py", line 9, in loads_invalid_obj_list
        obj, end = decoder.raw_decode(s, idx=end)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting object: line 2 column 2 (char 5)

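The position shown in the traceback can also be read programmatically. This is a minimal sketch, assuming Python 3.5 or later, where raw_decode raises json.JSONDecodeError (a ValueError subclass) with pos, lineno and colno attributes. The helper name decode_with_position is made up for this illustration, and the exact message and character index may differ from the Python 2.7 output above:

```python
import json
from json import JSONDecoder


def decode_with_position(s):
    """Decode back-to-back JSON values; return (objects, error_or_None).

    Hypothetical helper: the same loop as loads_invalid_obj_list, but it
    keeps the objects parsed so far instead of letting the error propagate.
    """
    decoder = JSONDecoder()
    objs = []
    end = 0
    while end != len(s):
        try:
            obj, end = decoder.raw_decode(s, idx=end)
        except json.JSONDecodeError as exc:
            return objs, exc
        objs.append(obj)
    return objs, None


objs, err = decode_with_position('{}{\n}{')
if err is not None:
    print("parsed %d objects, then failed at char %d (line %d, column %d)"
          % (len(objs), err.pos, err.lineno, err.colno))
```

Returning the partial result is a design choice for the sketch; for strict input it may be preferable to let the exception propagate as in the original function.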

    import json
    import re

    #shameless copy paste from json/decoder.py
    FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
    WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)

    class ConcatJSONDecoder(json.JSONDecoder):
        def decode(self, s, _w=WHITESPACE.match):
            s_len = len(s)
            objs = []
            end = 0
            while end != s_len:
                obj, end = self.raw_decode(s, idx=_w(s, end).end())
                end = _w(s, end).end()
                objs.append(obj)
            return objs

Examples

    >>> print json.loads('{}', cls=ConcatJSONDecoder)
    [{}]
    >>> print json.load(open('file'), cls=ConcatJSONDecoder)
    [{}]
    >>> print json.loads('{}{} {', cls=ConcatJSONDecoder)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
        return cls(encoding=encoding, **kw).decode(s)
      File "decode.py", line 15, in decode
        obj, end = self.raw_decode(s, idx=_w(s, end).end())
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting object: line 1 column 5 (char 5)
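
The transcripts above are from Python 2.7. For the use case in the question (many files, each holding concatenated objects), a Python 3 sketch might look like the following. The function names iter_concat_json and load_concat_json_file and the UTF-8 default are assumptions for illustration, not part of the answer above:

```python
import json
import re
from pathlib import Path

# The same whitespace characters the json module skips between tokens.
WHITESPACE = re.compile(r'[ \t\n\r]*')


def iter_concat_json(s):
    """Yield each JSON value from a string of values with no delimiter."""
    decoder = json.JSONDecoder()
    end = WHITESPACE.match(s, 0).end()        # skip leading whitespace
    while end != len(s):
        obj, end = decoder.raw_decode(s, idx=end)
        end = WHITESPACE.match(s, end).end()  # skip whitespace before the next value
        yield obj


def load_concat_json_file(path, encoding="utf-8"):
    """Read one file and return all JSON values in it as a list."""
    text = Path(path).read_text(encoding=encoding)
    return list(iter_concat_json(text))


print(list(iter_concat_json('{"a": 1} {"b": {"c": []}}\n{}')))
```

A generator is used here so that callers can stop early or process objects one at a time; malformed input still raises a ValueError at the offending position, as in the examples above.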