且构网

Sharing what programmers build and learn...

Parsing two JSON strings in Python

Updated: 2023-11-01 20:41:58

You can use the standard JSON parser and exploit the descriptive exception it raises when extra data follows a valid JSON string.

My version of the JSON parser currently raises a ValueError with a message like this: "Extra data: line 3 column 1 - line 3 column 6 (char 5 - 10)". Newer Python versions (3.5 and later) report only the start position, for example "Extra data: line 3 column 1 (char 5)".
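To see this exception in practice, here is a minimal sketch, assuming Python 3.5 or later, where the error is a `json.JSONDecodeError` (a subclass of ValueError) that also carries the offset in its `pos` attribute. The helper name `extra_data_offset` is hypothetical and not part of the original answer.

```python
import json

def extra_data_offset(text):
    """Return the offset where extra data starts, or None if text is a single JSON value.

    Hypothetical helper; assumes Python 3.5+ (json.JSONDecodeError with .msg and .pos).
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.msg is the bare message; str(exc) adds line, column and char info.
        if exc.msg != "Extra data":
            raise  # a genuine syntax error, not trailing data
        return exc.pos
    return None

print(extra_data_offset('[1]\n\n[2]'))  # offset of the second value
```

The input here has its second value starting on line 3, column 1, which is the same position the example message above reports.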

The number 5 in this case (which you can easily extract from the message with a regular expression) tells you where parsing stopped, that is, the offset at which the extra data begins. So when you get that exception, you can parse the substring of your original input up to, but not including, that offset, and then parse the rest (I suggest recursively).

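import json, re

def jsonMultiParse(s):
  try:
    return json.loads(s)
  except ValueError as problem:
    # Older parsers report "(char 5 - 10)", newer ones just "(char 5)".
    m = re.match(
      r'Extra data: line \d+ column \d+ (?:- line \d+ column \d+ )?\(char (\d+)',
      str(problem))
    if not m:
      raise  # some other parse error; let it propagate
    extraStart = int(m.group(1))
    # Parse the first value, then recurse on whatever follows it.
    return json.loads(s[:extraStart]), jsonMultiParse(s[extraStart:])

print(jsonMultiParse('{}[{}]    \n\n["foo", 3]'))

On Python 3 this prints the following (Python 2 shows the string as u'foo'):

({}, ([{}], ['foo', 3]))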

If you prefer a flat tuple instead of a nested one, change the two return statements to:

    return (json.loads(s),)

    return (json.loads(s[:extraStart]),) + jsonMultiParse(s[extraStart:])

The same call then returns (shown as on Python 3):

({}, [{}], ['foo', 3])
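As an alternative that does not depend on the wording of the error message, the standard library's `json.JSONDecoder.raw_decode` returns both the parsed value and the index where it stopped. The sketch below is a different technique from the one in the answer above, not a restatement of it; the generator name `iter_json_values` is hypothetical.

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value found in text, in order.

    Hypothetical helper: walks the string with raw_decode instead of
    catching the "Extra data" error and recursing.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos >= end:
            return
        # Returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

print(tuple(iter_json_values('{}[{}]    \n\n["foo", 3]')))
```

Because it loops rather than recurses, this version is not bounded by the recursion limit on inputs with many concatenated values, and it yields nothing for an empty or whitespace-only string instead of raising.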