Converting an XML document to a specific dot-expanded JSON structure

Updated: 2023-11-14 07:59:28

You can use recursion here. One way is to build up the path as you recurse through the XML document, collect the values into a result dictionary keyed by that path, and return the dictionary at the end so it can be serialized to JSON.

The demo below uses the standard library module xml.etree.ElementTree to parse the XML document.
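For readers less familiar with ElementTree, here is a minimal sketch of the three pieces of an element the recursion relies on: its tag, its attributes, and its text. The XML snippet and the variable names are made up for illustration and are not the sample.xml used in the demo.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet, invented for illustration only
xml_text = """
<Item ID="1">
    <Genres>
        <Genre FacebookID="10">Comedy</Genre>
        <Genre FacebookID="20">Drama</Genre>
    </Genres>
</Item>
""".strip()

root = ET.fromstring(xml_text)

print(root.tag)     # tag name of the root element
print(root.attrib)  # attributes of the root element, as a dict

# Iterating over an element yields its direct children
for genre in root.find("Genres"):
    print(genre.tag, genre.attrib, genre.text)
```

Each child exposes the same three properties, which is what makes a recursive walk over the tree straightforward.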

Demo:

from pprint import pprint
from xml.etree import ElementTree as ET

# Parse the XML file and grab its root element
root = ET.parse("sample.xml").getroot()

def collect_xml_paths(element, path=None, result=None):
    """Collect XML paths into a dictionary keyed by dot-separated path"""

    # Use None defaults: mutable defaults ([] or {}) persist between calls
    is_top_level = result is None
    if is_top_level:
        path, result = [], {}

        # First collect the attributes of the root element
        for k, v in element.attrib.items():
            root_key = element.tag + "[@%s]" % k
            result.setdefault(root_key, []).append(v)

    # Go through each child of the current element
    for child in element:

        # Extract text (child.text is None for elements without text)
        text = (child.text or "").strip()

        # Update path without modifying the caller's list
        new_path = path + [child.tag]

        # Create dot separated key
        key = ".".join(new_path)

        # Add each attribute to result
        for k, v in child.attrib.items():
            attrib_key = key + "[@%s]" % k
            result.setdefault(attrib_key, []).append(v)

        # Add text if it exists
        if text:
            result.setdefault(key, []).append(text)

        # Recurse into the child, sharing the same result dictionary
        collect_xml_paths(child, new_path, result)

    # Nested calls only fill in the shared dictionary
    if not is_top_level:
        return result

    # Separate single values from list values
    return {k: v[0] if len(v) == 1 else v for k, v in result.items()}

pprint(collect_xml_paths(root))

Output:

{'Genres.Genre': ['Comedy', 'TV-Show'],
 'Genres.Genre[@FacebookID]': ['6003161475030', '6003172932634'],
 'Item[@ID]': '288917',
 'Main.Platform': 'iTunes',
 'Main.PlatformID': '353736518',
 'Products.Product.Offers.Offer.Currency': ['CAD', 'CAD', 'EUR', 'EUR'],
 'Products.Product.Offers.Offer.Price': ['3.49', '2.49', '2.49', '1.99'],
 'Products.Product.Offers.Offer[@Type]': ['HDBUY', 'SDBUY', 'HDBUY', 'SDBUY'],
 'Products.Product.Rating': 'Tout public',
 'Products.Product.URL': ['https://itunes.apple.com/ca/tv-season/id353187108?i=353736518',
                      'https://itunes.apple.com/fr/tv-season/id353187108?i=353736518'],
 'Products.Product[@Country]': ['CA', 'FR']}

If you want to serialize this dictionary to JSON, you can use json.dumps():

from json import dumps

print(dumps(collect_xml_paths(root)))
# {"Item[@ID]": "288917", "Main.Platform": "iTunes", "Main.PlatformID": "353736518", "Genres.Genre[@FacebookID]": ["6003161475030", "6003172932634"], "Genres.Genre": ["Comedy", "TV-Show"], "Products.Product[@Country]": ["CA", "FR"], "Products.Product.URL": ["https://itunes.apple.com/ca/tv-season/id353187108?i=353736518", "https://itunes.apple.com/fr/tv-season/id353187108?i=353736518"], "Products.Product.Offers.Offer[@Type]": ["HDBUY", "SDBUY", "HDBUY", "SDBUY"], "Products.Product.Offers.Offer.Price": ["3.49", "2.49", "2.49", "1.99"], "Products.Product.Offers.Offer.Currency": ["CAD", "CAD", "EUR", "EUR"], "Products.Product.Rating": "Tout public"}
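If the single-line JSON is hard to read, json.dumps() also accepts indent and sort_keys arguments. The sketch below uses a few entries copied from the output above; the variable names are only illustrative.

```python
import json

# A few entries copied from the output shown earlier
flattened = {
    "Item[@ID]": "288917",
    "Main.Platform": "iTunes",
    "Genres.Genre": ["Comedy", "TV-Show"],
}

# indent pretty-prints the JSON; sort_keys gives a stable key order
pretty = json.dumps(flattened, indent=2, sort_keys=True)
print(pretty)

# Loading the JSON back yields an equal dictionary
restored = json.loads(pretty)
```

Because the keys are plain strings and the values are strings or lists of strings, the structure survives the round trip unchanged.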
