且构网

Retrieving only part of an XML feed

Updated: 2023-01-11 15:53:51

When you are processing large XML documents and don't want to load the whole thing into memory, as DOM parsers do, you need to switch to a SAX parser.
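
Here is a minimal Python sketch of that switch. It assumes the standard-library `xml.sax` module, which the answer itself relies on. The handler only counts start-tags as they stream past, so no tree is ever built. `ItemCounter`, `count_elements`, and the sample feed are hypothetical names and data.

```python
import io
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """Counts elements with a given tag name while the parser streams the input."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per start-tag; nothing about the element is kept afterwards.
        if name == self.tag:
            self.count += 1


def count_elements(source, tag):
    """Parse `source` (a filename or binary file object) without building a tree."""
    handler = ItemCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


# Hypothetical in-memory stand-in for a large feed file.
feed = io.BytesIO(b"<rss><channel><item/><item/><item/></channel></rss>")
print(count_elements(feed, "item"))  # 3
```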

SAX parsers have some benefits over DOM-style parsers. A SAX parser only needs to report each parsing event as it happens, and normally discards almost all of that information once reported (it does, however, keep some things, for example a list of all elements that have not been closed yet, in order to catch later errors such as end-tags in the wrong order). Thus, the minimum memory required for a SAX parser is proportional to the maximum depth of the XML file (i.e., of the XML tree) and the maximum data involved in a single XML event (such as the name and attributes of a single start-tag, or the content of a processing instruction, etc.).
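
The sketch below makes the memory argument concrete. Its only state is a stack of currently open element names, which mirrors the list of unclosed elements the parser keeps for itself. The stack grows with nesting depth, not with document length. `DepthTracker` and the sample document are illustrative assumptions.

```python
import io
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Keeps only the names of elements that are open but not yet closed."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of unclosed element names
        self.max_depth = 0       # deepest nesting seen so far

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.max_depth = max(self.max_depth, len(self.open_elements))

    def endElement(self, name):
        # The parser itself rejects mismatched end-tags, so a plain pop is enough here.
        self.open_elements.pop()


# Seven elements in total, but never more than three open at once.
doc = b"<a><b><c/></b><b/><b/><b/></a>"
tracker = DepthTracker()
xml.sax.parse(io.BytesIO(doc), tracker)
print(tracker.max_depth)  # 3
```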

For a 60 MB XML document, the memory a SAX parser needs is likely to be very low compared to the requirements for creating a DOM. Most DOM-based systems actually use SAX at a much lower level to build up the tree.
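
Python's standard library has one concrete instance of this layering. `xml.dom.pulldom` is driven by SAX events and can expand individual subtrees into DOM nodes on request, which also suits retrieving only part of a feed. The feed layout, the `id` filter, and `titles_of_entries` are hypothetical.

```python
from xml.dom import pulldom

xml_text = """<feed>
  <entry id="1"><title>First</title></entry>
  <entry id="2"><title>Second</title></entry>
  <entry id="3"><title>Third</title></entry>
</feed>"""


def titles_of_entries(text, wanted_ids):
    """Return the titles of the <entry> elements whose id is in `wanted_ids`."""
    events = pulldom.parseString(text)  # a stream of (event, node) pairs from SAX
    titles = []
    for event, node in events:
        if (event == pulldom.START_ELEMENT and node.tagName == "entry"
                and node.getAttribute("id") in wanted_ids):
            events.expandNode(node)  # build the full DOM subtree for this entry only
            title = node.getElementsByTagName("title")[0]
            titles.append(title.firstChild.data)
    return titles


print(titles_of_entries(xml_text, {"2"}))  # ['Second']
```

Only the matching `entry` elements are expanded into full subtrees. The other elements arrive as individual events and can be discarded.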

To make use of SAX, subclass xml.sax.saxutils.XMLGenerator and override endElement, startElement and characters. Then call xml.sax.parse with an instance of that subclass as the handler. I am sorry I don't have a detailed example at hand to share with you, but I am sure you will find plenty online.
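
Here is a minimal sketch of that recipe. It assumes the goal is to copy only the elements with one tag name (here a hypothetical `entry`) out of a larger feed. `PartialCopier`, `extract`, and the sample data are illustrative names, not part of any library. The depth counter is one simple way to decide which events to pass on to `XMLGenerator`, which writes them back out as XML.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class PartialCopier(XMLGenerator):
    """Re-emits only the `wanted` elements and everything nested inside them."""

    def __init__(self, out, wanted):
        super().__init__(out, encoding="utf-8")
        self.wanted = wanted
        self.depth = 0  # > 0 while we are inside a wanted element

    def startElement(self, name, attrs):
        if self.depth or name == self.wanted:
            self.depth += 1
            super().startElement(name, attrs)

    def endElement(self, name):
        if self.depth:
            super().endElement(name)
            self.depth -= 1

    def characters(self, content):
        # Text outside the wanted elements is dropped.
        if self.depth:
            super().characters(content)


def extract(source, wanted):
    """Stream `source` through SAX and return the copied elements as a string."""
    out = io.StringIO()
    xml.sax.parse(source, PartialCopier(out, wanted))
    return out.getvalue()


feed = io.BytesIO(
    b'<feed><meta>skip</meta>'
    b'<entry id="1"><title>A</title></entry>'
    b'<entry id="2"><title>B &amp; C</title></entry></feed>'
)
print(extract(feed, "entry"))
```

Subclassing `XMLGenerator` rather than the plain `xml.sax.ContentHandler` is what lets the selected elements come out as XML text. The inherited `startDocument` still writes an XML declaration. The result is a sequence of fragments rather than a single-rooted document.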