How do I parse a large XML file with Nokogiri::XML::Reader?

Updated: 2023-09-27 17:23:58

Each element in the stream comes through as two events: one to open the element and one to close it. The opening event will have

node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

and the closing event will have

node.node_type == Nokogiri::XML::Reader::TYPE_END_ELEMENT

The empty strings you're seeing are just the element closing events. Remember that with SAX-style streaming parsing you're basically walking through a tree, so you need the second event to tell you when you're going back up and closing an element.

You probably want something more like this:

reader.each do |node|
  # Only act on the opening event of each <PMID> element.
  if node.name == "PMID" && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    p << node.inner_xml
  end
end

Or perhaps:

reader.each do |node|
  next if node.name      != 'PMID'
  next if node.node_type != Nokogiri::XML::Reader::TYPE_ELEMENT
  p << node.inner_xml
end

Or some other variation.