How to extract XML attributes with XPath in Pig?

Updated: 2021-10-27 06:23:15

There are 2 bugs in piggybank's XPath class:

The ignoreNamespace parameter defaults to true and cannot be overridden: https://issues.apache.org/jira/browse/PIG-4752

Here is my workaround using XPathAll:

XPathAll(x, 'BOOK/TITLE/@test', true, false).$0 as (test:chararray)
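The path in this workaround is plain XPath 1.0, so you can check it outside Pig. Below is a minimal sketch that uses the JDK's standard javax.xml.xpath API rather than piggybank itself. The sample record is hypothetical, because the original XML is not shown. The class and method names are also illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttributeXPathDemo {
    // Parses the XML string (namespace-unaware, the JDK default) and returns
    // the string value of the XPath 1.0 expression evaluated from the document node.
    static String eval(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed for: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample record.
        String xml = "<BOOK><TITLE test=\"abc\">Pig Basics</TITLE></BOOK>";
        // BOOK is the root element, TITLE its child, @test the attribute on TITLE.
        System.out.println(eval(xml, "BOOK/TITLE/@test")); // prints abc
    }
}
```

If the attribute is missing, the expression selects nothing and the string value is empty.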

Also, if you still need to ignore namespaces:

XPathAll(x, '//*[local-name()=\'BOOK\']//*[local-name()=\'TITLE\']/@test', true, false).$0 as (test:chararray)
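The local-name() form helps because of how XPath 1.0 matches names. An unprefixed name such as BOOK only matches elements that are in no namespace. Once namespaces are not ignored, the plain path therefore finds nothing in a document that declares a default namespace.

The sketch below shows this with the JDK's javax.xml.xpath API and a namespace-aware parser. The sample XML, its namespace URI and the class name are hypothetical. It also assumes that a namespace-aware parse approximates what XPathAll does when its last argument is false. How piggybank handles this internally is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {
    // Parses with namespace awareness switched on, i.e. namespaces are NOT ignored,
    // then returns the string value of the XPath 1.0 expression.
    static String evalNamespaceAware(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed for: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical record whose elements live in a default namespace.
        String xml = "<BOOK xmlns=\"http://example.com/books\">"
                + "<TITLE test=\"abc\">Pig Basics</TITLE></BOOK>";
        // Unprefixed names match only no-namespace elements, so nothing is selected.
        System.out.println("[" + evalNamespaceAware(xml, "BOOK/TITLE/@test") + "]");
        // local-name() compares only the local part of each element name.
        System.out.println("[" + evalNamespaceAware(xml,
                "//*[local-name()='BOOK']//*[local-name()='TITLE']/@test") + "]");
    }
}
```

Running main prints [] and then [abc]. The test attribute itself has no prefix, so it is in no namespace and @test still matches it directly.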
