Getting text from mixed-content XML tags using ElementTree

Last updated: 2023-11-07 14:00:52

The lost text bits, "¿Sí?" and "A mí no me suena.", are available as the tail attribute of each <vocal> element (the text that follows the element's end tag).
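To make the difference between text and tail concrete, here is a minimal sketch. The one-line sample string is made up for illustration and is not the vocal.xml file used in this answer.

```python
from xml.etree import ElementTree as ET

# Hypothetical one-line sample, inlined so the sketch runs on its own.
u = ET.fromstring(
    "<u>Pues... <vocal><desc>laugh</desc></vocal>A mí no me suena.</u>"
)
v = u.find("vocal")

# .text is the text between an element's start tag and its first child.
print(repr(u.text))

# <vocal> has no text of its own before <desc>, so .text is None here.
print(repr(v.text))

# .tail is the text between an element's end tag and the next tag.
print(repr(v.tail))
```

The text after </vocal> belongs to the <vocal> element's tail, not to the text of <u>, which is why reading only u.text loses it.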

Here is one way to get the desired output (tested with Python 2.7).

Assume that vocal.xml looks like this:

<root>
  <u>
    <vocal type="filler">
      <desc>eh</desc>
    </vocal>¿Sí? 
  </u>

  <u>Pues... 
     <vocal type="non-ling">
       <desc>laugh</desc>
     </vocal>A mí no me suena. 
  </u>
</root>

Code:

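from xml.etree import ElementTree as ET

# parse() returns an ElementTree; findall() searches from its root element
tree = ET.parse("vocal.xml")

for u in tree.findall(".//u"):
    v = u.find("vocal")

    if v.get("type") == "filler":
        # Keep the filler's description ("eh") between the surrounding text
        frags = [u.text, v.findtext("desc"), v.tail]
    else:
        # Other vocal types contribute only the text around them
        frags = [u.text, v.tail]

    # Python 2 print statement; encode to UTF-8 bytes before writing
    print " ".join(t.encode("utf-8").strip() for t in frags).strip()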

Output:

eh ¿Sí?
Pues... A mí no me suena.
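
For readers on Python 3, the same idea might be sketched as follows. This is an adaptation, not the code tested above: the helper name utterance_text is hypothetical, the XML is inlined instead of being read from vocal.xml, and the loop over every <vocal> child plus the handling of empty fragments are added assumptions.

```python
from xml.etree import ElementTree as ET

# Same content as vocal.xml above, inlined so the sketch is self-contained.
XML = """<root>
  <u>
    <vocal type="filler">
      <desc>eh</desc>
    </vocal>¿Sí?
  </u>

  <u>Pues...
     <vocal type="non-ling">
       <desc>laugh</desc>
     </vocal>A mí no me suena.
  </u>
</root>"""


def utterance_text(u):
    """Hypothetical helper: join the text fragments of one <u> element."""
    frags = [u.text]
    for v in u.findall("vocal"):
        if v.get("type") == "filler":
            # Fillers contribute their description, e.g. "eh".
            frags.append(v.findtext("desc"))
        # The tail holds the text that follows </vocal>.
        frags.append(v.tail)
    # Fragments may be None or whitespace-only; strip them and drop empties.
    stripped = ((t or "").strip() for t in frags)
    return " ".join(s for s in stripped if s)


root = ET.fromstring(XML)
lines = [utterance_text(u) for u in root.findall(".//u")]
for line in lines:
    print(line)
```

In Python 3, strings are already Unicode, so the encode("utf-8") step is not needed before printing.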
