Updated: 2023-12-05 22:32:40
I need some help with a couple of questions, using bash tools. First, I need this:
<CreateOfficeCode>
  <OperatorId>ve</OperatorId>
  <OfficeCode>1234</OfficeCode>
  <CountryCodeLength>0</CountryCodeLength>
  <AreaCodeLength>3</AreaCodeLength>
  <Attributes></Attributes>
  <ChargeArea></ChargeArea>
</CreateOfficeCode>
to become:
<CreateOfficeCode>
  <OperatorId>ve</OperatorId>
  <OfficeCode>1234</OfficeCode>
  <CountryCodeLength>0</CountryCodeLength>
  <AreaCodeLength>3</AreaCodeLength>
</CreateOfficeCode>
For this I used the following command:
sed -i '/><\//d' file
but it is not very strict; it's more of a trick. Something more appropriate would be to find each <pattern></pattern> pair and remove it. Any suggestions?
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeCode>
  </CreateOfficeCode>
</CreateOfficeGroup>
to:
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
</CreateOfficeGroup>
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeCode>
    <OperatorId>ve</OperatorId>
    <OfficeCode>1234</OfficeCode>
    <CountryCodeLength>0</CountryCodeLength>
    <AreaCodeLength>3</AreaCodeLength>
    <Attributes></Attributes>
    <ChargeArea></ChargeArea>
  </CreateOfficeCode>
  <CreateOfficeSize>
    <Chairs></Chairs>
    <Tables></Tables>
  </CreateOfficeSize>
</CreateOfficeGroup>
to:
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeCode>
    <OperatorId>ve</OperatorId>
    <OfficeCode>1234</OfficeCode>
    <CountryCodeLength>0</CountryCodeLength>
    <AreaCodeLength>3</AreaCodeLength>
  </CreateOfficeCode>
</CreateOfficeGroup>
Could you answer each question separately? Thank you very much!
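The sed one-liner in the question works line by line, deleting any line that contains the characters "></". As a rough illustration of why that is "not so strict", here is a small Python emulation (sed_delete_lines is a hypothetical helper for this sketch, not sed itself) applied to three inputs: an empty element split across two lines, a self-closing empty element, and real data sharing a line with an empty element.

```python
# Mimics `sed -i '/><\//d' file`: sed processes one line at a time and
# deletes every line that contains the literal characters "></".
def sed_delete_lines(text):
    return "\n".join(line for line in text.splitlines() if "></" not in line)

# 1. An empty element split over two lines (as in the second example):
#    no single line contains "></", so nothing is deleted.
split_empty = "<CreateOfficeCode>\n</CreateOfficeCode>"
# 2. A self-closing empty element, equivalent XML: also never matched.
self_closing = "<Attributes/>"
# 3. Real data on the same line as an empty element: the whole line goes.
same_line = "<OfficeCode>1234</OfficeCode><Attributes></Attributes>"

for sample in (split_empty, self_closing, same_line):
    print(repr(sed_delete_lines(sample)))
```

The first two inputs come back unchanged, and the third comes back as an empty string, taking <OfficeCode>1234</OfficeCode> with it. A tool that parses XML sees elements rather than lines, so it avoids both failure modes.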
XMLStarlet is a command-line XML processor. Doing what you want with it is a one-line operation (until the desired recursive behavior is added), and it will work for every variant of XML syntax that describes the same input, for example <Attributes/> as well as <Attributes></Attributes>:
The simple version:
xmlstarlet ed \
-d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
input.xml
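To make the "until the desired recursive behavior is added" caveat concrete, the Python sketch below approximates a single xmlstarlet ed -d pass using the standard library's xml.etree.ElementTree: it first selects every element matching the same predicate as the XPath expression, then deletes them all. This is an emulation under stated assumptions (no mixed content, and the root element is never deleted); QUESTION3, is_empty_leaf and delete_empty_leaves_once are illustrative names, not part of XMLStarlet.

```python
import xml.etree.ElementTree as etree

# The question's third example, containing nested empty elements.
QUESTION3 = """<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<OfficeCode>1234</OfficeCode>
<CountryCodeLength>0</CountryCodeLength>
<AreaCodeLength>3</AreaCodeLength>
<Attributes></Attributes>
<ChargeArea></ChargeArea>
</CreateOfficeCode>
<CreateOfficeSize>
<Chairs></Chairs>
<Tables></Tables>
</CreateOfficeSize>
</CreateOfficeGroup>"""

XML_WHITESPACE = " \t\r\n"  # the characters normalize-space() treats as whitespace

def is_empty_leaf(el):
    # not(./*) and (not(./text()) or normalize-space(./text())="")
    return len(el) == 0 and (el.text is None or el.text.strip(XML_WHITESPACE) == "")

def delete_empty_leaves_once(root):
    """One pass: select every match first, then delete them all.

    Assumes no mixed content: each removed element's tail text goes with it.
    The root itself is never deleted (ElementTree cannot remove it).
    """
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = [el for el in root.iter() if el is not root and is_empty_leaf(el)]
    for el in matches:
        parent_of[el].remove(el)
    return root

root = delete_empty_leaves_once(etree.fromstring(QUESTION3))
print(etree.tostring(root, encoding="unicode"))
```

On the third example this removes <Attributes>, <ChargeArea>, <Chairs> and <Tables>, but <CreateOfficeSize> survives with only whitespace inside, because it still had child elements when the predicate was evaluated. A second pass removes it, which is the case the recursive version handles.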
The fancy version:
strip_recursively() {
  local doc last_doc
  # Read all of stdin (up to a NUL byte or EOF) into $doc.
  IFS= read -r -d '' doc
  while :; do
    last_doc=$doc
    # One deletion pass over the previous result.
    doc=$(xmlstarlet ed \
      -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
      /dev/stdin <<<"$last_doc")
    # Stop once a pass leaves the document unchanged.
    if [[ $doc = "$last_doc" ]]; then
      printf '%s\n' "$doc"
      return
    fi
  done
}
strip_recursively <input.xml
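The loop in strip_recursively is a fixed-point iteration: apply one deletion pass and stop once the output no longer changes. The following Python sketch reproduces that control flow with ElementTree standing in for XMLStarlet (an assumption for illustration only); one_pass and strip_recursively here are hypothetical Python counterparts, not XMLStarlet APIs.

```python
import xml.etree.ElementTree as etree

XML_WHITESPACE = " \t\r\n"  # characters normalize-space() treats as whitespace

def one_pass(xml_text):
    """Parse, delete every empty leaf selected in this pass, re-serialize."""
    root = etree.fromstring(xml_text)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Select all matches before deleting anything, like a single `ed -d`.
    matches = [el for el in root.iter()
               if el is not root and len(el) == 0
               and (el.text is None or el.text.strip(XML_WHITESPACE) == "")]
    for el in matches:
        parent_of[el].remove(el)  # drops el.tail too (assumes no mixed content)
    return etree.tostring(root, encoding="unicode")

def strip_recursively(xml_text):
    """Repeat one_pass until the serialized output stops changing."""
    doc = xml_text
    while True:
        last_doc = doc
        doc = one_pass(last_doc)
        if doc == last_doc:
            return doc

# The question's third example.
QUESTION3 = """<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<OfficeCode>1234</OfficeCode>
<CountryCodeLength>0</CountryCodeLength>
<AreaCodeLength>3</AreaCodeLength>
<Attributes></Attributes>
<ChargeArea></ChargeArea>
</CreateOfficeCode>
<CreateOfficeSize>
<Chairs></Chairs>
<Tables></Tables>
</CreateOfficeSize>
</CreateOfficeGroup>"""

print(strip_recursively(QUESTION3))
```

Comparing serialized text, as the shell function does, is simple but re-parses the whole document every round. Note that the first comparison can differ merely because re-serialization normalizes formatting, so the loop may run one extra round before it stops.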
Here /dev/stdin is used rather than "-" (at some cost to platform portability) for better portability across releases of XMLStarlet; adjust to taste.
On a system with only older dependencies installed, the XML parser most likely to be available is the one bundled with Python.
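#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as etree

# Read bytes so the parser honours any encoding declared in the XML prolog.
doc = etree.parse(sys.stdin.buffer)

def prune(parent):
    """Remove empty leaf elements below parent; return True if anything changed."""
    ever_changed = False
    while True:
        changed = False
        # list(parent) is a snapshot, so removing children inside the loop is safe.
        # (Element.getchildren() was removed in Python 3.9.)
        for el in list(parent):
            if len(el) == 0:
                # Leaf: drop it if its text and its tail are empty or whitespace-only.
                if ((el.text is None or el.text.strip() == '') and
                        (el.tail is None or el.tail.strip() == '')):
                    parent.remove(el)
                    changed = True
            else:
                # Non-leaf: prune its subtree; if that empties it, the next
                # round of this while-loop removes it as a leaf.
                changed = prune(el) or changed
        ever_changed = changed or ever_changed
        if not changed:
            return ever_changed

prune(doc.getroot())
print(etree.tostring(doc.getroot(), encoding='unicode'))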
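The while True loop in prune re-scans a parent after every round of removals. An alternative design, shown here as a sketch rather than a replacement for the script above, is a post-order walk: prune each child's subtree first, then test the child itself, so nested empty elements disappear in a single traversal. prune_postorder and QUESTION3 are illustrative names; the emptiness test is the same one prune uses.

```python
import xml.etree.ElementTree as etree

def prune_postorder(parent):
    """Remove empty leaf elements below parent in one bottom-up walk."""
    for el in list(parent):  # snapshot, so removing children while looping is safe
        prune_postorder(el)  # children first: el may now have become an empty leaf
        if (len(el) == 0
                and (el.text is None or el.text.strip() == "")
                and (el.tail is None or el.tail.strip() == "")):
            parent.remove(el)

# The question's third example.
QUESTION3 = """<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<OfficeCode>1234</OfficeCode>
<CountryCodeLength>0</CountryCodeLength>
<AreaCodeLength>3</AreaCodeLength>
<Attributes></Attributes>
<ChargeArea></ChargeArea>
</CreateOfficeCode>
<CreateOfficeSize>
<Chairs></Chairs>
<Tables></Tables>
</CreateOfficeSize>
</CreateOfficeGroup>"""

root = etree.fromstring(QUESTION3)
prune_postorder(root)
print(etree.tostring(root, encoding="unicode"))
```

Because children are handled before their parent, <CreateOfficeSize> is already empty by the time it is tested, so no outer loop is needed. The recursion depth equals the document's nesting depth, which is small for data like this.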