
ElementTree will not parse special characters with Python 2.7

Updated: 2022-11-26 12:06:27

You're running into two separate differences between Python 2 and Python 3 at the same time, which is why you're getting unexpected results.

The first difference is one you're probably already aware of: Python's print statement in version 2 became a print function in version 3. That change is creating a special circumstance in your case, which I'll get to a little later. But briefly, this is the difference in how 'print' works:

In Python 3:

>>> # Two arguments 'Hi' and 'there' get passed to the function 'print'.
>>> # They are concatenated with a space separator and printed.
>>> print('Hi', 'there')
Hi there

In Python 2:

>>> # 'print' is a statement which doesn't need parentheses.
>>> # The parentheses instead create a tuple containing two elements,
>>> # 'Hi' and 'there'. This tuple is then printed.
>>> print('Hi', 'there')
('Hi', 'there')
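You can reproduce the Python 2 behaviour from a Python 3 interpreter by passing the tuple to print explicitly. The following minimal sketch is Python 3 code. It captures what `print` writes so that the two forms can be compared side by side; `captured_print` is a hypothetical helper name used only for this illustration.

```python
import io
from contextlib import redirect_stdout


def captured_print(*args):
    """Call print(*args) and return what it wrote, minus the trailing newline."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print(*args)
    return buffer.getvalue().rstrip("\n")


# Two arguments: joined with a space, as in the Python 3 example.
print(captured_print('Hi', 'there'))      # Hi there

# One tuple argument: printed through the tuple's repr, like Python 2's statement.
print(captured_print(('Hi', 'there')))    # ('Hi', 'there')
```

In Python 2.6 and later, you can add `from __future__ import print_function` at the top of a module to switch `print` to the function form. The first call above then behaves the same way in Python 2.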

The second problem in your case is that tuples print themselves by calling repr() on each of their elements. In Python 3, repr() leaves printable non-ASCII characters in a string as they are, so unicode text displays as you want. In Python 2, however, repr() uses escape sequences for any character or byte value that falls outside the printable ASCII range (for example, values larger than 127). That is why you're seeing them.
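Python 3 still has a type whose repr escapes non-ASCII data: `bytes`. A tuple containing encoded bytes therefore prints escapes much like the Python 2 output, while a tuple containing `str` keeps the character. The following minimal Python 3 sketch shows this. The sample word and the choice of UTF-8 are assumptions for illustration only.

```python
# Python 3 sketch: compare how repr() renders text vs. encoded bytes.
text = 'Avsättning'                # a str (unicode text)
encoded = text.encode('utf-8')     # the same text as UTF-8 bytes (assumed encoding)

print(repr(text))      # 'Avsättning'
print(repr(encoded))   # b'Avs\xc3\xa4ttning'

# A tuple is printed through the repr() of each of its elements.
print((text,))         # ('Avsättning',)
print((encoded,))      # (b'Avs\xc3\xa4ttning',)

# Printing the str on its own (not inside a tuple) shows the character itself.
print(text)            # Avsättning
```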

You may decide to resolve this issue, or not, depending on what your goal is with your code. The representation of a tuple in Python 2 uses escape characters because it isn't designed to be displayed to an end user. It exists for your convenience as a developer, for troubleshooting and similar tasks. If you're simply printing it for yourself, you may not need to change anything, because Python is showing you that the encoded bytes for that non-ASCII character are correctly present in your string. If you do want to show the end user something that looks like a tuple, one way to do it (which keeps unicode printing correctly) is to build the formatting manually, like this:

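def printAccountPlan(xmltree):
    # Assumption: the elements to print are the ones carrying 'number' and
    # 'type' attributes; adapt this loop to the structure of your XML.
    for i in xmltree.iter():
        if 'number' in i.attrib and 'type' in i.attrib:
            data = (i.attrib['number'], i.attrib['type'], i.text)
            print "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data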
# Produces this:
# ('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')
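For comparison, here is a minimal Python 3 sketch of the same manual-formatting idea. The original XML isn't shown, so the layout below is an assumption made for illustration: an `<accounts>` root with `<account number="..." type="...">` children. The names `SAMPLE_XML`, `format_account` and `print_account_plan` are hypothetical. In Python 3 the element text is a `str`, so formatting the values yourself prints 'ä' directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file's structure may differ.
SAMPLE_XML = """<accounts>
  <account number="89890000" type="Kostnad">Avsättning egenavgifter</account>
</accounts>"""


def format_account(element):
    """Build a tuple-looking line from str values instead of using repr()."""
    data = (element.attrib['number'], element.attrib['type'], element.text)
    return "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data


def print_account_plan(root):
    # Assumption: accounts are <account> elements anywhere below the root.
    for account in root.iter('account'):
        print(format_account(account))


root = ET.fromstring(SAMPLE_XML)
print_account_plan(root)
# ('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')
```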