Does printing a string print a 'u' before the string in Python?

Updated: 2023-12-04 11:52:40

'u' before elements in printed list? I didn't type u in my code.

hobbies = []

#prompt user three times for hobbies
for i in range(3):
    hobby = raw_input('Enter a hobby:')
    hobbies.append(hobby)

#print list stored in hobbies
print hobbies

When I run this, it prints the list but it is formatted like this:

Enter a hobby: Painting
Enter a hobby: Stargazing
Enter a hobby: Reading
[u'Painting', u'Stargazing', u'Reading']
None

Where did those 'u' come from before each of the elements of the list?

I think what you're actually surprised by here is that printing a single string doesn't do the same thing as printing a list of strings—and this is true whether they're Unicode or not:

>>> hobby1 = u'Dizziness'
>>> hobby2 = u'Vértigo'
>>> hobbies = [hobby1, hobby2]
>>> print hobby1
Dizziness
>>> print hobbies
[u'Dizziness', u'V\xe9rtigo']

Even without the u, you've got those extra quotes, not to mention that backslash escape. And if you try the same thing with str byte strings instead of unicode strings, you'll still have the quotes and escapes (plus you might have mojibake characters if your source file and your terminal have different encodings… but forget that part).


In Python, every object can have two different representations: the end-user-friendly representation, str, and the programmer-friendly representation, repr. For byte strings, those representations are Painting and 'Painting', respectively. And for Unicode strings, they're Painting and u'Painting'.
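To see the two representations side by side, here is a minimal sketch. It assumes Python 3, where every str is Unicode and the u prefix is no longer shown, but the split between str and repr is the same. The variable names are only for illustration.

```python
# Assumes Python 3; in Python 2 the repr of a unicode string
# would additionally carry the u prefix discussed above.
hobby = 'Painting'

user_view = str(hobby)         # end-user-friendly: the bare text
programmer_view = repr(hobby)  # programmer-friendly: quoted, like a source literal

print(user_view)
print(programmer_view)
```

The first line printed is the bare text; the second carries the quotes.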

The print statement uses the str, so print hobby1 prints out Painting, with no quotes (or u, if it's Unicode).

However, the str of a list uses the repr of each of its elements, not the str. So, when you print hobbies, each element has quotes around it (and a u if it's Unicode).
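One way to convince yourself of this is to use a class whose str and repr are visibly different. The Hobby class below is hypothetical, made up purely for this illustration, and the sketch assumes Python 3.

```python
class Hobby:
    """Hypothetical class, used only to make str and repr visibly different."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # End-user-friendly form: just the name.
        return self.name

    def __repr__(self):
        # Programmer-friendly form: looks like the constructor call.
        return 'Hobby(%r)' % self.name


hobbies = [Hobby('Painting'), Hobby('Stargazing')]

single = str(hobbies[0])  # uses Hobby.__str__
whole = str(hobbies)      # list's str uses Hobby.__repr__ for each element

print(single)
print(whole)
```

Printing the single element goes through `__str__`, while printing the list goes through `__repr__` for every element, which is the same thing that puts the quotes (and, in Python 2, the u) around each string.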

This may seem weird at first, but it's an intentional design decision, and it makes sense once you get used to it. For one thing, it would be ambiguous to print out [foo, bar, baz]: is that a list of three strings, or a list of two strings, one of which has a comma in the middle of it? But, more importantly, a list is already not a user-friendly thing, no matter how you print it out. My hobbies are [Painting, Stargazing] would look just as ugly as My hobbies are ['Painting', 'Stargazing']. When you want to show a list to an end user, you almost always want to format it explicitly in some way that makes sense.
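The ambiguity is easy to reproduce. In this sketch (Python 3 assumed), `bare` is a hypothetical helper that formats a list the way it would look if lists used the str of their elements:

```python
def bare(items):
    # Hypothetical formatter: what a list would look like without repr quoting.
    return '[' + ', '.join(items) + ']'


three = ['foo', 'bar', 'baz']  # three strings
two = ['foo', 'bar, baz']      # two strings, one containing a comma

# Without quotes, the two lists are indistinguishable.
print(bare(three))
print(bare(two))

# With the real list str (repr of each element), they differ.
print(str(three))
print(str(two))
```

The first two lines of output are identical, while the last two are not.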

Often, what you want is as simple as this:

>>> print 'Hobbies:', ', '.join(hobbies)
Hobbies: Painting, Stargazing

Or, for Unicode strings:

>>> print u'Hobbies:', u', '.join(hobbies)
Hobbies: Painting, Stargazing
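If you need this in more than one place, you could wrap it in a small function. This is only a sketch under a few assumptions: Python 3 (where print is a function and all strings are Unicode), and a hypothetical helper name, `format_hobbies`, that does not come from the question.

```python
def format_hobbies(hobbies, label='Hobbies:'):
    """Hypothetical helper: build one user-facing line from a list."""
    # str() each element so the result never contains quotes or prefixes,
    # and so non-string elements are accepted too.
    return label + ' ' + ', '.join(str(h) for h in hobbies)


line = format_hobbies(['Painting', 'Stargazing'])
print(line)
```

The point is the same as above: format the list explicitly instead of relying on what printing the list object happens to show.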