Python Pandas: replacing special characters

Last updated: 2023-12-05 13:12:52

I'm assuming you're using Python 2.x here, in which case this is likely a Unicode problem. Don't worry, you're not alone: Unicode is tough in general and especially in Python 2, which is why Unicode strings became the default string type in Python 3.

If all you're concerned about is the ñ, you should decode the strings as UTF-8 and then just replace that one character.

That would look something like the following:

DF['name'] = DF['name'].str.decode('utf-8').str.replace(u'\xf1', 'n')

For example, on a plain string:

>>> "sureño".decode("utf-8").replace(u"\xf1", "n")
u'sureno'

If your string is already Unicode, then you can (and actually have to) skip the decode step:

>>> u"sureño".replace(u"\xf1", "n")
u'sureno'

Note here that u'\xf1' uses the hex escape for the character in question (ñ is code point U+00F1).
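For readers on Python 3, here is a minimal sketch of the same decode-then-replace idea. It assumes the data arrives as UTF-8 encoded bytes; the variable names (raw, text, cleaned) are only illustrative and not from the original question.

```python
# Python 3 sketch: bytes have to be decoded, str is already Unicode.
# Assumption: the input is UTF-8 encoded bytes.
raw = b"sure\xc3\xb1o"        # UTF-8 bytes for "sureño"

text = raw.decode("utf-8")    # bytes -> str
cleaned = text.replace("\xf1", "n")  # "\xf1" is the hex escape for ñ

print(cleaned)
```

If the value is already a str, the decode call has to be skipped, because Python 3 str objects have no decode method.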

I was informed in the comments that .str.replace is a pandas Series method, which I hadn't realized. In that case, the answer might be something like the following:

DF['name'] = DF['name'].map(lambda x: x.decode('utf-8').replace(u'\xf1', 'n'))

or something along those lines. Series.map applies the function to each element, so there is no need to iterate over the .str accessor.
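As a rough illustration of the two variants discussed so far, the sketch below builds a small column of UTF-8 byte strings and cleans it both ways. The sample data and the names df, via_str and via_map are made up for the example, and it is written for Python 3 with a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical sample data: a column holding UTF-8 encoded bytes.
df = pd.DataFrame({"name": [b"sure\xc3\xb1o", b"ni\xc3\xb1o", b"plain"]})

# Variant 1: vectorised string methods (decode, then replace literally).
via_str = (
    df["name"].str.decode("utf-8").str.replace("\u00f1", "n", regex=False)
)

# Variant 2: element-wise with Series.map and plain Python string methods.
via_map = df["name"].map(lambda b: b.decode("utf-8").replace("\u00f1", "n"))

print(via_str.tolist())
print(via_map.tolist())
```

Both variants should give the same values; the .str form is usually the more idiomatic choice for a whole column.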

It actually just occurred to me that your issue may be as simple as the following:

DF['NAME']=DF['NAME'].str.replace(u"ñ","n")

Note how I've added the u in front of the string literal to make it a Unicode string.
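In Python 3 the u prefix is optional, since every str literal is already Unicode. A minimal sketch of the simple case, assuming the column already holds text rather than bytes (the sample values and the name result are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data: the column already contains Unicode text.
df = pd.DataFrame({"NAME": ["sure\u00f1o", "ni\u00f1o", "plain"]})

# regex=False treats the pattern as a literal character, not a regex.
df["NAME"] = df["NAME"].str.replace("\u00f1", "n", regex=False)

result = df["NAME"].tolist()
print(result)
```

Passing regex=False is a deliberate choice here: the pattern is a single literal character, so regular-expression matching is not needed.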
