且构网


Removing non-ASCII characters from string columns in pandas

Updated: 2023-12-03 19:16:28

In general, to remove non-ASCII characters, encode the column to ASCII with str.encode and errors='ignore', then decode it back to text:

df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')
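As a rough illustration of what that round trip does, here is a minimal sketch on a made-up frame. The sample values are assumptions for demonstration only; the column name 'col' follows the snippet above.

```python
import pandas as pd

# Hypothetical sample data; \u00e9 is "e with acute", \u00ef is "i with diaeresis".
df = pd.DataFrame({'col': ['caf\u00e9', 'na\u00efve text', 'plain']})

# Encode to ASCII bytes, silently dropping anything that cannot be encoded,
# then decode the bytes back into ordinary strings.
df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')

cleaned = df['col'].tolist()
print(cleaned)  # ['caf', 'nave text', 'plain']
```

Note that the offending characters are dropped, not transliterated: the accented letters disappear instead of becoming their unaccented counterparts.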

To do this on several string columns at once, select the object-dtype columns and apply the same round trip to each:

u = df.select_dtypes(object)
df[u.columns] = u.apply(
    lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
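The multi-column version can be tried on a small mixed-dtype frame, as in the sketch below. The column names and values are hypothetical, and the text columns are built with dtype=object explicitly so that select_dtypes(object) picks them up regardless of how a given pandas version infers string columns by default.

```python
import pandas as pd

# Hypothetical frame: two text columns stored as object dtype, one numeric column.
df = pd.DataFrame({
    'name': pd.Series(['Jos\u00e9', 'Zo\u00eb'], dtype=object),
    'city': pd.Series(['M\u00e1laga', 'Oslo'], dtype=object),
    'age': [31, 27],
})

# Select only the object-dtype columns, so the numeric 'age' column is left alone.
u = df.select_dtypes(object)
df[u.columns] = u.apply(
    lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))

names = df['name'].tolist()
cities = df['city'].tolist()
ages = df['age'].tolist()
print(names, cities, ages)
```

Selecting by dtype first avoids calling the .str accessor on numeric columns, which would raise an error.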

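The encode/decode round trip still won't remove the null characters (\x00) in your columns, because a null is itself a valid ASCII character. To strip them, use a regex replacement. Be aware that \W+ matches every run of non-word characters, so it also removes spaces and punctuation: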

df2 = df.replace(r'\W+', '', regex=True)
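If the goal is to drop only the null characters and keep spaces and punctuation, a narrower pattern may be a better fit. The sketch below compares the two on made-up data; the narrower pattern r'\x00' is an assumption about the intent and is not part of the original answer.

```python
import pandas as pd

# Hypothetical values containing embedded null characters.
df = pd.DataFrame({'col': pd.Series(['a\x00b c', 'x, y\x00'], dtype=object)})

# Broad pattern from the answer: removes nulls, but also spaces and punctuation.
broad = df.replace(r'\W+', '', regex=True)

# Narrower alternative (assumed intent): remove only the null character.
narrow = df.replace(r'\x00', '', regex=True)

broad_values = broad['col'].tolist()
narrow_values = narrow['col'].tolist()
print(broad_values)   # ['abc', 'xy']
print(narrow_values)  # ['ab c', 'x, y']
```

Which pattern is appropriate depends on whether whitespace and punctuation carry meaning in the column.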