Remove non-ASCII characters from string columns in pandas

Updated: 2023-12-03 19:16:28

In general, to remove non-ASCII characters, encode the column with str.encode using errors='ignore', then decode it back:

df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')
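As a minimal sketch of what that round trip does, the example below runs it on a small made-up frame. The column name and sample values are assumptions for illustration only; characters outside ASCII are dropped, not transliterated.

```python
import pandas as pd

# Hypothetical sample data; \u00e9 is "e" with an acute accent, \u00ef is "i" with a diaeresis.
df = pd.DataFrame({'col': ['caf\u00e9', 'na\u00efve r\u00e9sum\u00e9', 'plain']})

# Encoding to ASCII with errors='ignore' drops every character that has no ASCII
# representation; decoding turns the resulting bytes back into str.
df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')

print(df['col'].tolist())  # ['caf', 'nave rsum', 'plain']
```

Note that accented letters disappear entirely, so 'café' becomes 'caf' and not 'cafe'.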

To do this on multiple string columns, select the object-dtype columns and apply the same round trip to each:

u = df.select_dtypes(object)
df[u.columns] = u.apply(
    lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
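The multi-column version can be tried on a small mixed frame, sketched below. The column names and values are invented for illustration, and the string columns are created with dtype=object explicitly, on the assumption that select_dtypes(object) only picks up columns stored with that dtype.

```python
import pandas as pd

# Hypothetical frame: two string columns (forced to object dtype) and one numeric column.
df = pd.DataFrame({
    'name': pd.Series(['Zo\u00eb', 'Bj\u00f6rk'], dtype=object),
    'city': pd.Series(['M\u00fcnchen', 'Oslo'], dtype=object),
    'n': [1, 2],
})

# Pick only the object-dtype columns, so the numeric column is left alone.
u = df.select_dtypes(object)
df[u.columns] = u.apply(
    lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))

print(df['name'].tolist())  # ['Zo', 'Bjrk']
print(df['city'].tolist())  # ['Mnchen', 'Oslo']
```

The numeric column n is not part of the selection, so its values are unchanged.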

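That still won't remove the null characters in your columns, because \x00 is itself a valid ASCII character and survives the encode step. For those, replace them with a regular expression. Note that the pattern below, \W+, strips every non-word character, including spaces and punctuation, not only nulls: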

df2 = df.replace(r'\W+', '', regex=True)
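If only the null characters should go and spaces or punctuation must be kept, a narrower pattern may be a better fit. The sketch below contrasts the two on made-up data; the r'\x00' pattern is a suggested alternative and not part of the original answer.

```python
import pandas as pd

# Hypothetical strings containing null characters, a space and a hyphen.
df = pd.DataFrame({'col': pd.Series(['a\x00b c', 'x-y\x00'], dtype=object)})

# Narrow pattern: removes only the null character.
only_nulls = df.replace(r'\x00', '', regex=True)

# Pattern from the answer: removes every run of non-word characters.
non_word = df.replace(r'\W+', '', regex=True)

print(only_nulls['col'].tolist())  # ['ab c', 'x-y']
print(non_word['col'].tolist())    # ['abc', 'xy']
```

Which one is appropriate depends on whether whitespace and punctuation carry meaning in the column.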
