Updated: 2023-12-03 19:16:28
In general, to remove non-ASCII characters, use str.encode with errors='ignore', then decode the result back to a string:
df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')
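As a quick check of what the round trip does, here is a minimal sketch on made-up data (the sample strings are hypothetical; only the column name 'col' comes from the line above). Characters that cannot be encoded as ASCII are simply dropped, not transliterated:

```python
import pandas as pd

# Hypothetical sample data; escapes are used so the code points are unambiguous.
df = pd.DataFrame({'col': ['caf\u00e9', 'na\u00efve', 'plain', '\u65e5\u672c\u8a9eabc']})

# Encode to ASCII bytes, silently dropping anything that is not ASCII,
# then decode the bytes back to ordinary strings.
df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')

print(df['col'].tolist())  # ['caf', 'nave', 'plain', 'abc']
```

Note that 'é' is removed rather than turned into 'e'. If you want transliteration, that is a different technique and not what this answer does.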
To perform this on multiple string columns, select them by dtype and apply the same round trip to each:
# Select the object-dtype (string) columns, then clean each one.
u = df.select_dtypes(object)
df[u.columns] = u.apply(
    lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
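If you would rather name the columns yourself than rely on dtype selection (for example, when some object columns hold mixed non-string values, which the .str accessor would turn into NaN), a small helper is one option. This is only a sketch: strip_non_ascii and the sample frame are hypothetical names and data, not part of pandas or of the original answer.

```python
import pandas as pd

def strip_non_ascii(frame, columns):
    """Return a copy of `frame` with non-ASCII characters dropped from `columns`."""
    out = frame.copy()
    for name in columns:
        # Same encode/decode round trip as above, one column at a time.
        out[name] = out[name].str.encode('ascii', 'ignore').str.decode('ascii')
    return out

# Hypothetical sample data with two string columns and one numeric column.
df = pd.DataFrame({
    'name': ['Z\u00fcrich', 'Oslo'],
    'note': ['\u2713 ok', 'n/a'],
    'count': [1, 2],
})

clean = strip_non_ascii(df, ['name', 'note'])
print(clean)
```

The numeric column is left alone because it is never passed to the helper, and the original frame is not modified.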
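That still won't remove the null characters in your columns, though, because the null character (\x00) is itself valid ASCII. To remove them, use a regex replacement. Note that \W+ matches every run of non-word characters, so this also strips spaces and punctuation:
df2 = df.replace(r'\W+', '', regex=True)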
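Because \W+ matches any run of non-word characters, the replacement above removes spaces and punctuation along with the null characters. If you only want the nulls gone, a narrower pattern may fit better. The comparison below is a sketch on hypothetical data; the r'\x00' pattern is my suggestion, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data containing embedded null characters.
df = pd.DataFrame({'col': ['a\x00b c', 'x, y\x00']})

# Broad: removes every non-word character (nulls, spaces, punctuation).
broad = df.replace(r'\W+', '', regex=True)

# Narrow: removes only the null character and keeps everything else.
narrow = df.replace(r'\x00', '', regex=True)

print(broad['col'].tolist())   # ['abc', 'xy']
print(narrow['col'].tolist())  # ['ab c', 'x, y']
```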