
Removing special characters from a CSV file using Python

Updated: 2022-11-22 11:54:07

I would probably do something like this:

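import csv

# In Python 3 the csv module wants text-mode files opened with newline="".
with open("special.csv", newline="") as infile, \
        open("repaired.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # Every character in this set is replaced with an underscore.
    conversion = set('_"/.$')
    for row in reader:
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)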

which turns

$ cat special.csv
th$s,2.3/,will-be
fixed.,even.though,maybe
some,"shoul""dn't",be

(note that one of the values is quoted) into

$ cat repaired.csv 
th_s,2_3_,will-be
fixed_,even_though,maybe
some,shoul_dn't,be
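
To try the same replacement without touching the file system, here is a minimal sketch that runs it over in-memory text. The function name clean_csv_text, the use of io.StringIO, and the switch from the per-character join to str.translate are my own assumptions for illustration, not part of the code above.

```python
import csv
import io

# Characters to replace; the same set used in the answer's code.
CONVERSION = '_"/.$'
# Translation table mapping each of those characters to an underscore.
TABLE = str.maketrans({c: "_" for c in CONVERSION})


def clean_csv_text(text):
    """Return CSV text with every character in CONVERSION replaced by '_'."""
    infile = io.StringIO(text, newline="")
    outfile = io.StringIO(newline="")
    reader = csv.reader(infile)
    # lineterminator="\n" keeps the result easy to compare in a test.
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        writer.writerow([entry.translate(TABLE) for entry in row])
    return outfile.getvalue()


sample = 'th$s,2.3/,will-be\nfixed.,even.though,maybe\nsome,"shoul""dn\'t",be\n'
print(clean_csv_text(sample), end="")
```

Because the replacement happens after csv.reader has parsed each row, the quoted value is unquoted first and only its content is rewritten.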


Right now, your code is reading the entire text into one big string:

text =  input.read()

Starting from a _ character:

newtext = '_'

Looping over every single character in text:

for c in text:

Adding the corrected character to newtext (very slowly):

    newtext += '_' if c in conversion else c

And then writing the original character (?), as a column, to a new csv:

    writer.writerow(c)

.. which is unlikely to be what you want. :^)
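
To see what writer.writerow does with a string argument, note that it expects a sequence of fields, and a string is itself a sequence of characters. The sketch below is only an illustration: the helper name rows_written and the in-memory io.StringIO buffer are assumptions of mine, not part of your code.

```python
import csv
import io


def rows_written(values):
    """Call writerow once per value and return the resulting CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    for value in values:
        writer.writerow(value)
    return buffer.getvalue()


# Looping over a string and passing each character: one row per character.
per_char = rows_written("abc")
# Passing a whole string to writerow: each character becomes its own column.
whole_string = rows_written(["abc"])
# Passing a list of fields: a single row with the intended columns.
as_fields = rows_written([["abc", "def"]])
```

So writerow(c) inside the character loop produces one single-column row per character of the input, which is why the row-by-row version with a list of cleaned fields is the safer pattern.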