且构网

Adjusting CSV data: appending cells to the previous row and merging cells that contain certain strings

Updated: 2023-02-04 19:28:29

Use the csv module to read the data, adjust each row as needed, and write it out again.

You can create an 'empty' column by using an empty string '' as the value for that column (None also works, since csv.writer writes None as an empty string). Conversely, reading an empty column (two consecutive tabs in the input) gives you an empty string.
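
The empty-column behaviour described above can be checked with a small in-memory round trip. This is only a sketch: the sample row and the use of io.StringIO in place of real files are assumptions made for illustration.

```python
import csv
import io

# Write one row whose middle columns are "empty": one given as '', one as None.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(['1', '', None, 'd'])
written = buffer.getvalue()

# Read the same text back: each pair of consecutive tabs yields an empty string.
rows = list(csv.reader(io.StringIO(written), delimiter='\t'))
```

Both '' and None end up as nothing between two tabs, and both come back as '' when read, so the distinction does not survive a round trip.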

import csv

with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')

    for row in reader:
        if len(row) > 3:
            # detect if `c` is missing (insert your own test here)
            # sample test looks for 3 consecutive columns with values f, o and o
            if row[3:6] == ['f', 'o', 'o']:
                # insert an empty `c`
                row.insert(3, '')

        if len(row) < 5:
            # make row at least 5 columns long
            row.extend([''] * (5 - len(row)))
        if len(row) > 5:
            # merge any excess columns into the 5th column
            row[4] = ','.join(row[4:])
            del row[5:]

        writer.writerow(row)
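
The pad-or-merge step can also be tried in isolation, without any files. A minimal sketch, assuming the same 5-column target and comma separator as above; normalize_row and its parameters are hypothetical names introduced here, not part of the csv module.

```python
def normalize_row(row, width=5, sep=','):
    """Return a copy of `row` with exactly `width` columns.

    Short rows are padded with empty strings; any extra columns are
    joined into the last column using `sep`.
    """
    row = list(row)  # work on a copy so the caller's list is untouched
    if len(row) < width:
        row.extend([''] * (width - len(row)))
    elif len(row) > width:
        row[width - 1] = sep.join(row[width - 1:])
        del row[width:]
    return row


padded = normalize_row(['1', 'a'])
merged = normalize_row(['1', 'a', 'b', 'c', 'd', 'e', 'f'])
```

Here padded gains three empty columns, and merged keeps its first four columns while the rest are joined into the fifth.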

Update:

Instead of using a flag, use the reader as an iterator (calling next() on it to get the next row, rather than using a for loop):

import csv

with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')

    row = None

    try:
        next(reader)  # skip the `A   B` headers.

        line = next(reader)  # prime our loop
        while True:
            while not line or not line[0]:
                # advance to the first line with a column 0 value
                # (blank lines come back as [] and are skipped too)
                line = next(reader)

            row = line  # start off with the first number and column
            line = next(reader)  # prime the subsequent lines loop

            while line and not line[0]:
                # process subsequent lines until we find one with a value in col 0 again
                cell = line[1]
                if cell == 'foo':    # detect column d
                    row.append('')   # and insert empty value
                row.append(cell)
                line = next(reader)

            # consolidate, write
            if len(row) < 5:
                # make row at least 5 columns long
                row.extend([''] * (5 - len(row)))
            if len(row) > 5:
                # merge any excess columns into the 5th column
                row[4] = ','.join(row[4:])
                del row[5:]

            writer.writerow(row)
            row = None
    except StopIteration:
        # reader is done, no more lines to come
        # process the last row if there was one
        if row is not None:
            # consolidate, write
            if len(row) < 5:
                # make row at least 5 columns long
                row.extend([''] * (5 - len(row)))
            if len(row) > 5:
                # merge any excess columns into the 5th column
                row[4] = ','.join(row[4:])
                del row[5:]

            writer.writerow(row)
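
The consolidate-and-write code repeated in the except branch could be avoided by moving the grouping into a generator. This is a different arrangement from the next()-driven loop above, not a drop-in equivalent. A minimal sketch, assuming the same layout (a value in column 0 starts a record, following lines with an empty column 0 carry one extra cell in column 1); merge_continuations and the sample data are hypothetical, and the 'foo' test and the 5-column padding are left out for brevity.

```python
import csv
import io


def merge_continuations(lines):
    """Yield one list per record, folding continuation lines into it.

    A line with a value in column 0 starts a new record; a line with an
    empty column 0 contributes its column 1 value to the current record.
    """
    current = None
    for line in lines:
        if not line:
            continue  # blank input line: csv.reader returns []
        if line[0]:
            if current is not None:
                yield current
            current = list(line)
        elif current is not None and len(line) > 1:
            current.append(line[1])
    if current is not None:
        yield current  # the last record has no following line to flush it


# Hypothetical sample input: a header, then two records with continuation lines.
sample = "A\tB\n1\ta\n\tb\n\tc\n2\tx\n\ty\n"
reader = csv.reader(io.StringIO(sample), delimiter='\t')
next(reader)  # skip the `A   B` headers, as in the code above
merged = list(merge_continuations(reader))
```

Each yielded row can then be padded or merged to 5 columns and passed to writer.writerow() in one place.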