Deleting unwanted records and strings based on multiple conditions with a Python or R script

Updated: 2023-12-05 11:23:40

I have a .csv file named fileOne.csv that contains many unnecessary strings and records. I want to delete unnecessary records (rows) and strings based on multiple conditions/criteria using a Python or R script, and save the remaining records in a new .csv file named resultFile.csv.

What I want to do is as follows:

Delete the first column.
Split column BB into two columns named a_id and b_id, separating the value at the _ (underscore): the left part goes to a_id and the right part goes to b_id.
Keep only records that have a .csv file extension in the files column (BB) and do not contain No Bi in the cut column (CC).
Assign a new name to each column.
Delete the records that contain strings like less in the CC column.
Trim all other unnecessary strings from the records.
Delete the "Mi" field and all remaining fields after it in each row.

My fileOne.csv is as follows:

   AA      BB       CC       DD     EE      FF    GG
   1       1_1.csv  (=0      =10"   27"     =57   "Mi"
   0.97    0.9      0.8      NaN    0.9     od    0.2
   2       1_3.csv  (=0      =10"   27"     "Mi"  0.5
   0.97    0.5      0.8      NaN    0.9     od    0.4
   3       1_6.csv  (=0      =10"   "Mi"     =53  cnt
   0.97    0.9      0.8      NaN    0.9     od    0.6
   4       2_6.csv  No Bi    000    000     000   000
   5       2_8.csv  No Bi    000    000     000   000
   6       6_9.csv  less     000    000     000   000
   7       7_9.csv  s(=0     =26"   =46"    "Mi"  121

My first expected result file would be as follows:

a_id    b_id    CC    DD    EE    FF    GG             
1       1       0     10    27    57    Mi              
1       3       0     10    27    Mi    0.5
1       6       0     10    Mi    53    cnt 
7       9       0     26    46    Mi    121

My final expected result file would be as follows:

a_id    b_id    CC    DD    EE    FF    GG             
1       1       0     10    27    57              
1       3       0     10    27
1       6       0     10 
7       9       0     26    46
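The first (intermediate) output keeps Mi, 0.5 and cnt, so it needs a gentler cleaning rule than "digits only". A minimal Python 3 sketch of one way to get there, assuming the rule is "a cell that contains a digit keeps only digits and '.', any other cell only loses its double quotes" (my reading of the example, not something the question states). The helper names clean_cell and first_pass are hypothetical, and the sample rows are typed in from the table above, assuming the CSV reader leaves the quote characters as shown:

```python
import re

def clean_cell(cell):
    # Assumed rule: cells containing a digit keep only digits and '.',
    # other cells (e.g. '"Mi"', 'cnt') only lose their double quotes.
    if re.search(r'[0-9]', cell):
        return re.sub(r'[^0-9.]', '', cell)
    return cell.replace('"', '')

def first_pass(row):
    """Return the row as in the first expected output, or None if it is dropped."""
    bb = re.fullmatch(r'(\d+)_(\d+)\.csv', row[1])   # column BB, e.g. '1_3.csv'
    if bb is None or row[2] in ('No Bi', 'less'):    # column CC filters
        return None
    # AA is dropped, BB becomes a_id and b_id, CC..GG are cleaned
    return [bb.group(1), bb.group(2)] + [clean_cell(c) for c in row[2:]]

rows = [
    ['1', '1_1.csv', '(=0', '=10"', '27"', '=57', '"Mi"'],
    ['0.97', '0.9', '0.8', 'NaN', '0.9', 'od', '0.2'],
    ['4', '2_6.csv', 'No Bi', '000', '000', '000', '000'],
]
print([r for r in map(first_pass, rows) if r is not None])
# [['1', '1', '0', '10', '27', '57', 'Mi']]
```

The 0.97 rows are dropped automatically because their BB value (0.9) does not look like a file name.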

This can be achieved with the following Python script:

import csv
import re
import string

output_header = ['a_id', 'b_id', 'CC', 'DD', 'EE', 'FF', 'GG']

# Python 2 only: an identity translation table, and a string of every non-digit character
sanitise_table = string.maketrans("","")
nodigits_table = sanitise_table.translate(sanitise_table, string.digits)

def sanitise_cell(cell):
    # Keep digits only: every other character ('(', '=', '"', '.', letters) is deleted
    return cell.translate(sanitise_table, nodigits_table)

with open('fileOne.csv') as f_input, open('resultFile.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    next(csv_input)                      # Skip the original AA..GG header
    csv_output.writerow(output_header)

    for row in csv_input:
        # Column BB must look like '1_3.csv'; the 0.97 rows fail this and are dropped
        bb = re.match(r'(\d+)_(\d+)\.csv', row[1])

        if bb and row[2] not in ['No Bi', 'less']:
            # Blank out the 'Mi' cell and every cell after it, if present
            try:
                mi = row.index('Mi')
                row[:] = row[:mi] + [''] * (len(row) - mi)
            except ValueError:
                pass

            row[:] = [sanitise_cell(col) for col in row]
            row[0] = bb.group(1)         # a_id replaces AA
            row[1] = bb.group(2)         # b_id replaces BB
            csv_output.writerow(row)
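The script above is Python 2 code: string.maketrans, the two-argument str.translate and writing csv output to a file opened in 'wb' mode all fail under Python 3. A minimal Python 3 sketch of the same logic, assuming fileOne.csv parses into the seven columns AA to GG; convert_rows and convert_file are hypothetical names chosen for the sketch, and the digits-only cleaning is done with re.sub instead of translate:

```python
import csv
import re

OUTPUT_HEADER = ['a_id', 'b_id', 'CC', 'DD', 'EE', 'FF', 'GG']

def sanitise_cell(cell):
    # Same effect as the Python 2 translate() call: keep ASCII digits only
    return re.sub(r'[^0-9]', '', cell)

def convert_rows(rows):
    """Yield the cleaned data rows (without header) for resultFile.csv."""
    for row in rows:
        if len(row) < 3:                              # skip blank or short lines
            continue
        bb = re.match(r'(\d+)_(\d+)\.csv', row[1])
        if not bb or row[2] in ('No Bi', 'less'):
            continue
        if 'Mi' in row:                               # blank out 'Mi' and later cells
            mi = row.index('Mi')
            row = row[:mi] + [''] * (len(row) - mi)
        row = [sanitise_cell(col) for col in row]
        row[0], row[1] = bb.group(1), bb.group(2)     # a_id, b_id replace AA, BB
        yield row

def convert_file(in_path='fileOne.csv', out_path='resultFile.csv'):
    # Python 3 csv files are opened in text mode with newline=''
    with open(in_path, newline='') as f_input, \
            open(out_path, 'w', newline='') as f_output:
        csv_input = csv.reader(f_input)
        csv_output = csv.writer(f_output)
        next(csv_input)                               # skip the AA..GG header
        csv_output.writerow(OUTPUT_HEADER)
        csv_output.writerows(convert_rows(csv_input))
```

Call convert_file() from the directory that holds fileOne.csv. Keeping the row logic in a generator separate from the file handling makes it easy to check against the sample rows without touching the disk.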

To simply blank out Mi and every column after it in an existing file, the following can be used:

import csv

with open('input.csv') as f_input, open('output.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for row in csv_input:
        # Replace 'Mi' and every cell after it with empty strings
        try:
            mi = row.index('Mi')
            row[:] = row[:mi] + [''] * (len(row) - mi)
        except ValueError:
            pass                     # No 'Mi' in this row: keep it as-is

        csv_output.writerow(row)
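The snippet above pads each row with empty strings, so every output line keeps its full column count and ends in trailing commas (for example a,b,, rather than a,b). If you would rather cut each row off at "Mi", as in the ragged final expected output, a minimal Python 3 sketch follows; truncate_at_mi and strip_mi_columns are hypothetical names, and input.csv/output.csv are the same placeholder file names used above:

```python
import csv

def truncate_at_mi(row):
    """Return the cells before the first 'Mi' cell; rows without 'Mi' are unchanged."""
    if 'Mi' in row:
        return row[:row.index('Mi')]
    return row

def strip_mi_columns(in_path='input.csv', out_path='output.csv'):
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(in_path, newline='') as f_input, \
            open(out_path, 'w', newline='') as f_output:
        csv_output = csv.writer(f_output)
        for row in csv.reader(f_input):
            csv_output.writerow(truncate_at_mi(row))
```

csv.writer writes a shorter list as a shorter line, so rows in the output can have different numbers of fields, which matches the layout of the final expected output.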

Tested using Python 2.7.9
