且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用Python合并两个CSV文件

更新时间:2023-02-12 20:34:53

在处理csv文件时,我经常使用 pandas 库.它使这样的事情变得非常容易.例如:

When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd

a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)


一些解释如下.首先,我们读取csv文件:


Some explanation follows. First, we read in the csv files:

>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

,我们看到有一列额外的数据(请注意,fileb.csv的第一行-title,mar,apr,may,jun,的末尾有一个逗号).我们可以很容易地摆脱它:

and we see there's an extra column of data (note that the first line of fileb.csv -- title,mar,apr,may,jun, -- has an extra comma at the end). We can get rid of that easily enough:

>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000

现在,我们可以在标题列上合并ab:

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000

最后写出来:

>>> merged.to_csv("output.csv", index=False)

生产:

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0