且构网

Read multiple CSV files and add the file name as a new column in pandas

Updated: 2021-10-26 08:53:45

I think you need assign to add the new column inside the loop. Passing ignore_index=True to concat additionally renumbers the rows, which removes the duplicate index values that each file would otherwise contribute:

The test files are a.csv, b.csv, and c.csv.

import glob
import os

import pandas as pd


# Collect the paths of all CSV files in the sample folder
files = glob.glob('samples_for_so/*.csv')
print(files)
# ['samples_for_so\\a.csv', 'samples_for_so\\b.csv', 'samples_for_so\\c.csv']  (as printed on Windows)


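# Read each file and tag its rows with the source file name.
# Without ignore_index=True, each file keeps its own index, so index values repeat.
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp)) for fp in files])
print(df)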
   a  b  c  d    New
0  0  1  2  5  a.csv
1  1  5  8  3  a.csv
0  0  9  6  5  b.csv
1  1  6  4  2  b.csv
0  0  7  1  7  c.csv
1  1  3  2  6  c.csv


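# Same idea, but keep only the part of the file name before the first dot
# and renumber the rows 0..n-1 with ignore_index=True
files = glob.glob('samples_for_so/*.csv')
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp).split('.')[0])
                for fp in files], ignore_index=True)
print(df)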
   a  b  c  d New
0  0  1  2  5   a
1  1  5  8  3   a
2  0  9  6  5   b
3  1  6  4  2   b
4  0  7  1  7   c
5  1  3  2  6   c
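
For reuse, the same approach can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name read_csvs_with_source and its parameters are hypothetical (they are not part of pandas), pathlib is swapped in for glob and os.path, and the sample files are created in a temporary folder purely for the demo. Note that Path.stem drops only the last extension, so a file named b.v2.csv is labelled b.v2, whereas split('.')[0] would give b.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csvs_with_source(folder, pattern="*.csv", column="New"):
    """Concatenate matching CSV files, tagging each row with its file stem.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Sort so the row order does not depend on directory listing order.
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    # assign(**{column: ...}) lets the new column name be chosen at call time.
    frames = [pd.read_csv(p).assign(**{column: p.stem}) for p in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)


# Demo with two throwaway files (made-up data, assumed layout).
with tempfile.TemporaryDirectory() as tmp:
    samples = {"a": "a,b\n0,1\n1,5\n", "b.v2": "a,b\n0,9\n1,6\n"}
    for name, text in samples.items():
        Path(tmp, f"{name}.csv").write_text(text)
    df = read_csvs_with_source(tmp)

print(df)
```

Sorting the paths is a design choice rather than a requirement: glob does not guarantee any particular order, so sorting keeps the combined frame reproducible between runs.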