
Adding a column to a frame in an HDF file in Pandas

Updated: 2023-12-01 12:19:34

The complete docs are here, and some cookbook strategies are here.

PyTables is row-oriented, so you can only append rows. Read the CSV chunk by chunk and append each chunk to the store as you go, something like this:

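import pandas as pd

# Open the store in write mode, replacing any existing file.
store = pd.HDFStore('file.h5', mode='w')
# Read the CSV in 50,000-row chunks and append each chunk to the 'df' table.
for chunk in pd.read_csv('file.csv', chunksize=50000):
    store.append('df', chunk)
store.close()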

You must be a tad careful, because the frames read chunk by chunk can end up with different dtypes. For example, an integer-like column may have no missing values until, say, the 2nd chunk: the first chunk would have that column as int64, while the second would have it as float64. You may need to force dtypes with the dtype keyword to read_csv, see here.
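To make the dtype issue concrete, here is a minimal sketch that reads a small in-memory CSV in chunks, first with inferred dtypes and then with the dtype keyword. The CSV contents, the column names a and b, and the helper chunk_dtypes are all made up for illustration; only read_csv, chunksize, and dtype come from pandas itself.

```python
import io

import pandas as pd

# Hypothetical CSV: column "a" looks like an integer column until a
# missing value shows up in the fourth data row.
csv_text = "a,b\n1,x\n2,y\n3,z\n,w\n5,v\n6,u\n"


def chunk_dtypes(**read_csv_kwargs):
    """Return the dtype of column 'a' for each two-row chunk (illustrative helper)."""
    reader = pd.read_csv(io.StringIO(csv_text), chunksize=2, **read_csv_kwargs)
    return [str(chunk["a"].dtype) for chunk in reader]


# Let pandas infer the dtype separately for every chunk.
inferred = chunk_dtypes()

# Force column "a" to float64 so every chunk agrees.
forced = chunk_dtypes(dtype={"a": "float64"})

print("inferred:", inferred)
print("forced:  ", forced)
```

With the dtype forced up front, every chunk has the same column types, which is what you want before appending each chunk to the same table in the store.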

Here is a similar question as well.