Updated: 2023-12-01 12:19:34
The complete docs are here, and some cookbook strategies are here.
PyTables is row-oriented, so you can only append rows. Read the CSV chunk-by-chunk and append each chunk's entire frame as you go, something like this:
import pandas as pd

store = pd.HDFStore('file.h5', mode='w')
for chunk in pd.read_csv('file.csv', chunksize=50000):
    # append each chunk's rows to the 'df' table as it is read
    store.append('df', chunk)
store.close()
You must be a tad careful, as the frames read chunk-by-chunk can end up with different dtypes, e.g. you have an integer-like column that doesn't have missing values until, say, the 2nd chunk. The first chunk would have that column as int64, while the second would have it as float64. You may need to force dtypes with the dtype keyword to read_csv, see here.
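To make the dtype caveat concrete, here is a minimal sketch that reads a tiny in-memory CSV in chunks, first with inferred dtypes and then with the dtype forced. The sample data, the column names a and b, and the helper chunk_dtypes are made up for illustration; only read_csv and its chunksize and dtype keywords come from pandas.

```python
import io

import pandas as pd

# Hypothetical sample data: column "a" is integer-like, and its first
# missing value only shows up in the second chunk.
csv_text = "a,b\n1,x\n2,y\n3,z\n,w\n"


def chunk_dtypes(**read_csv_kwargs):
    """Return the dtype of column 'a' for each chunk (illustrative helper)."""
    reader = pd.read_csv(io.StringIO(csv_text), chunksize=2, **read_csv_kwargs)
    return [str(chunk["a"].dtype) for chunk in reader]


# dtypes are inferred per chunk, so they can differ between chunks
inferred = chunk_dtypes()

# forcing the dtype up front keeps every chunk consistent
forced = chunk_dtypes(dtype={"a": "float64"})

print(inferred)
print(forced)
```

With the dtype forced, every chunk arrives with the same column type, which is what you want before appending the chunks to a single table.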
Here is a similar question as well.