且构网

h5py: How do I read selected rows of an HDF5 file?

Updated: 2021-11-11 22:39:56

I have a sample HDF5 file, opened with h5py, that contains:

data = f['data']
# <HDF5 dataset "data": shape (3, 6), type "<i4">
# contents are np.arange(18).reshape(3, 6)
ind = np.where(data[:] % 2)[0]   # row index of every odd element
# array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=int32)
data[ind]            # error: getitem only works with boolean arrays
data[ind.tolist()]   # error: can't read data (Dataset: Read failed)

This last error is caused by the repeated values in the list.
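One way around the repeated values, sketched here as an assumption rather than anything h5py provides: read each distinct row once with a sorted, duplicate-free list, then rebuild the repeats in memory with NumPy. The helper names `read_rows` and `demo`, and the in-memory file name `demo.h5`, are made up for illustration.

```python
import numpy as np


def read_rows(dataset, rows):
    """Read `rows` (repeats and any order allowed) from an h5py-style dataset.

    `dataset` is assumed to accept a sorted list of unique row indices,
    as an h5py Dataset does; a NumPy array works as a stand-in.
    """
    rows = np.asarray(rows)
    # Sorted unique indices, plus the positions that rebuild the original request.
    unique_rows, inverse = np.unique(rows, return_inverse=True)
    block = dataset[unique_rows.tolist()]   # single read with a valid index list
    return np.asarray(block)[inverse]       # repeat / reorder rows in memory


def demo():
    """Rebuild the sample dataset in memory and read the repeated rows."""
    import h5py  # imported here so read_rows itself only needs NumPy

    with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
        data = f.create_dataset("data", data=np.arange(18).reshape(3, 6))
        ind = np.where(data[:] % 2)[0]      # contains repeated row indices
        return read_rows(data, ind)
```

Only the distinct rows are requested from the file; the duplication happens afterwards on the in-memory array.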

But indexing with a list of unique values works fine:

In [150]: data[[0,2]]
Out[150]: 
array([[ 0,  1,  2,  3,  4,  5],
       [12, 13, 14, 15, 16, 17]])

In [151]: data[:,[0,3,5]]
Out[151]: 
array([[ 0,  3,  5],
       [ 6,  9, 11],
       [12, 15, 17]])

Indexing with an array also works, provided the other dimensions are sliced appropriately:

In [157]: data[ind[[0,3,6]],:]
Out[157]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])
In [165]: f['data'][:2,np.array([0,3,5])]
Out[165]: 
array([[ 0,  3,  5],
       [ 6,  9, 11]])
In [166]: f['data'][[0,1],np.array([0,3,5])]
# error: only one indexing array allowed

So if the indexing is right (unique values that match the array's dimensions), it should work.
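When both rows and columns need an index list, which the "only one indexing array" error rules out in a single call, one possible approach is to split the selection into two steps. This is a minimal sketch, assuming `rows` and `cols` are each unique and in increasing order; `read_block` is a hypothetical helper, not part of h5py.

```python
import numpy as np


def read_block(dataset, rows, cols):
    """Return the rows-by-cols block of `dataset` using one index list per step."""
    # Step 1: select the rows from the dataset (a single index list is allowed).
    row_block = dataset[list(rows), :]
    # Step 2: pick the columns from the resulting in-memory NumPy array.
    return np.asarray(row_block)[:, list(cols)]
```

Step 1 reads the full selected rows, so when only a few columns are wanted from many rows it may be cheaper to select the columns from the file first and the rows in memory.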

My simple example doesn't test how much of the array is loaded. The documentation reads as though the selected elements are read from the file without loading the whole array into memory.