Efficient incremental sparse matrix in python/scipy/numpy

Updated: 2021-09-16 01:48:25

In https://***.com/a/27771335/901925 I explore incremental matrix assignment.

lil and dok are the recommended formats if you want to change values. csr will give you an efficiency warning when an assignment changes its sparsity structure, and coo does not allow indexing.
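As a rough illustration of the difference between the formats, the sketch below assigns a single value into lil, dok and csr matrices and records whether csr warns. The matrix size and the values are arbitrary choices made for this example; only the format behaviour matters.

```python
import warnings
from scipy import sparse

# lil and dok accept item assignment without complaint.
L = sparse.lil_matrix((3, 3))
L[0, 1] = 2.0
D = sparse.dok_matrix((3, 3))
D[2, 0] = 5.0

# csr also accepts the assignment, but adding a new nonzero changes its
# sparsity structure, which triggers a SparseEfficiencyWarning.
S = sparse.csr_matrix((3, 3))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    S[0, 1] = 2.0
csr_warned = any(
    issubclass(w.category, sparse.SparseEfficiencyWarning) for w in caught
)
print(csr_warned)
```

The warning is only a hint: the csr assignment still succeeds, it is just expensive when repeated many times.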

But I also found that dok indexing is slow compared to regular dictionary indexing. So for many changes it is better to build a plain dictionary (with the same tuple indexing), and build the dok matrix from that.
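One possible way to go from a plain dictionary to a dok matrix is sketched below. The dictionary contents and the shape are made up for illustration, and the route through coo_matrix is an assumption of this sketch rather than necessarily the approach used in the linked answer; it is chosen because it relies only on the public constructor and `todok()`.

```python
from scipy import sparse

shape = (4, 4)

# Accumulate values in an ordinary dict keyed by (row, col) tuples.
d = {}
for i in range(4):
    d[(i, (i + 1) % 4)] = float(i + 1)

# Split the keys into row and column sequences, build a coo matrix
# from them, then convert to dok.
rows, cols = zip(*d.keys())
M = sparse.coo_matrix((list(d.values()), (rows, cols)), shape=shape).todok()
print(M[3, 0])
```

All the per-element work happens in the plain dict, so the slower dok indexing is paid only when the finished matrix is read.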

But if you can calculate the H data values with a fast numpy vector operation, as opposed to iteration, it is best to do so, and construct the sparse matrix from that (e.g. coo format). In fact even with iteration this would be faster:

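 import numpy as np
 from scipy import sparse

 # A, B: 1d arrays of row and column indices; (n, m): shape of H
 h = np.zeros(A.shape)
 for k, (i, j) in enumerate(zip(A, B)):
     h[k] = compute_something(i, j)   # placeholder for the per-entry calculation
 H = sparse.coo_matrix((h, (A, B)), shape=(n, m))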

For example:

In [780]: A=np.array([0,1,1,2]); B=np.array([0,2,2,1])
In [781]: h=np.zeros(A.shape)
In [782]: for k, (i,j) in enumerate(zip(A,B)):
   .....:     h[k] = i+j+k
   .....:     
In [783]: h
Out[783]: array([ 0.,  4.,  5.,  6.])
In [784]: M=sparse.coo_matrix((h,(A,B)),shape=(4,4))
In [785]: M
Out[785]: 
<4x4 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in COOrdinate format>
In [786]: M.A
Out[786]: 
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  9.,  0.],
       [ 0.,  6.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

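Note that the (1,2) value is 9, the sum 4+5. Both entries share the coordinates (1,2), and duplicate coordinates in a coo matrix are summed when it is converted to csr or to a dense array.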
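The summing of duplicates can be checked directly. The sketch below reuses the index and data arrays from the session above; the variable names `stored_in_coo`, `C` and `value_12` are introduced here only for illustration.

```python
import numpy as np
from scipy import sparse

A = np.array([0, 1, 1, 2])
B = np.array([0, 2, 2, 1])
h = np.array([0.0, 4.0, 5.0, 6.0])

M = sparse.coo_matrix((h, (A, B)), shape=(4, 4))
stored_in_coo = M.nnz      # coo still stores the two (1,2) entries separately

C = M.tocsr()              # the conversion sums entries with equal coordinates
value_12 = C[1, 2]
dense = M.toarray()        # the dense conversion sums them as well
print(stored_in_coo, value_12)
```

This behaviour is convenient when accumulating contributions to the same cell, but it means repeated (row, col) pairs never overwrite one another.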

In this case I could have calculated h with:

In [791]: A+B+np.arange(A.shape[0])
Out[791]: array([0, 4, 5, 6])

so no iteration is needed.
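Putting the pieces together, a vectorized construction might look like the sketch below. It uses the same `A` and `B` as the session above and compares the result against the loop version; the names `h_loop` and `h_vec` are chosen here for illustration.

```python
import numpy as np
from scipy import sparse

A = np.array([0, 1, 1, 2])
B = np.array([0, 2, 2, 1])

# Loop version, as in the session above.
h_loop = np.zeros(A.shape)
for k, (i, j) in enumerate(zip(A, B)):
    h_loop[k] = i + j + k

# Vectorized version: one array expression, no Python-level loop.
h_vec = A + B + np.arange(A.shape[0])

M = sparse.coo_matrix((h_vec, (A, B)), shape=(4, 4))
print(np.array_equal(h_loop, h_vec))
```

The vectorized data array is an integer array, whereas the loop fills a float array; pass `dtype=float` to `coo_matrix` if a float matrix is needed.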