Updated: 2022-12-10 21:15:53
What is wrong with the linked answer? Does it not work in your case? Do you just not understand it? Or isn't it efficient enough?
I was going to suggest working out a means of finding the top values for a row of a lil format matrix, and applying that row by row. But I would just be repeating my earlier answer.
OK, my previous answer was a start, but lacked some details on iterating through the lil format. Here's a start; it probably could be cleaned up.
Make an array and a lil version (with numpy imported as np, and sparse imported from scipy):
In [42]: arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
In [43]: arr_sp=sparse.csc_matrix(arr)
In [44]: arr_ll=arr_sp.tolil()
The row function from the previous answer:
def max_n(row_data, row_indices, n):
    i = row_data.argsort()[-n:]
    # i = row_data.argpartition(-n)[-n:]
    top_values = row_data[i]
    top_indices = row_indices[i]  # do the sparse indices matter?
    return top_values, top_indices, i
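To see what the row function returns, here is a minimal sketch that calls it on one row by hand. The sample row (values 6, 4, 9 in columns 0, 2, 3) is the second row of the array above; the variable names are only for illustration. It also tries the commented-out argpartition alternative, which should pick the same values but does not promise any particular order, and which, as far as I know, raises an error when n is larger than the number of stored values in the row.

```python
import numpy as np

def max_n(row_data, row_indices, n):
    # positions (within this row's stored values) of the n largest values,
    # in ascending order of value
    i = row_data.argsort()[-n:]
    return row_data[i], row_indices[i], i

# One sparse row: the stored (nonzero) values and the columns they sit in.
row_data = np.array([6, 4, 9])
row_indices = np.array([0, 2, 3])

top_values, top_indices, pos = max_n(row_data, row_indices, 2)
print(top_values, top_indices, pos)

# argpartition variant: same set of values, order not guaranteed
part = row_data.argpartition(-2)[-2:]
print(sorted(row_data[part].tolist()))
```

The third return value is the position inside the row's own lists, not a column number; the column numbers are in the second return value.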
Iterate over the rows of arr_ll, apply this function and replace the elements:
In [46]: for i in range(arr_ll.shape[0]):
   ....:     d,r=max_n(np.array(arr_ll.data[i]),np.array(arr_ll.rows[i]),2)[:2]
   ....:     arr_ll.data[i]=d.tolist()
   ....:     arr_ll.rows[i]=r.tolist()
   ....:
In [47]: arr_ll.data
Out[47]: array([[3, 5], [6, 9], [6, 8]], dtype=object)
In [48]: arr_ll.rows
Out[48]: array([[2, 1], [0, 3], [3, 4]], dtype=object)
In [49]: arr_ll.tocsc().A
Out[49]:
array([[0, 5, 3, 0, 0],
       [6, 0, 0, 9, 0],
       [0, 0, 0, 6, 8]])
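The loop above could be wrapped up as a function along these lines. This is only a sketch: the name keep_top_n_per_row is made up here, it works on a copy rather than in place, and it puts the kept entries back in column order, on the assumption that lil rows are normally kept sorted (the session above leaves them in value order and still converts fine). Ties between equal values are broken arbitrarily by argsort.

```python
import numpy as np
from scipy import sparse

def keep_top_n_per_row(mat, n):
    """Return a lil copy of `mat` keeping at most the n largest stored values per row."""
    out = mat.tolil(copy=True)          # copy, so the input is left alone
    for r in range(out.shape[0]):
        data = np.array(out.data[r])
        cols = np.array(out.rows[r])
        if len(data) <= n:
            continue                    # nothing to drop in this row
        # positions of the n largest values, then back into column order
        keep = np.sort(data.argsort()[len(data) - n:])
        out.data[r] = data[keep].tolist()
        out.rows[r] = cols[keep].tolist()
    return out

arr = np.array([[0, 5, 3, 0, 2],
                [6, 0, 4, 9, 0],
                [0, 0, 0, 6, 8]])
arr_sp = sparse.csc_matrix(arr)
result = keep_top_n_per_row(arr_sp, 2)
print(result.toarray())
```

Slicing with len(data) - n instead of -n avoids the case n = 0, where [-0:] would select everything.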
In the lil format, the data is stored in 2 object type arrays, as sublists, one with the data numbers, the other with the column indices.
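A quick way to see that layout is to print the two attributes row by row before changing anything. A small sketch, using the same sample array as above:

```python
import numpy as np
from scipy import sparse

arr = np.array([[0, 5, 3, 0, 2],
                [6, 0, 4, 9, 0],
                [0, 0, 0, 6, 8]])
arr_ll = sparse.lil_matrix(arr)

# Both attributes are object arrays holding one Python list per matrix row.
print(arr_ll.rows.dtype, arr_ll.data.dtype)
for r in range(arr_ll.shape[0]):
    # rows[r]: column indices of the stored values; data[r]: the values themselves
    print(r, arr_ll.rows[r], arr_ll.data[r])
```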
Viewing the data attributes of a sparse matrix is handy when doing new things. Changing those attributes has some risk, since it can mess up the whole array. But it looks like the lil format can be tweaked like this safely.
The csr format is better for accessing rows than csc. Its data is stored in 3 arrays: data, indices and indptr. The lil format effectively splits 2 of those arrays into sublists based on the information in indptr. csr is great for math (multiplication, addition, etc.), but not so good when changing the sparsity (turning nonzero values into zeros).
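To make the relation between the two formats concrete, here is a small sketch that slices the csr indices and data arrays by consecutive indptr entries and compares the result with what tolil() produces. The names rows_split and data_split are just for illustration.

```python
import numpy as np
from scipy import sparse

arr = np.array([[0, 5, 3, 0, 2],
                [6, 0, 4, 9, 0],
                [0, 0, 0, 6, 8]])
arr_csr = sparse.csr_matrix(arr)
arr_ll = arr_csr.tolil()

indptr = arr_csr.indptr
print(arr_csr.data, arr_csr.indices, indptr)

# Row r of the csr matrix occupies the slice indptr[r]:indptr[r+1]
# of both the indices and the data arrays.
n_rows = arr_csr.shape[0]
rows_split = [arr_csr.indices[indptr[r]:indptr[r + 1]].tolist() for r in range(n_rows)]
data_split = [arr_csr.data[indptr[r]:indptr[r + 1]].tolist() for r in range(n_rows)]

print(rows_split)
print(data_split)
print(rows_split == [list(x) for x in arr_ll.rows])
```

Dropping an element from a lil row only means shortening two small lists, while in csr the three flat arrays would have to be rebuilt, which is why the top-n replacement above is done in lil.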