且构网

Getting the top n items of each row in a sparse matrix

Updated: 2022-12-10 21:15:53

What is wrong with the linked answer? Does it not work in your case, do you not understand it, or is it not efficient enough?

I was going to suggest working out a way to find the top values in one row of a lil format matrix and applying it row by row. But I would just be repeating my earlier answer.

OK, my previous answer was a start, but it lacked some details on iterating through the lil format. Here's a fuller attempt; it could probably be cleaned up.

Make an array and a lil version of it:

# Assumes earlier in the session: import numpy as np; from scipy import sparse
In [42]: arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
In [43]: arr_sp=sparse.csc_matrix(arr)
In [44]: arr_ll=arr_sp.tolil()

The row function from the previous answer:

def max_n(row_data, row_indices, n):
    # positions of the n largest values, ordered by ascending value
    i = row_data.argsort()[-n:]
    # i = row_data.argpartition(-n)[-n:]  # faster, but needs len(row_data) >= n
    top_values = row_data[i]
    top_indices = row_indices[i]  # do the sparse indices matter?
    return top_values, top_indices, i
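
As a quick check of what the row function returns, here is a small sketch that applies it to one row. The values and column indices are typed in by hand and are meant to correspond to the nonzero entries of the row `[0,5,3,0,2]`; `max_n` is repeated only so that the snippet runs on its own.

```python
import numpy as np

def max_n(row_data, row_indices, n):
    # positions of the n largest values, ordered by ascending value
    i = row_data.argsort()[-n:]
    return row_data[i], row_indices[i], i

# Nonzero values of the row [0, 5, 3, 0, 2] and the columns they sit in
row_data = np.array([5, 3, 2])
row_indices = np.array([1, 2, 4])

top_values, top_indices, pos = max_n(row_data, row_indices, 2)
print(top_values)   # [3 5]
print(top_indices)  # [2 1]
```

Note that the returned column indices come back ordered by value, not by column.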

Iterate over the rows of arr_ll, apply this function, and replace the elements:

In [46]: for i in range(arr_ll.shape[0]):
   ....:     d,r=max_n(np.array(arr_ll.data[i]),np.array(arr_ll.rows[i]),2)[:2]
   ....:     arr_ll.data[i]=d.tolist()
   ....:     arr_ll.rows[i]=r.tolist()
   ....:

In [47]: arr_ll.data
Out[47]: array([[3, 5], [6, 9], [6, 8]], dtype=object)

In [48]: arr_ll.rows
Out[48]: array([[2, 1], [0, 3], [3, 4]], dtype=object)

In [49]: arr_ll.tocsc().A
Out[49]: 
array([[0, 5, 3, 0, 0],
       [6, 0, 0, 9, 0],
       [0, 0, 0, 6, 8]])

In the lil format, the data is stored in 2 object-dtype arrays of sublists: one holds the data values, the other holds the column indices.
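
To make that layout concrete, the following sketch prints the two attributes row by row for the example array. It builds the lil matrix directly with `sparse.lil_matrix`, which is a shortcut I am assuming is equivalent here to going through `csc_matrix(...).tolil()`.

```python
import numpy as np
from scipy import sparse

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
arr_ll = sparse.lil_matrix(arr)

# One sublist per matrix row: rows[i] holds column indices, data[i] the values.
# Row 0 should hold columns 1, 2, 4 with values 5, 3, 2.
for i in range(arr_ll.shape[0]):
    print(i, list(arr_ll.rows[i]), list(arr_ll.data[i]))
```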

Viewing the data attributes of a sparse matrix is handy when trying new things. Changing those attributes carries some risk, since it can mess up the whole array. But it looks like the lil format can be tweaked like this safely.
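
One way to lower that risk is sketched below. The SciPy documentation describes each `rows` entry as a sorted list of column indices, so this variant restores column order before writing the sublists back, and it works on a copy. The helper name `keep_top_n_per_row` is hypothetical, and "largest" here means largest among the stored values of each row; this is an illustration under those assumptions, not a tested replacement for the loop above.

```python
import numpy as np
from scipy import sparse

def keep_top_n_per_row(mat, n):
    """Return a lil copy of `mat` keeping only the n largest stored values per row."""
    out = mat.tolil(copy=True)
    for i in range(out.shape[0]):
        data = np.asarray(out.data[i])
        cols = np.asarray(out.rows[i])
        if data.size > n:
            # positions of the n largest values, put back into column order
            keep = np.sort(data.argsort()[data.size - n:])
            out.data[i] = data[keep].tolist()
            out.rows[i] = cols[keep].tolist()
    return out

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
top2 = keep_top_n_per_row(sparse.csc_matrix(arr), 2)
print(top2.toarray())
```

Slicing with `data.size - n` instead of `-n` also avoids the case `n = 0`, where `[-0:]` would select everything.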

The csr format is better for accessing rows than csc. Its data is stored in 3 arrays: data, indices and indptr. The lil format effectively splits 2 of those arrays (data and indices) into sublists based on the information in indptr. csr is great for math (multiplication, addition, etc.), but not so good when changing the sparsity (turning nonzero values into zeros).
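
For comparison, here is a rough sketch of the same selection done on a csr matrix. It uses `indptr` to slice out each row's span of `data` and `indices`, collects the kept entries, and builds a new matrix from them, so the sparsity of the original is never changed in place. The variable names and the choice to rebuild from (value, (row, column)) triples are my own assumptions, not something taken from the earlier answer.

```python
import numpy as np
from scipy import sparse

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
arr_csr = sparse.csr_matrix(arr)
n = 2

rows, cols, vals = [], [], []
for i in range(arr_csr.shape[0]):
    # indptr[i]:indptr[i+1] is the span of row i inside data and indices
    start, end = arr_csr.indptr[i], arr_csr.indptr[i + 1]
    row_data = arr_csr.data[start:end]
    row_cols = arr_csr.indices[start:end]
    # positions of the n largest stored values (all of them if the row is short)
    keep = row_data.argsort()[max(len(row_data) - n, 0):]
    rows.extend([i] * len(keep))
    cols.extend(row_cols[keep])
    vals.extend(row_data[keep])

# Build a fresh matrix from the kept (value, (row, column)) triples
top = sparse.csr_matrix((vals, (rows, cols)), shape=arr_csr.shape)
print(top.toarray())
```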