更新时间:2022-01-11 09:04:53
您可以直接使用sklearn在稀疏矩阵的行上计算成对的余弦相似度.从0.17版开始,它还支持稀疏输出:
You can compute pairwise cosine similarity on the rows of a sparse matrix directly using sklearn. As of version 0.17 it also supports sparse output:
from sklearn.metrics.pairwise import cosine_similarity
from scipy import sparse
A = np.array([[0, 1, 0, 0, 1], [0, 0, 1, 1, 1],[1, 1, 0, 1, 0]])
A_sparse = sparse.csr_matrix(A)
similarities = cosine_similarity(A_sparse)
print('pairwise dense output:\n {}\n'.format(similarities))
#also can output sparse matrices
similarities_sparse = cosine_similarity(A_sparse,dense_output=False)
print('pairwise sparse output:\n {}\n'.format(similarities_sparse))
结果:
pairwise dense output:
[[ 1. 0.40824829 0.40824829]
[ 0.40824829 1. 0.33333333]
[ 0.40824829 0.33333333 1. ]]
pairwise sparse output:
(0, 1) 0.408248290464
(0, 2) 0.408248290464
(0, 0) 1.0
(1, 0) 0.408248290464
(1, 2) 0.333333333333
(1, 1) 1.0
(2, 1) 0.333333333333
(2, 0) 0.408248290464
(2, 2) 1.0
如果要按列进行余弦相似度,只需预先转置输入矩阵即可:
If you want column-wise cosine similarities simply transpose your input matrix beforehand:
A_sparse.transpose()