PySpark RDD sparse matrix multiplication from Scala to Python

Updated: 2021-10-28 21:51:22

Python 3.x implementation

  1. Since Python 3 no longer supports tuple unpacking in lambda parameters, we have to reference each MatrixEntry through a single variable e.
  2. Also, MatrixEntry is not indexable, so we must access its individual attributes i, j and value.
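Both points can be checked in plain Python without a Spark installation. The sketch below assumes a hypothetical class `Entry` that, like `MatrixEntry`, exposes only the attributes `i`, `j` and `value`; it is an illustration, not the PySpark class. It shows that a Python 2 style tuple-parameter lambda no longer compiles (PEP 3113 removed that syntax) and that binding each entry to a single variable is what remains.

```python
class Entry:
    """Hypothetical stand-in for MatrixEntry: attributes only, no indexing."""

    def __init__(self, i, j, value):
        self.i = int(i)
        self.j = int(j)
        self.value = float(value)


# Python 2 accepted tuple parameters in a lambda; Python 3 rejects them (PEP 3113).
try:
    compile("lambda (j, (i, v)): (i, v)", "<py2-style>", "eval")
    py2_lambda_rejected = False
except SyntaxError:
    py2_lambda_rejected = True

# Indexing the entry like a tuple fails because the class defines no __getitem__.
try:
    Entry(0, 1, 2.0)[0]
    indexable = True
except TypeError:
    indexable = False

# So each entry is bound to a single variable e and its attributes are read.
entries = [Entry(0, 1, 2.0), Entry(1, 0, 3.0)]
keyed = list(map(lambda e: (e.j, (e.i, e.value)), entries))
print(py2_lambda_rejected, indexable, keyed)
# prints: True False [(1, (0, 2.0)), (0, (1, 3.0))]
```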

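def coordinateMatrixMultiply(leftmatrix, rightmatrix):
    # Key each left entry by its column index: (j, (i, value))
    left = leftmatrix.entries.map(lambda e: (e.j, (e.i, e.value)))
    # Key each right entry by its row index: (i, (j, value))
    right = rightmatrix.entries.map(lambda e: (e.i, (e.j, e.value)))
    productEntries = (
        left
        .join(right)
        # e = (shared, ((i, v), (k, w)))  ->  ((i, k), v * w)
        .map(lambda e: ((e[1][0][0], e[1][1][0]), e[1][0][1] * e[1][1][1]))
        # Sum the partial products for each output cell (i, k)
        .reduceByKey(lambda x, y: x + y)
        # Flatten ((i, k), sum) into an (i, k, sum) tuple
        .map(lambda e: (*e[0], e[1]))
    )
    # An RDD of (row, col, value) tuples, not a CoordinateMatrix
    return productEntries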
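To check the join-based logic without a Spark cluster, the sketch below replays the same pipeline on plain Python lists. The helpers `join_by_key` and `reduce_by_key` are hypothetical: they only imitate what `RDD.join` and `RDD.reduceByKey` do in this case, and entries are plain `(i, j, value)` tuples rather than `MatrixEntry` objects. Each product cell is C[i][k] = sum over j of A[i][j] * B[j][k], which is why both sides are keyed on the shared index j before the join.

```python
from collections import defaultdict


def join_by_key(left, right):
    """Hypothetical stand-in for RDD.join: inner join of (key, value) pairs."""
    by_key = defaultdict(list)
    for k, v in right:
        by_key[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in by_key[k]]


def reduce_by_key(pairs, func):
    """Hypothetical stand-in for RDD.reduceByKey."""
    acc = {}
    for k, v in pairs:
        acc[k] = func(acc[k], v) if k in acc else v
    return list(acc.items())


def coordinate_multiply_local(left_entries, right_entries):
    # Key left entries by column j and right entries by row i, as in the RDD version.
    left = [(j, (i, v)) for i, j, v in left_entries]
    right = [(i, (j, v)) for i, j, v in right_entries]
    # After the join each element is (shared, ((i, v), (k, w))).
    products = [((e[1][0][0], e[1][1][0]), e[1][0][1] * e[1][1][1])
                for e in join_by_key(left, right)]
    summed = reduce_by_key(products, lambda x, y: x + y)
    # Flatten to (i, k, sum) and sort so the result is deterministic.
    return sorted((*e[0], e[1]) for e in summed)


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]] as sparse (row, col, value) entries.
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
B = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]
print(coordinate_multiply_local(A, B))
# prints: [(0, 0, 14.0), (0, 1, 12.0), (1, 0, 15.0), (1, 1, 18.0)]
```

In PySpark itself, the tuple RDD returned by `coordinateMatrixMultiply` can be wrapped back into a distributed matrix with `CoordinateMatrix(productEntries)`, since that constructor also accepts `(i, j, value)` tuples.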