且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用ReduceByKey对值列表进行分组

更新时间:2023-11-26 11:03:22

使用aggregateByKey:

 sc.parallelize(Array(("red", "zero"), ("yellow", "one"), ("red", "two")))
.aggregateByKey(ListBuffer.empty[String])(
        (numList, num) => {numList += num; numList},
         (numList1, numList2) => {numList1.appendAll(numList2); numList1})
.mapValues(_.toList)
.collect()

scala> Array[(String, List[String])] = Array((yellow,List(one)), (red,List(zero, two)))

有关aggregateByKey的详细信息,请参见此答案此链接,以了解使用可变数据集ListBuffer的背后原理.

See this answer for the details on aggregateByKey, this link for the rationale behind using a mutable dataset ListBuffer.

Is there a way to achieve the same result using reduceByKey?

以上实际上是更糟糕的性能,请查看@ zero323的评论以获取详细信息.

The above is actually worse in performance, please see comments by @zero323 for the details.