Using reduceByKey to group a list of values

Updated: 2023-11-26 10:49:04

Use aggregateByKey:

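    import scala.collection.mutable.ListBuffer

    sc.parallelize(Array(("red", "zero"), ("yellow", "one"), ("red", "two")))
      .aggregateByKey(ListBuffer.empty[String])(
        // seqOp: append a value to the buffer for its key within a partition
        (numList, num) => { numList += num; numList },
        // combOp: merge the buffers built for the same key on different partitions
        (numList1, numList2) => { numList1 ++= numList2; numList1 })
      .mapValues(_.toList)
      .collect()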

Output: Array[(String, List[String])] = Array((yellow,List(one)), (red,List(zero, two)))

See this answer for details on aggregateByKey, and this link for the rationale behind using a mutable collection (ListBuffer).

Is there a way to achieve the same result using reduceByKey?

The approach above actually performs worse; see the comments by @zero323 for details.
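For the reduceByKey question itself, here is a minimal sketch of one way it could be done, assuming a local SparkContext: wrap each value in a one-element List, then concatenate the lists per key. The object name ReduceByKeyGrouping and the helper groupValues are hypothetical, introduced only for this illustration, and this is not necessarily what the original discussion had in mind.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ReduceByKeyGrouping {
  // Hypothetical helper: turn every value into a one-element List,
  // then let reduceByKey concatenate the lists that share a key.
  def groupValues(pairs: RDD[(String, String)]): RDD[(String, List[String])] =
    pairs.mapValues(v => List(v)).reduceByKey(_ ::: _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, chosen only to keep the sketch self-contained.
    val conf = new SparkConf().setAppName("reduceByKey-grouping").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("red", "zero"), ("yellow", "one"), ("red", "two")))
      // Neither the order of keys nor the order of values within a list is guaranteed.
      groupValues(pairs).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because concatenating lists does not shrink the data that has to be shuffled, this variant is unlikely to be any faster than a plain groupByKey and mainly adds temporary lists; treat it as an illustration of the API rather than an optimization.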