使用ReduceByKey对值列表进行分组

更新时间：2023-11-26 11:03:22

使用aggregateByKey:

 sc.parallelize(Array(("red", "zero"), ("yellow", "one"), ("red", "two")))
.aggregateByKey(ListBuffer.empty[String])(
        (numList, num) => {numList += num; numList},
         (numList1, numList2) => {numList1.appendAll(numList2); numList1})
.mapValues(_.toList)
.collect()

scala> Array[(String, List[String])] = Array((yellow,List(one)), (red,List(zero, two)))

有关aggregateByKey的详细信息，请参见此答案，此链接，以了解使用可变数据集ListBuffer的背后原理.

See this answer for the details on aggregateByKey, this link for the rationale behind using a mutable dataset ListBuffer.

Is there a way to achieve the same result using reduceByKey?

以上实际上是更糟糕的性能，请查看@ zero323的评论以获取详细信息.

The above is actually worse in performance, please see comments by @zero323 for the details.

上一篇 : ：使用 Laravel Eloquent 获取最后插入的 ID下一篇 : 如何在WebView中打开字符串(URL)

使用ReduceByKey对值列表进行分组

相关阅读

推荐文章