Updated: 2023-11-18 23:22:16
If you want to sum all values of one column, it's more efficient to use the DataFrame's internal RDD and reduce.
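import sqlContext.implicits._
import org.apache.spark.sql.functions._

// Build a one-column DataFrame of step counts
val df = sc.parallelize(Array(10, 2, 3, 4)).toDF("steps")
// Drop to the underlying RDD, pull the Int out of each Row, and sum with reduce
df.select(col("steps")).rdd.map(_(0).asInstanceOf[Int]).reduce(_ + _)
// res1: Int = 19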
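For comparison, here is a minimal sketch that computes the same total two ways: with the built-in sum aggregate and with the RDD route described above. It assumes Spark 2.x or later, where SparkSession stands in for the sc and sqlContext of the snippet above; the names spark, viaAgg and viaRdd are illustrative only. It makes no claim about which route is faster on your data, so measure both before choosing.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Assumption: Spark 2.x+, local mode; SparkSession replaces sqlContext here.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sum-column")
  .getOrCreate()
import spark.implicits._

val df = Seq(10, 2, 3, 4).toDF("steps")

// Built-in aggregate: summing an Int column produces a Long (bigint) result.
val viaAgg: Long = df.agg(sum("steps")).first().getLong(0)

// RDD route: read each Row's first field with the typed getter, then reduce.
val viaRdd: Int = df.select("steps").rdd.map(_.getInt(0)).reduce(_ + _)

spark.stop()
```

Note that the aggregate returns a Long while the RDD version stays an Int, so the RDD version can overflow on large columns unless the values are widened first.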