
Applying aggregate functions to every column of a certain type

Updated: 2023-11-30 19:57:10

You can first build a list of the aggregate expressions:

import org.apache.spark.sql.functions.{col, avg, lit}

// Collect an avg expression for every column whose type is DoubleType.
val exprs = df.dtypes
  .filter(_._2 == "DoubleType")
  .map(ct => avg(col(ct._1))).toList

and then either pattern match:

exprs match {
  case h :: t => df.agg(h, t: _*)      // agg requires at least one expression
  case _ => sqlContext.emptyDataFrame  // no Double columns found
}

or use a dummy column:

// The constant _dummy column satisfies agg's first argument and is dropped afterwards.
df.agg(lit(1).alias("_dummy"), exprs: _*).drop("_dummy")
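
As a quick end-to-end check, here is a minimal sketch of the dummy-column variant; the local SparkSession setup (Spark 2.x+) and the sample column names price, qty, and label are assumptions for illustration only:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, avg, lit}

val spark = SparkSession.builder().master("local[*]").appName("agg-by-type").getOrCreate()
import spark.implicits._

// Hypothetical sample data: two Double columns and one String column.
val df = Seq((1.0, 2.0, "a"), (3.0, 4.0, "b")).toDF("price", "qty", "label")

val exprs = df.dtypes
  .filter(_._2 == "DoubleType")
  .map(ct => avg(col(ct._1))).toList

// Returns avg(price) = 2.0 and avg(qty) = 3.0; the String column "label" is skipped.
df.agg(lit(1).alias("_dummy"), exprs: _*).drop("_dummy").show()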

If you want to use multiple functions, you can flatMap either explicitly:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, min, max}

// Each function takes a column name and returns an aggregate Column.
val funs: List[String => Column] = List(min, max, avg)

// Apply every function to every Double column.
val exprs: Array[Column] = df.dtypes
  .filter(_._2 == "DoubleType")
  .flatMap(ct => funs.map(fun => fun(ct._1)))

or using a for comprehension:

val exprs: Array[Column] = for {
  cname <- df.dtypes.filter(_._2 == "DoubleType").map(_._1)  // names of Double columns
  fun <- funs
} yield fun(cname)

Convert exprs to a List if you want to use the pattern-match approach.
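
For instance, a minimal sketch of that conversion, reusing df and the flatMap-built exprs from above (spark here is assumed to be a Spark 2.x+ session; on 1.x, use sqlContext.emptyDataFrame as earlier):

// exprs built by flatMap or the for comprehension is an Array[Column];
// toList makes it usable with the h :: t pattern.
exprs.toList match {
  case h :: t => df.agg(h, t: _*)
  case _ => spark.emptyDataFrame  // no Double columns to aggregate
}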