且构网


How to iterate over a Spark DataFrame

Updated: 2023-11-18 23:53:16

If the data is small ("but the df is not that big"), I'd just collect it and process it with Scala collections. If the types are as shown below:

df.printSchema
root
 |-- time: integer (nullable = false)
 |-- id: integer (nullable = false)
 |-- direction: boolean (nullable = false)

you can collect it:

// .as[...] needs the session's encoders in scope: import spark.implicits._
val data = df.as[(Int, Int, Boolean)].collect.toSeq

and scanLeft over it:

val result = data.scanLeft((-1, Set[Int]())) {
  // direction == true: the id enters the set; false: it leaves
  case ((_, acc), (time, value, true))  => (time, acc + value)
  case ((_, acc), (time, value, false)) => (time, acc - value)
}.tail  // drop the (-1, Set()) seed
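To see what the fold produces, here is a minimal Spark-free sketch of the same scanLeft step; the sample rows and the `runningSet` helper name are illustrative assumptions, not part of the original answer:

```scala
// Spark-free sketch of the scanLeft fold above (names and data assumed).
object ScanLeftSketch {
  // Each row is (time, id, direction): true = the id enters, false = it leaves.
  def runningSet(data: Seq[(Int, Int, Boolean)]): Seq[(Int, Set[Int])] =
    data.scanLeft((-1, Set[Int]())) {
      case ((_, acc), (time, id, true))  => (time, acc + id)
      case ((_, acc), (time, id, false)) => (time, acc - id)
    }.tail  // drop the (-1, Set()) seed

  def main(args: Array[String]): Unit = {
    val sample = Seq((1, 10, true), (2, 20, true), (3, 10, false))
    println(runningSet(sample))
    // List((1,Set(10)), (2,Set(10, 20)), (3,Set(20)))
  }
}
```

Each output element pairs a time with the set of ids active at that time, which is why the dummy `(-1, Set())` seed is removed with `.tail`.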