Updated: 2023-11-18 23:53:16
If the data is small (you mention "but the df is not that big"), I'd just collect it and process it with Scala collections. If the types are as shown below:
df.printSchema
root
|-- time: integer (nullable = false)
|-- id: integer (nullable = false)
|-- direction: boolean (nullable = false)
you can collect:
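// outside spark-shell, .as[...] needs the session's implicits (assuming the session is named spark)
import spark.implicits._
val data = df.as[(Int, Int, Boolean)].collect.toSeq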
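and use scanLeft to carry a running set of ids, adding an id when direction is true and removing it when direction is false: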
val result = data.scanLeft((-1, Set[Int]())){
case ((_, acc), (time, value, true)) => (time, acc + value)
case ((_, acc), (time, value, false)) => (time, acc - value)
}.tail
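
To try the fold without a Spark session, here is a minimal sketch of the same scanLeft on plain Scala collections. The sample rows, the object name ScanLeftSketch and the helper activeIds are hypothetical, made up for illustration. The sortBy step is also an addition of mine, on the assumption that collected rows may not arrive ordered by time.

```scala
object ScanLeftSketch {
  // Hypothetical sample rows in the shape (time, id, direction).
  val data: Seq[(Int, Int, Boolean)] = Seq(
    (1, 10, true),
    (2, 20, true),
    (3, 10, false),
    (4, 30, true)
  )

  // Returns, for every row, the time and the set of ids present after that row.
  def activeIds(rows: Seq[(Int, Int, Boolean)]): Seq[(Int, Set[Int])] =
    rows
      .sortBy(_._1) // assumption: process rows in time order
      .scanLeft((-1, Set.empty[Int])) {
        case ((_, acc), (time, id, true))  => (time, acc + id) // id added
        case ((_, acc), (time, id, false)) => (time, acc - id) // id removed
      }
      .tail // drop the (-1, Set()) seed element

  def main(args: Array[String]): Unit =
    activeIds(data).foreach(println)
}
```

The seed (-1, Set()) only exists to start the scan, which is why .tail drops it. Each remaining element pairs a time with the ids present after that row was applied.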