且构网


Spark 2.1.1 获取窗口的最后一个元素

Updated: 2023-11-10 09:37:04

According to the issue SPARK-20969, you should be able to get the expected results by defining explicit frame bounds for your window, as shown below.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Explicitly cover the whole partition, so first/last see every row,
// not just the rows up to the current one.
val windowSpec = Window
  .partitionBy("name")
  .orderBy("count")
  .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

sqlContext
  .createDataFrame(
    Seq[(String, Int)](
      ("A", 1),
      ("A", 2),
      ("A", 3),
      ("B", 10),
      ("B", 20),
      ("B", 30)
    ))
  .toDF("name", "count")
  .withColumn("firstCountOfName", first("count").over(windowSpec))
  .withColumn("lastCountOfName", last("count").over(windowSpec))
  .show()
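To see why the explicit bounds matter: when a window has an ORDER BY but no frame clause, Spark defaults the frame to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so `last("count")` simply returns the current row's own value. A minimal sketch of the pitfall:

```scala
// Pitfall: ordered window with no explicit frame.
// The default frame only extends up to the current row, so for the
// "A" partition last("count") would yield 1, 2, 3 row by row
// instead of 3, 3, 3.
val defaultSpec = Window
  .partitionBy("name")
  .orderBy("count")
// last("count").over(defaultSpec)  // "last" value == current row's value
```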

Alternatively, if you are ordering on the same column over which you are computing first and last, you can switch to min and max with a non-ordered window; that should also work correctly.
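Assuming the same DataFrame as above is bound to a value `df` (a hypothetical name; the original builds it inline), the min/max variant could look like this. An unordered window's frame defaults to the entire partition, so no `rowsBetween` is needed:

```scala
// Alternative: unordered window, whole-partition frame by default.
// min/max of the ordering column stand in for first/last.
val unorderedSpec = Window.partitionBy("name")

df
  .withColumn("firstCountOfName", min("count").over(unorderedSpec))
  .withColumn("lastCountOfName", max("count").over(unorderedSpec))
  .show()
```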