且构网

Sharing the things programmers run into in development...
Spark DataFrame: group, sort, and select the top rows for a set of columns

Updated: 2023-11-18 23:09:40

You can use window functions with row_number to rank the rows within each group and then keep only the top-ranked ones:

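import org.apache.spark.sql.functions.row_number
import org.apache.spark.sql.expressions.Window

// The $"..." column syntax needs the session implicits in scope
// (spark-shell imports them automatically).

// One window per user; the ordering is attached separately for each rank.
val w = Window.partitionBy($"user_id")
val rankAsc = row_number().over(w.orderBy($"weight")).alias("rank_asc")
val rankDesc = row_number().over(w.orderBy($"weight".desc)).alias("rank_desc")

// Keep the two lowest-weight and the two highest-weight rows of each user.
df.select($"*", rankAsc, rankDesc).filter($"rank_asc" <= 2 || $"rank_desc" <= 2)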

In Spark 1.5.0, use rowNumber in place of row_number; the row_number name was added in Spark 1.6, where rowNumber became deprecated.
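To try the approach end to end, here is a minimal sketch that runs it on a small in-memory DataFrame. It assumes Spark 2.0 or later (for SparkSession) running in local mode. The object name TopRowsPerGroup, the helper lightestAndHeaviest, the item column, and the sample rows are all made up for illustration; only user_id, weight, and the two rank columns come from the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopRowsPerGroup {

  // Hypothetical helper: keeps the n lowest-weight and the n highest-weight
  // rows of every user_id, using the same two rankings as the answer above.
  def lightestAndHeaviest(df: DataFrame, n: Int): DataFrame = {
    val w = Window.partitionBy(col("user_id"))
    val rankAsc = row_number().over(w.orderBy(col("weight"))).alias("rank_asc")
    val rankDesc = row_number().over(w.orderBy(col("weight").desc)).alias("rank_desc")

    df.select(col("*"), rankAsc, rankDesc)
      .filter(col("rank_asc") <= n || col("rank_desc") <= n)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-rows-per-group")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: four rows for user 1, three rows for user 2.
    val df = Seq(
      (1, "a", 0.1), (1, "b", 0.5), (1, "c", 0.9), (1, "d", 0.3),
      (2, "e", 0.7), (2, "f", 0.2), (2, "g", 0.4)
    ).toDF("user_id", "item", "weight")

    // With n = 1, each user keeps only its lightest and its heaviest row.
    lightestAndHeaviest(df, 1).orderBy($"user_id", $"weight").show()

    spark.stop()
  }
}
```

With n = 2, as in the answer, groups this small would be kept whole, which is why the demo uses n = 1. Note also that row_number assigns consecutive numbers even when weights tie, so which of the tied rows survives the filter is not deterministic; adding a second ordering column is one way to make the result stable.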