Updated: 2023-11-18 18:45:10
You can use a window function in Spark SQL to achieve this.
df.registerTempTable("x")
sqlContext.sql("SELECT a, b, c, d FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) rn FROM x) y WHERE rn = 1").collect
This will achieve what you need. Read more about window function support at https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
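The same row-per-partition pick can also be sketched with the DataFrame API instead of a SQL string; this assumes a modern SparkSession-based setup where `spark` is an active session and `df` has the columns a, b, c, d (names are illustrative):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// A sketch, not the original answer's code: number the rows within each
// partition of column "a", ordered by "b" descending, then keep row 1.
import spark.implicits._

val w = Window.partitionBy("a").orderBy($"b".desc)

df.withColumn("rn", row_number.over(w))
  .where($"rn" === 1)
  .drop("rn")
  .collect()
```

The `drop("rn")` at the end removes the helper column, which plays the same role as selecting only a, b, c, d in the SQL version.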