且构网

How do I use orderBy() in descending order in a Spark window function?

Updated: 2023-11-18 15:02:22

There are two versions of orderBy: one that takes column names as strings and one that takes Column objects (API). Your code uses the first version, which does not let you change the sort order. Switch to the Column version and call the desc method on the column, e.g., myCol.desc.
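
As a rough illustration of the difference between the two overloads, here is a minimal sketch as it might be typed into spark-shell. The column names "category" and "score" are placeholders assumed for the example; they do not come from the original question.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.col

// String overload: takes column names only, so the sort is ascending
// and there is no place to ask for a different direction.
val ascending = Window.partitionBy("category").orderBy("score")

// Column overload: takes Column objects, so .desc can be called
// on the column to sort in descending order.
val descending = Window.partitionBy("category").orderBy(col("score").desc)
```

Both values are WindowSpec objects; only the second one carries a descending sort order.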

Now we get into API design territory. The advantage of passing Column parameters is that you have much more flexibility, e.g., you can use expressions. If you want to keep an API that takes a string rather than a Column, you need to convert the string to a column. There are several ways to do this; the easiest is to use org.apache.spark.sql.functions.col(myColName).

Putting it all together, we get

.orderBy(org.apache.spark.sql.functions.col(top_value).desc)
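
For context, the line above could sit inside a small program like the following. This is only a sketch under stated assumptions: the sample data, the column names, the helper withDescendingRank, and the use of row_number are all made up for illustration and are not taken from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object DescendingWindow {
  // Hypothetical helper: keeps a string-based API, but converts the
  // column name to a Column internally so that .desc is available.
  def withDescendingRank(df: DataFrame, partitionCol: String, orderCol: String): DataFrame = {
    val w = Window.partitionBy(partitionCol).orderBy(col(orderCol).desc)
    df.withColumn("rank", row_number().over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("desc-window-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data for the example.
    val df = Seq(("a", 10), ("a", 30), ("b", 20), ("b", 5)).toDF("category", "score")

    // The column name arrives as a plain string, as in the question.
    val top_value = "score"

    // Within each category, the row with the highest score gets rank 1.
    withDescendingRank(df, "category", top_value).show()

    spark.stop()
  }
}
```

Keeping the string parameter on the public method and converting with col inside is one way to get descending order without changing callers; accepting a Column directly would be the more flexible alternative described above.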