pyspark: check whether an HH:mm:ss time is within a range

Updated: 2022-06-23 03:07:59

Your condition can be simplified to checking whether the hour part of the time column is between 16 and 23 (inclusive), which covers every time from 16:00:00 through 23:59:59.

To get the hour, use pyspark.sql.functions.split to split the time column on the : character and take the token at index 0. Then compare that token with pyspark.sql.Column.between(), which is inclusive of both bounds.

from pyspark.sql.functions import split
df.where(split("time", ":")[0].between(16, 23)).show()
#+--------+
#|    time|
#+--------+
#|22:20:54|
#|21:46:07|
#+--------+
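To make the predicate above concrete, here is a plain-Python model of what split("time", ":")[0].between(16, 23) computes for each row. It is not PySpark code, and the helper name hour_in_range is hypothetical; the sample times are the four values shown in the output tables of this answer.

```python
# Plain-Python model of the Spark filter split("time", ":")[0].between(16, 23).
# Not PySpark code: it only mirrors the per-row predicate.

def hour_in_range(time_str, lo=16, hi=23):
    """Return True if the hour token of an "HH:mm:ss" string lies in [lo, hi]."""
    hour = int(time_str.split(":")[0])  # token at index 0, converted to int
    return lo <= hour <= hi             # inclusive on both ends, like Column.between

# The four time values that appear in this answer's output tables.
times = ["08:28:24", "22:20:54", "12:59:38", "21:46:07"]
print([t for t in times if hour_in_range(t)])
# ['22:20:54', '21:46:07']
```

The boundary values behave as expected: 16:00:00 and 23:59:59 pass, while 15:59:59 does not.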

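Note that even though split returns a string, Spark implicitly converts the token to a number for the between comparison. If you prefer not to rely on implicit coercion, add an explicit .cast("int") to the token, as the next snippet does.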
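Why the numeric conversion matters: comparing the hour token as a string is lexicographic, which can disagree with a numeric comparison. The plain-Python sketch below (not Spark code) compares both ways on a few hypothetical hour tokens; the unpadded "2" is an assumed input for illustration, since the sample data in this answer is zero-padded.

```python
# Compare an hour token against [16, 23] as a string versus as an int.
# The tokens are hypothetical; "2" models an hour that is not zero-padded.
tokens = ["02", "2", "16", "9", "23"]

def between_as_str(tok, lo="16", hi="23"):
    return lo <= tok <= hi        # lexicographic (character-by-character) comparison

def between_as_int(tok, lo=16, hi=23):
    return lo <= int(tok) <= hi   # numeric comparison, inclusive

for tok in tokens:
    print(tok, between_as_str(tok), between_as_int(tok))
# 02 False False
# 2 True False
# 16 True True
# 9 False False
# 23 True True
```

Only the numeric comparison rejects "2", which is why converting the token to an integer (implicitly or with an explicit cast) is what makes the filter correct.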

Of course, this can be extended to more complicated filtering criteria that also involve minutes or seconds, for example by extracting each part as an integer column:

df.select(
    "*",
    split("time", ":")[0].cast("int").alias("hour"),
    split("time", ":")[1].cast("int").alias("minute"),
    split("time", ":")[2].cast("int").alias("second")
).show()
#+--------+----+------+------+
#|    time|hour|minute|second|
#+--------+----+------+------+
#|08:28:24|   8|    28|    24|
#|22:20:54|  22|    20|    54|
#|12:59:38|  12|    59|    38|
#|21:46:07|  21|    46|     7|
#+--------+----+------+------+
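For a range that does not start and end on hour boundaries, one option is to combine the three integer parts into seconds since midnight and compare against bounds in the same unit. The plain-Python sketch below models that idea; the helper names and the bounds 16:30:00 and 23:15:00 are hypothetical. Applied to the result of the select above, the PySpark version would presumably look like (col("hour") * 3600 + col("minute") * 60 + col("second")).between(59400, 83700), with col imported from pyspark.sql.functions; treat that expression as an untested sketch.

```python
# Seconds since midnight for an "HH:mm:ss" string.
def to_seconds(time_str):
    h, m, s = (int(part) for part in time_str.split(":"))
    return h * 3600 + m * 60 + s

def time_in_range(time_str, start="16:30:00", end="23:15:00"):
    # Inclusive on both ends, like Column.between. Assumes start <= end;
    # a range that wraps past midnight would need two checks joined by OR.
    return to_seconds(start) <= to_seconds(time_str) <= to_seconds(end)

# First four values come from the table above; the last two are
# hypothetical boundary cases (one second before the start, exactly the end).
times = ["08:28:24", "22:20:54", "12:59:38", "21:46:07", "16:29:59", "23:15:00"]
print([t for t in times if time_in_range(t)])
# ['22:20:54', '21:46:07', '23:15:00']
```

Working in a single integer unit keeps the comparison to one between call, instead of nesting conditions on hour, then minute, then second.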
