Updated: 2023-08-25 07:58:10
First, let's simplify that if statement:
if (k == "yes" && i.nonEmpty)
  if (maxDate - targetDate < 0)
    if (j.isEmpty) "pending"
    else "approved"
  else "expired"
else ""
Now there are 2 main ways to accomplish this:
1. A custom UDF
2. The built-in coalesce, when, and otherwise functions
Now, due to the complexity of your conditions, number 2 will be rather tricky to do. A custom UDF should suit your needs.
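import org.apache.spark.sql.functions.{lit, udf}
import spark.implicits._ // for the $"col" syntax; assumes a SparkSession named spark

// The nested conditional from above, wrapped in a function that Spark calls once per row.
def getState(i: String, j: String, k: String, maxDate: Long, targetDate: Long): String =
  if (k == "yes" && i.nonEmpty)
    if (maxDate - targetDate < 0)
      if (j.isEmpty) "pending"
      else "approved"
    else "expired"
  else ""

val stateUdf = udf(getState _)
// The two lit(0) arguments are placeholders for maxDate and targetDate.
df.withColumn("state", stateUdf($"i", $"j", $"k", lit(0), lit(0)))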
Just replace the two lit(0) placeholders with your date expressions, and this should work for you.
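One thing to watch for: if i or j can be null in your data, calling .nonEmpty or .isEmpty on them inside the UDF will throw a NullPointerException. Below is a minimal null-safe sketch of the same logic. The names StateLogic and getStateSafe are made up for illustration, and treating a null j the same as an empty one is an assumption you may want to change.

```scala
object StateLogic {
  // Hypothetical null-safe variant of getState.
  // Option(x) turns a null String into None, so no method is ever called on null.
  def getStateSafe(i: String, j: String, k: String, maxDate: Long, targetDate: Long): String = {
    val hasI = Option(i).exists(_.nonEmpty)
    val hasJ = Option(j).exists(_.nonEmpty) // assumption: null j counts as empty
    if (k == "yes" && hasI) {               // null == "yes" is simply false in Scala
      if (maxDate - targetDate < 0) {
        if (hasJ) "approved" else "pending"
      } else "expired"
    } else ""
  }
}
```

It should plug into the same place as before, e.g. udf(StateLogic.getStateSafe _).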
If you notice performance issues, you can switch to using coalesce, otherwise, and when, which would look something like this:
val isApproved = df.withColumn("state",
  when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) < 0) && $"j" =!= "", "approved").otherwise(null))
val isPending = isApproved.withColumn("state",
  coalesce($"state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) < 0) && $"j" === "", "pending").otherwise(null)))
val isExpired = isPending.withColumn("state",
  coalesce($"state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) >= 0), "expired").otherwise(null)))
val finalDf = isExpired.withColumn("state", coalesce($"state", lit("")))
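If the repetition bothers you, the same conditions can probably be collapsed into a single chained when expression, since chained when calls are checked in order and the first match wins. This is only a sketch: the ChainedWhen object and addState helper are hypothetical names, and maxDate and targetDate are assumed to be plain Long values you compute elsewhere.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, when}

object ChainedWhen {
  // Hypothetical helper: maxDate and targetDate stand in for your own date values.
  def addState(df: DataFrame, maxDate: Long, targetDate: Long): DataFrame = {
    // Conditions shared by every non-empty state.
    val active = col("k") === "yes" && col("i") =!= ""
    val notExpired = lit(maxDate) - lit(targetDate) < 0

    df.withColumn("state",
      when(active && notExpired && col("j") =!= "", "approved")
        .when(active && notExpired && col("j") === "", "pending")
        .when(active && !notExpired, "expired")
        .otherwise("")) // nothing matched, same as the final coalesce with lit("")
  }
}
```

Each branch keeps its full condition so that rows with null values fall through to "" the same way they do in the coalesce version.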
I've used custom UDFs in the past with large input sources without issues, and custom UDFs can lead to much more readable code, especially in this case.