更新时间:2023-08-19 17:18:52
首先,让我们简化 if 语句:
First thing, lets simplify that if statement:
if(k == "yes" && i.nonEmpty)
if(maxDate - targetDate < 0)
if (j.isEmpty) "pending"
else "approved"
else "expired"
else ""
现在有两种主要方法可以实现这一点
Now there are 2 main ways to accomplish this
coalesce
、when
、otherwise
现在,由于条件的复杂性,执行第 2 项会相当棘手.使用自定义 UDF 应该可以满足您的需求.
Now due to the complexity of your conditions, it will be rather tricky to do number 2. Using a custom UDF should suit your needs.
def getState(i: String, j: String, k: String, maxDate: Long, targetDate: Long): String =
if(k == "yes" && i.nonEmpty)
if(maxDate - targetDate < 0)
if (j.isEmpty) "pending"
else "approved"
else "expired"
else ""
val stateUdf = udf(getState _)
df.withColumn("state", stateUdf($"i",$"j",$"k",lit(0),lit(0)))
只需将 lit(0) 和 lit(0) 更改为您的日期代码,这应该对您有用.
Just change lit(0) and lit(0) to your date code, and this should work for you.
如果您发现性能问题,您可以切换到使用 coalesce
、otherwise
和 when
,它们看起来像这样:
If you notice performance issues, you can switch to using coalesce
, otherwise
, and when
, which would look something like this:
val isApproved = df.withColumn("state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) < 0) && $"j" =!= "", "approved").otherwise(null))
val isPending = isApproved.withColumn("state", coalesce($"state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) < 0) && $"j" === "", "pending").otherwise(null)))
val isExpired = isPending.withColumn("state", coalesce($"state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) >= 0), "expired").otherwise(null)))
val finalDf = isExpired.withColumn("state", coalesce($"state", lit("")))
我过去在大型输入源中使用过自定义 udf 没有问题,并且自定义 udf 可以产生更具可读性的代码,尤其是在这种情况下.
I've used custom udf's in the past with large input sources without issues, and custom udfs can lead to much more readable code, especially in this case.