且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

通过使用数据帧中多个其他列的值向数据帧添加新列 - spark/scala

更新时间:2023-08-19 17:18:52

首先,让我们简化 if 语句:

First thing, lets simplify that if statement:

if(k == "yes" && i.nonEmpty)
  if(maxDate - targetDate < 0)
    if (j.isEmpty) "pending" 
    else "approved"
  else "expired"
else ""

现在有两种主要方法可以实现这一点

Now there are 2 main ways to accomplish this

  1. 使用自定义 UDF
  2. 使用 spark 内置函数:coalescewhenotherwise

自定义 UDF

现在,由于条件的复杂性,执行第 2 项会相当棘手.使用自定义 UDF 应该可以满足您的需求.

Custom UDF

Now due to the complexity of your conditions, it will be rather tricky to do number 2. Using a custom UDF should suit your needs.

def getState(i: String, j: String, k: String, maxDate: Long, targetDate: Long): String =  
  if(k == "yes" && i.nonEmpty)
    if(maxDate - targetDate < 0)
      if (j.isEmpty) "pending" 
      else "approved"
    else "expired"
  else ""

val stateUdf = udf(getState _)
df.withColumn("state", stateUdf($"i",$"j",$"k",lit(0),lit(0)))

只需将 lit(0) 和 lit(0) 更改为您的日期代码,这应该对您有用.

Just change lit(0) and lit(0) to your date code, and this should work for you.

如果您发现性能问题,您可以切换到使用 coalesceotherwisewhen,它们看起来像这样:

If you notice performance issues, you can switch to using coalesce, otherwise, and when, which would look something like this:

val isApproved = df.withColumn("state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) < 0) && $"j" =!= "", "approved").otherwise(null))
val isPending = isApproved.withColumn("state", coalesce($"state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) < 0) && $"j" === "", "pending").otherwise(null)))
val isExpired = isPending.withColumn("state", coalesce($"state", when($"k" === "yes" && $"i" =!= "" && (lit(max_date) - lit(target_date) >= 0), "expired").otherwise(null)))
val finalDf = isExpired.withColumn("state", coalesce($"state", lit("")))

我过去在大型输入源中使用过自定义 udf 没有问题,并且自定义 udf 可以产生更具可读性的代码,尤其是在这种情况下.

I've used custom udf's in the past with large input sources without issues, and custom udfs can lead to much more readable code, especially in this case.