Updated: 2023-12-01 10:28:04
A udf that receives multiple arguments is assumed to receive multiple columns. The literal "1" is not a column.
This means you can do one of the following. Either make it a column as suggested in the comments:
testdf.withColumn("PaidMonth", change_day(testdf.date, lit(1))).show(1)
Here lit(1) is a literal column whose value is 1 in every row.
Or turn the original function into a higher-order function that returns the function to wrap:
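import pyspark.sql.functions as sf  # aliases assumed from the usage below
import pyspark.sql.types as sparktypes

def change_day_(day):
    # Return a one-argument function with the day already fixed
    return lambda date: date.replace(day=day)
change_day = sf.udf(change_day_(1), sparktypes.TimestampType())
testdf.withColumn("PaidMonth", change_day(testdf.date)).show(1)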
This creates a function with the day already fixed to 1, so change_day_ can receive a plain integer instead of a column. The resulting udf is then applied to a single column.
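To see what the higher-order function does without starting Spark, here is a minimal plain-Python sketch. The sample dates and the name first_of_month are made up for illustration; the list comprehension only stands in for Spark calling the udf once per row.

```python
from datetime import datetime

def change_day_(day):
    """Return a one-argument function that sets the day of a date to `day`."""
    return lambda date: date.replace(day=day)

# Fix the day to 1 once; the result takes only the date,
# which is the shape a single-column udf needs.
first_of_month = change_day_(1)

# Hypothetical sample values standing in for the rows of testdf.date.
sample_dates = [datetime(2017, 3, 15, 10, 30), datetime(2017, 12, 31)]
converted = [first_of_month(d) for d in sample_dates]
print(converted)
```

The integer is captured by the closure when change_day_(1) is called, so it never has to pass through Spark as a column.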