PySpark: change the day in a datetime column

Updated: 2023-12-01 10:28:04

A udf that receives multiple arguments is assumed to receive multiple columns. The "1" is not a column.

This means you can do one of the following. Either make it a column, as suggested in the comments:

testdf.withColumn("PaidMonth", change_day(testdf.date, lit(1))).show(1)

lit(1) is a column whose value is 1 on every row.
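As a rough plain-Python analogy (no Spark involved), a two-argument udf is called once per row with one value from each column, so lit(1) behaves like a column holding the same constant on every row. The function body below is an assumption about how change_day was defined in the question, and the name change_day_py and the sample dates are hypothetical.

```python
from datetime import datetime

def change_day_py(date, day):
    # Assumed body of the question's udf: swap the day-of-month, keep the rest.
    return date.replace(day=day)

# Hypothetical sample values standing in for the `date` column.
dates = [datetime(2017, 3, 15, 10, 30), datetime(2017, 7, 28, 8, 0)]
# Stand-in for lit(1): the same constant repeated once per row.
ones = [1] * len(dates)

# Row-by-row application, pairing each date with its constant.
paid_month = [change_day_py(d, n) for d, n in zip(dates, ones)]
print(paid_month)
```

Passing the bare integer 1 instead gives Spark nothing to pair with each row, which is why it has to be wrapped as a column first.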

Or make the original function return a higher-order function:

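import pyspark.sql.functions as sf  # alias assumed from its use below
import pyspark.sql.types as sparktypes  # alias assumed from its use below

def change_day_(day):
    # Fix the day up front and return a function of the date alone.
    return lambda date: date.replace(day=day)

change_day = sf.udf(change_day_(1), sparktypes.TimestampType())
testdf.withColumn("PaidMonth", change_day(testdf.date)).show(1)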

This basically creates a function that replaces the day with 1, so the day can be passed as a plain integer when the udf is built rather than as a column. The resulting udf is then applied to a single column.
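The closure can be checked in plain Python before it is wrapped in a udf. This is a minimal sketch using only the standard library. The name to_first and the sample dates are made up for illustration.

```python
from datetime import datetime

def change_day_(day):
    # Return a one-argument function with `day` fixed in the closure.
    return lambda date: date.replace(day=day)

to_first = change_day_(1)
print(to_first(datetime(2017, 3, 15, 10, 30)))  # 2017-03-01 10:30:00

# datetime.replace raises ValueError for a day the month does not have,
# so a day such as 31 would fail on some rows.
try:
    change_day_(31)(datetime(2017, 2, 10))
except ValueError as exc:
    print("invalid day:", exc)
```

Day 1 is valid in every month, so the version used in the answer never hits the ValueError branch.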