更新时间:2023-02-05 10:39:10
You would define a custom function and use map.
def customFunction(row):
return (row.name, row.age, row.city)
sample2 = sample.rdd.map(customFunction)
or
sample2 = sample.rdd.map(lambda x: (x.name, x.age, x.city))
The custom function would then be applied to every row of the dataframe. Note that sample2 will be a RDD
, not a dataframe.
Map is needed if you are going to perform more complex computations. If you just need to add a derived column, you can use the withColumn
, with returns a dataframe.
sample3 = sample.withColumn('age2', sample.age + 2)