且构网


How to iterate over every row of a DataFrame in PySpark

Updated: 2023-02-05 10:39:10

You would define a custom function and apply it with map.

def customFunction(row):
    return (row.name, row.age, row.city)

sample2 = sample.rdd.map(customFunction)

or

sample2 = sample.rdd.map(lambda x: (x.name, x.age, x.city))

The custom function is then applied to every row of the DataFrame. Note that sample2 will be an RDD, not a DataFrame.

Map is needed if you are going to perform more complex computations. If you just need to add a derived column, you can use withColumn, which returns a DataFrame.

sample3 = sample.withColumn('age2', sample.age + 2)