且构网

Converting an RDD to a DataFrame in PySpark (column names taken from the first element of the RDD)

Updated: 2022-02-05 22:46:30

You will have to remove the header row from your RDD before converting it. Assuming your RDD is held in a variable named rdd and its first element contains the column names, one way to do this is:

>>> header = rdd.first()
>>> header
# ['mailid', 'age', 'address']
>>> data = rdd.filter(lambda row : row != header).toDF(header)
>>> data.show()
# +------+---+-------+
# |mailid|age|address|
# +------+---+-------+
# | satya| 23| Mumbai|
# |   abc| 27|    Goa|
# +------+---+-------+
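Note that the filter above drops every row that is equal to the header, not just the first one. If a data row could legitimately match the header, a possible variant is to drop the row by position instead. The following is only a sketch: to_df_with_header is a hypothetical helper name, not part of PySpark, and it assumes each element of the RDD is a list or tuple of fields and that a SparkSession is already active.

```python
def to_df_with_header(rdd):
    """Build a DataFrame from an RDD whose first element holds the column names.

    Hypothetical helper, not part of PySpark. Assumes each element of `rdd`
    is a list or tuple of fields, and that a SparkSession is already active
    (RDD.toDF is only available once one has been created).
    """
    header = rdd.first()  # e.g. ['mailid', 'age', 'address']
    rows = (
        rdd.zipWithIndex()                    # pairs of (row, position)
           .filter(lambda pair: pair[1] > 0)  # drop only the row at position 0
           .map(lambda pair: pair[0])         # back to plain rows
    )
    # Column names come from the header; column types are inferred by Spark.
    return rows.toDF(list(header))
```

With the rdd from the answer, data = to_df_with_header(rdd) should give the same two-row DataFrame as the filter-based version. The trade-off is that zipWithIndex may trigger an extra Spark job when the RDD has more than one partition.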