Updated: 2022-02-05 22:46:30
You will have to remove the header from your RDD. One way to do it, assuming your RDD is held in a variable named rdd, is the following:
>>> header = rdd.first()
>>> header
# ['mailid', 'age', 'address']
>>> data = rdd.filter(lambda row: row != header).toDF(header)
>>> data.show()
# +------+---+-------+
# |mailid|age|address|
# +------+---+-------+
# | satya| 23| Mumbai|
# |   abc| 27|    Goa|
# +------+---+-------+
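
To make clear which rows the filter keeps, here is a minimal sketch that mirrors the same steps on a plain Python list, so it runs without Spark. The list named rows and the dictionaries named records are stand-ins introduced for illustration only; they are not part of the original answer, and the age values are assumed to be strings as they would be when read from a text file.

```python
# Plain-Python stand-in for the RDD: a list of rows with the header first.
rows = [
    ["mailid", "age", "address"],
    ["satya", "23", "Mumbai"],
    ["abc", "27", "Goa"],
]

# Plays the role of rdd.first(): the first row is taken as the header.
header = rows[0]

# Same predicate as the rdd.filter(...) call: keep rows that differ from the header.
data = [row for row in rows if row != header]

# Pair each value with its column name, loosely what toDF(header) does with the names.
records = [dict(zip(header, row)) for row in data]

for record in records:
    print(record)
```

Because the predicate compares values rather than positions, any later row that is identical to the header is dropped as well. That is usually what you want when several files with repeated headers were read together, but it is worth knowing if a data row could legitimately equal the header.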