How do I retrieve a column from one PySpark DataFrame and insert it as a new column into an existing PySpark DataFrame?

Updated: 2022-03-14 21:51:19

import pyspark.sql.functions as f

# `sc` is the SparkContext that is pre-defined in a PySpark shell or notebook
df1 = sc.parallelize([[121], [122], [123]]).toDF(["index"])
df2 = sc.parallelize([[2.4899928731985597, -0.19775025821959014],
                      [1.029654847161142, 1.4878188087911541],
                      [-2.253992428312965, 0.29853121635739804]]).toDF(["fact1", "fact2"])

# There is no common column between the two dataframes, so add a row_index
# column to each and join on it positionally. Note: monotonically_increasing_id()
# produces ids that are unique but not consecutive; the ids on both sides only
# line up when the two dataframes have the same partition layout, as they do here.
df1 = df1.withColumn("row_index", f.monotonically_increasing_id())
df2 = df2.withColumn("row_index", f.monotonically_increasing_id())

df2 = df2.join(df1, on=["row_index"]).sort("row_index").drop("row_index")
df2.show()



Don't forget to let us know whether this solved your problem :)