Updated: 2022-03-14 21:51:19
import pyspark.sql.functions as f

df1 = sc.parallelize([[121], [122], [123]]).toDF(["index"])
df2 = sc.parallelize([[2.4899928731985597, -0.19775025821959014],
                      [1.029654847161142, 1.4878188087911541],
                      [-2.253992428312965, 0.29853121635739804]]).toDF(["fact1", "fact2"])

# There is no common column between the two DataFrames, so add a
# row_index column to each and join on it.
# Caveat: monotonically_increasing_id() is only guaranteed to be increasing,
# not consecutive, so this pairing is reliable only when both DataFrames
# have the same partitioning (e.g. a single partition each).
df1 = df1.withColumn('row_index', f.monotonically_increasing_id())
df2 = df2.withColumn('row_index', f.monotonically_increasing_id())

df2 = df2.join(df1, on=["row_index"]).sort("row_index").drop("row_index")
df2.show()
Don't forget to let us know whether this solved your problem :)