Updated: 2022-12-09 13:35:55
monotonically_increasing_id() is increasing and unique but not consecutive, so the IDs it generates for two different DataFrames will generally not line up row by row.
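To see why the IDs are not consecutive, here is a minimal pure-Scala model (no Spark needed) of the ID layout that Spark's documentation describes for the current implementation: the partition ID goes in the upper 31 bits and the record number within the partition goes in the lower 33 bits. The names MonotonicIdSketch, monotonicId and assignIds are hypothetical, and partitions are modeled as a Seq of Seqs.

```scala
// Hypothetical model of monotonically_increasing_id(), assuming the documented
// layout: partition ID in the upper 31 bits, per-partition record number in the
// lower 33 bits. This is not Spark code; it only reproduces the numbering.
object MonotonicIdSketch {
  // ID for the recordNumber-th row (0-based) of partition partitionId.
  def monotonicId(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber

  // Assign IDs to rows already split into partitions (outer Seq = partitions).
  def assignIds[A](partitions: Seq[Seq[A]]): Seq[(A, Long)] =
    partitions.zipWithIndex.flatMap { case (rows, pid) =>
      rows.zipWithIndex.map { case (row, i) => (row, monotonicId(pid, i.toLong)) }
    }

  def main(args: Array[String]): Unit =
    // One row in each of two partitions.
    assignIds(Seq(Seq("karti"), Seq("raja"))).foreach(println)
}
```

With one row in each of two partitions, the second row gets ID 8589934592 (1 << 33) rather than 1, which is why these IDs cannot be used to pair up rows of two DataFrames by position.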
You can use zipWithIndex instead: convert each DataFrame to an RDD, append the index to every row, and rebuild the DataFrame with its original schema plus an index column.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import spark.implicits._

val df1 = Seq(
  ("karti", "9685684551", 24),
  ("raja", "8595456552", 22)
).toDF("Customer_name", "Customer_phone", "Customer_age")

val df2 = Seq(
  ("watch", 1),
  ("cattoy", 2)
).toDF("Order_name", "Order_ID")

// Append the zipWithIndex position (0, 1, 2, ...) to every row of df1
val df11 = spark.createDataFrame(
  df1.rdd.zipWithIndex.map {
    case (row, index) => Row.fromSeq(row.toSeq :+ index)
  },
  // Original schema plus a non-nullable Long "index" column
  StructType(df1.schema.fields :+ StructField("index", LongType, false))
)

// Same for df2
val df22 = spark.createDataFrame(
  df2.rdd.zipWithIndex.map {
    case (row, index) => Row.fromSeq(row.toSeq :+ index)
  },
  // Original schema plus a non-nullable Long "index" column
  StructType(df2.schema.fields :+ StructField("index", LongType, false))
)
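By contrast, the index produced by zipWithIndex is consecutive. Spark's RDD.zipWithIndex orders elements first by partition index and then by position within each partition, and it computes each partition's starting offset from the element counts of the partitions before it (which is why it runs an extra Spark job when the RDD has more than one partition). The sketch below models that on plain Scala collections; ZipWithIndexSketch and its zipWithIndex helper are hypothetical names, not Spark APIs.

```scala
// Hypothetical model of RDD.zipWithIndex on plain collections: count each
// partition, turn the counts into start offsets, then number the rows.
object ZipWithIndexSketch {
  def zipWithIndex[A](partitions: Seq[Seq[A]]): Seq[(A, Long)] = {
    // Start offset of partition i = total size of partitions 0 until i.
    val startIndices = partitions.map(_.size.toLong).scanLeft(0L)(_ + _)
    partitions.zipWithIndex.flatMap { case (rows, pid) =>
      rows.zipWithIndex.map { case (row, i) => (row, startIndices(pid) + i) }
    }
  }

  def main(args: Array[String]): Unit =
    // Same two-partition layout as before: the indices are now 0 and 1.
    zipWithIndex(Seq(Seq("karti"), Seq("raja"))).foreach(println)
}
```

Empty partitions simply contribute nothing to the offsets, so the indices stay gap-free regardless of how the rows are partitioned.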
Now join the two indexed DataFrames on index and drop the helper column:
df11.join(df22, Seq("index")).drop("index").show(false)
Output:
+-------------+--------------+------------+----------+--------+
|Customer_name|Customer_phone|Customer_age|Order_name|Order_ID|
+-------------+--------------+------------+----------+--------+
|karti        |9685684551    |24          |watch     |1       |
|raja         |8595456552    |22          |cattoy    |2       |
+-------------+--------------+------------+----------+--------+
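The index join itself can be modeled on plain Scala collections to show what it does when the two inputs have different row counts. This is a hypothetical sketch (IndexJoinSketch and joinOnIndex are illustrative names, not Spark APIs): both sides get a 0-based index, rows are inner-joined on it, and the index is dropped.

```scala
// Hypothetical model of df11.join(df22, Seq("index")).drop("index"):
// index both sides, inner-join on the index, keep only the paired rows.
object IndexJoinSketch {
  def joinOnIndex[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] = {
    val leftIndexed = left.zipWithIndex.map { case (a, i) => (i.toLong, a) }
    val rightByIndex = right.zipWithIndex.map { case (b, i) => (i.toLong, b) }.toMap
    // Inner join: a left row with no matching index on the right is dropped.
    leftIndexed.flatMap { case (i, a) => rightByIndex.get(i).map(b => (a, b)) }
  }

  def main(args: Array[String]): Unit = {
    val customers = Seq(("karti", "9685684551", 24), ("raja", "8595456552", 22))
    val orders = Seq(("watch", 1), ("cattoy", 2))
    joinOnIndex(customers, orders).foreach(println)
  }
}
```

With the sample data it pairs the same rows as the output table above. If one side has more rows, the inner join silently drops the unmatched ones, so it may be worth checking that both DataFrames have the same count first. Unlike this sketch, a Spark join does not guarantee the order of its output rows.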