且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

在没有公共列的情况下连接两个数据框

更新时间:2022-12-09 13:35:55

monotonically_increasing_id()increasing唯一但不是连续.

monotonically_increasing_id() is increasing and unique but not consecutive.

您可以通过转换为 rdd 并为两个 dataframe 使用相同模式重建 Dataframe 来使用 zipWithIndex.

You can use zipWithIndex by converting to rdd and reconstructing Dataframe with the same schema for both dataframe.

import spark.implicits._


val df1 = Seq(
  ("karti", "9685684551", 24),
  ("raja", "8595456552", 22)
).toDF("Customer_name", "Customer_phone", "Customer_age")


val df2 = Seq(
  ("watch", 1),
  ("cattoy", 2)
).toDF("Order_name", "Order_ID")

val df11 = spark.sqlContext.createDataFrame(
  df1.rdd.zipWithIndex.map {
    case (row, index) => Row.fromSeq(row.toSeq :+ index)
  },
  // Create schema for index column
  StructType(df1.schema.fields :+ StructField("index", LongType, false))
)


val df22 = spark.sqlContext.createDataFrame(
  df2.rdd.zipWithIndex.map {
    case (row, index) => Row.fromSeq(row.toSeq :+ index)
  },
  // Create schema for index column
  StructType(df2.schema.fields :+ StructField("index", LongType, false))
)

现在加入最终的数据帧

df11.join(df22, Seq("index")).drop("index")

输出:

+-------------+--------------+------------+----------+--------+
|Customer_name|Customer_phone|Customer_age|Order_name|Order_ID|
+-------------+--------------+------------+----------+--------+
|karti        |9685684551    |24          |watch     |1       |
|raja         |8595456552    |22          |cattoy    |2       |
+-------------+--------------+------------+----------+--------+