在没有公共列的情况下连接两个数据框

更新时间：2022-12-09 13:35:55

monotonically_increasing_id() 是 increasing 和唯一但不是连续.

monotonically_increasing_id() is increasing and unique but not consecutive.

您可以通过转换为 rdd 并为两个 dataframe 使用相同模式重建 Dataframe 来使用 zipWithIndex.

You can use zipWithIndex by converting to rdd and reconstructing Dataframe with the same schema for both dataframe.

import spark.implicits._


val df1 = Seq(
  ("karti", "9685684551", 24),
  ("raja", "8595456552", 22)
).toDF("Customer_name", "Customer_phone", "Customer_age")


val df2 = Seq(
  ("watch", 1),
  ("cattoy", 2)
).toDF("Order_name", "Order_ID")

val df11 = spark.sqlContext.createDataFrame(
  df1.rdd.zipWithIndex.map {
    case (row, index) => Row.fromSeq(row.toSeq :+ index)
  },
  // Create schema for index column
  StructType(df1.schema.fields :+ StructField("index", LongType, false))
)


val df22 = spark.sqlContext.createDataFrame(
  df2.rdd.zipWithIndex.map {
    case (row, index) => Row.fromSeq(row.toSeq :+ index)
  },
  // Create schema for index column
  StructType(df2.schema.fields :+ StructField("index", LongType, false))
)

现在加入最终的数据帧

df11.join(df22, Seq("index")).drop("index")

输出:

+-------------+--------------+------------+----------+--------+
|Customer_name|Customer_phone|Customer_age|Order_name|Order_ID|
+-------------+--------------+------------+----------+--------+
|karti        |9685684551    |24          |watch     |1       |
|raja         |8595456552    |22          |cattoy    |2       |
+-------------+--------------+------------+----------+--------+

上一篇 : ：在DataFrame中查找两列之间的时差下一篇 : 基于文本的基本公式计算器功能/类

在没有公共列的情况下连接两个数据框

相关阅读

技术问答最新文章