Updated: 2023-11-18 14:54:04
When you convert a DataFrame to an RDD, you get an RDD[Row], so the function you pass to map receives a Row as its parameter. You must therefore use the Row methods to access its members (note that indices start at 0):
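import org.apache.spark.sql.Row
// Indices are 0-based: 1 and 2 select the second and third columns.
df.rdd.map {
  row: Row => (row.getString(1) + "_" + row.getString(2), row)
}.take(5)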
You can find more examples and the full list of methods available on Row objects in the Spark scaladoc.
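As a rough illustration of a few of those Row accessors, here is a minimal sketch. The local SparkSession, the sample data, and the column names col1, col2, col3 are assumptions made for the example, not part of the original question.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Assumption: a local session and a tiny three-column DataFrame, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("row-access").getOrCreate()
import spark.implicits._

val df = Seq(("1", "a", "x"), ("2", "b", null)).toDF("col1", "col2", "col3")

val keyed = df.rdd.map { row: Row =>
  // Positional access, 0-based: index 1 is col2.
  val left = row.getString(1)
  // isNullAt guards against null cells; getAs looks the field up by column name.
  val right = if (row.isNullAt(2)) "" else row.getAs[String]("col3")
  (left + "_" + right, row)
}.collect()
```

Access by name through getAs only works when the Row carries a schema, which is the case for rows that come from a DataFrame.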
I don't know why you are doing this, but to concatenate String columns of a DataFrame you may want to consider the following option:
import org.apache.spark.sql.functions._
val newDF = df.withColumn("concat", concat(df("col2"), lit("_"), df("col3")))
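One thing to keep in mind with this approach: concat returns null when any of its inputs is null. If that matters for your data, concat_ws (also in org.apache.spark.sql.functions) skips null inputs instead. The sketch below compares the two; the session, the sample data, and the column names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

// Assumption: a local session and sample data with one null in col3.
val spark = SparkSession.builder().master("local[1]").appName("concat-nulls").getOrCreate()
import spark.implicits._

val df = Seq(("1", "a", "x"), ("2", "b", null)).toDF("col1", "col2", "col3")

val out = df
  // concat yields null for the row where col3 is null.
  .withColumn("concat", concat(col("col2"), lit("_"), col("col3")))
  // concat_ws joins the non-null inputs with the separator.
  .withColumn("concat_ws", concat_ws("_", col("col2"), col("col3")))
  .orderBy("col1")
  .collect()
```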