且构网

How do I convert an RDD[(String, String)] into an RDD[Array[String]]?

Updated: 2023-02-03 08:29:43

I am trying to append the filename to each record in the file. I thought that if the RDD held arrays instead of tuples, this would be easy to do.

Any help with converting the RDD type or solving this problem would be much appreciated!

With the (String, String) type:

scala> myRDD.first()(1)
<console>:24: error: (String, String) does not take parameters
myRDD.first()(1)

With the Array[String] type:

scala> myRDD.first()(1)
res1: String = abcdefgh

My function:

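import scala.util.matching.Regex

def appendKeyToValue(x: Array[Array[String]]): Unit = {
    for (i <- 0 to (x.length - 1)) {
        // first element of each inner array: the key (the file name, per the question)
        val key = x(i)(0)
        val pattern = new Regex("\\.")
        // replace every dot in the key with "|"
        val key2 = pattern.replaceAllIn(key, "|")
        // second element: the value, split into one entry per line
        val tempvalue = x(i)(1)
        val finalval = tempvalue.split("\n")
        for (ab <- 0 to (finalval.length - 1)) {
            // prefix each record with the modified key
            val result = key2 + "|" + finalval(ab)
        }
    }
}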

If you have an RDD[(String, String)], you can access the first field of the first tuple by calling:

val firstTupleField: String = myRDD.first()._1
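
The error shown in the question comes from how Scala treats the two types: a tuple has no apply method, so myRDD.first()(1) does not compile, whereas an Array is indexed with (i), counting from 0. A minimal sketch with plain local values follows; the sample strings and the object name are made up for illustration and stand in for whatever myRDD.first() actually returns.

```scala
object TupleVsArrayAccess {
  // Hypothetical record standing in for myRDD.first()
  val pair: (String, String) = ("file.txt", "abcdefgh")

  // Tuple fields are read with 1-based accessors
  val firstField: String = pair._1
  val secondField: String = pair._2

  // The same data as an array is read with 0-based indexing
  val asArray: Array[String] = Array(pair._1, pair._2)
  val secondElement: String = asArray(1)

  def main(args: Array[String]): Unit = {
    println(firstField)    // the tuple's first field
    println(secondElement) // same value as pair._2
  }
}
```

Note that index 1 of the array corresponds to ._2 of the tuple, not ._1.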

If you want to convert an RDD[(String, String)] into an RDD[Array[String]], you can do the following:

val arrayRDD: RDD[Array[String]] = myRDD.map(x => Array(x._1, x._2))

You can also use a pattern-matching anonymous function (a case block) to destructure the tuples:

val arrayRDD: RDD[Array[String]] = myRDD.map { case (a,b) => Array(a, b) }
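
For the underlying goal of prefixing every record with its file name, the conversion to arrays may not be needed at all. The sketch below assumes that each pair is (file name, file content), as SparkContext.wholeTextFiles would produce, and that records are separated by newlines; neither assumption is confirmed by the question. prefixRecords is a hypothetical helper, not part of Spark, and the logic is kept in a pure function so it can be tried on a local collection first.

```scala
object AppendFileName {
  // Hypothetical helper: turns one (fileName, content) pair into one
  // output line per record, each prefixed with the modified file name.
  def prefixRecords(fileName: String, content: String): Seq[String] = {
    // As in the question's function, dots in the name are replaced with "|"
    val key = fileName.replace(".", "|")
    content.split("\n").toSeq.map(record => key + "|" + record)
  }

  def main(args: Array[String]): Unit = {
    // Local stand-in for the RDD[(String, String)]; the sample data is made up
    val files = Seq(("a.txt", "r1\nr2"), ("b.txt", "r3"))
    val lines = files.flatMap { case (name, content) => prefixRecords(name, content) }
    lines.foreach(println)
  }
}
```

On the actual RDD the same function could be applied with myRDD.flatMap { case (name, content) => AppendFileName.prefixRecords(name, content) }, which yields an RDD[String] with one element per record.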