Updated: 2023-11-18 20:59:52
Here is another simple example:
val ds = sc.parallelize(Seq((0, "Lorem ipsum dolor", 1.0, Array("prp1", "prp2", "prp3"))))
An alternative way to explode arrays is to use flatMap:
ds.flatMap { t =>
  // Emit one output tuple per element of the array in the fourth field
  t._4.map { prp => (t._1, t._2, t._3, prp) }
}.collect.foreach(println)
Result:
(0,Lorem ipsum dolor,1.0,prp1)
(0,Lorem ipsum dolor,1.0,prp2)
(0,Lorem ipsum dolor,1.0,prp3)
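The same flatMap pattern can be tried without a Spark cluster. This is a minimal sketch, assuming a plain local Scala Seq stands in for the RDD (RDD.flatMap applies the function to each record in the same way); the names ExplodeSketch, Rec, explode and sample are hypothetical.

```scala
// Local (non-Spark) sketch of the flatMap "explode" pattern.
// Assumption: a plain Seq stands in for the RDD created with sc.parallelize.
object ExplodeSketch {
  // Hypothetical record type: (id, text, score, array of properties)
  type Rec = (Int, String, Double, Array[String])

  // Turn one record holding an N-element array into N flat records.
  def explode(records: Seq[Rec]): Seq[(Int, String, Double, String)] =
    records.flatMap { case (id, text, score, props) =>
      props.toSeq.map(prp => (id, text, score, prp))
    }

  val sample: Seq[Rec] =
    Seq((0, "Lorem ipsum dolor", 1.0, Array("prp1", "prp2", "prp3")))

  def main(args: Array[String]): Unit =
    explode(sample).foreach(println)
}
```

Note that a record whose array is empty produces no output at all with flatMap, which matches the behaviour of Spark's `explode` function on empty arrays; `explode_outer` is the variant that keeps such rows.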
I tried this with your dataset, but I'm not sure it's the optimal way of doing it:
df1.show(false)
+---+---+------------------------------------------------+
|A |B |C |
+---+---+------------------------------------------------+
|a |1 |[[a, b, c, 0], [a1, b1, c1, 1], [a2, b2, c2, 2]]|
|b |2 |[[a, b, c, 0]] |
+---+---+------------------------------------------------+
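import org.apache.spark.sql.Row

// Pair each struct in array column C with the outer columns A and B
df1.rdd.flatMap { t: Row =>
  t.getSeq[Row](2).map { row: Row => (t.getString(0), t.getString(1), row) }
}.map { case (col1, col2, col3) =>
  // Unpack the four fields of each struct into a flat tuple
  (col1, col2, col3.getString(0), col3.getString(1), col3.getString(2), col3.getInt(3))
}.collect.foreach(println)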
Result:
(a,1,a,b,c,0)
(a,1,a1,b1,c1,1)
(a,1,a2,b2,c2,2)
(b,2,a,b,c,0)
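If the element type of C is known up front, the positional `getString`/`getInt` calls can be replaced by named fields. The sketch below is a local illustration, assuming hypothetical case classes Outer and Inner that mirror df1's schema (the field names are invented, since the table only shows values) and a plain Seq in place of the DataFrame. The for-comprehension desugars to the same flatMap/map pair used above.

```scala
// Local sketch: flattening an array-of-structs column with typed fields.
// Assumption: Outer/Inner are hypothetical stand-ins for df1's Row schema.
object FlattenSketch {
  final case class Inner(f1: String, f2: String, f3: String, n: Int)
  final case class Outer(a: String, b: String, c: Seq[Inner])

  // One output tuple per element of c, repeating the parent's a and b.
  def flatten(rows: Seq[Outer]): Seq[(String, String, String, String, String, Int)] =
    for {
      row   <- rows
      inner <- row.c
    } yield (row.a, row.b, inner.f1, inner.f2, inner.f3, inner.n)

  // Same values as the df1.show output above
  val sample: Seq[Outer] = Seq(
    Outer("a", "1", Seq(Inner("a", "b", "c", 0), Inner("a1", "b1", "c1", 1), Inner("a2", "b2", "c2", 2))),
    Outer("b", "2", Seq(Inner("a", "b", "c", 0)))
  )

  def main(args: Array[String]): Unit =
    flatten(sample).foreach(println)
}
```

Running `main` prints the same four tuples as the Spark version.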
Hope this helps!!