How to parse a JSON string into different columns in Spark Scala?

Updated: 2023-01-16 10:50:58
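
For a self-contained run, here is a minimal sketch of the assumed input. The column names (id, name, activeGroup) and the JSON layout are inferred from the answer and the result table below, so adjust them to your actual data:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  // Assumed setup: a local SparkSession and a sample DataFrame whose activeGroup
  // column holds a JSON array of {groupId, role, status} objects (hypothetical data).
  val spark = SparkSession.builder().appName("parse-json-column").master("local[*]").getOrCreate()
  import spark.implicits._

  val df = Seq(
    (1, "abc", """[{"groupId":"5d","role":"admin","status":"A"},{"groupId":"58","role":"admin","status":"A"}]""")
  ).toDF("id", "name", "activeGroup")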

First you need to extract the JSON schema:

  // infer the schema of the JSON array from the first value of activeGroup
  val schema = schema_of_json(lit(df.select($"activeGroup").as[String].first))
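
Note that schema_of_json infers the schema from the single sample value you pass to it. If your JSON varies between rows, or you simply prefer not to infer the schema, a hand-written schema works too; this is a sketch assuming the array-of-structs layout shown in the result below:

  import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

  // explicit equivalent of the inferred schema (field names/types assumed)
  val explicitSchema = ArrayType(StructType(Seq(
    StructField("groupId", StringType),
    StructField("role", StringType),
    StructField("status", StringType)
  )))
  // can be passed to from_json in place of `schema` below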

Once you have it, you can convert your activeGroup column, which is a String, to JSON (from_json), and then explode it.

Once the column is JSON, you can extract its values with $"columnName.field":

  // parse the JSON string into an array of structs, explode it into one row
  // per element, then pull the fields out of the struct
  val dfresult = df.withColumn("jsonColumn", explode(
                                      from_json($"activeGroup", schema)))
                   .select($"id", $"name",
                           $"jsonColumn.groupId" as "groupId",
                           $"jsonColumn.role" as "role",
                           $"jsonColumn.status" as "status")

If you want to extract the whole JSON and the element names are fine as they are, you can use * to do it:

  val dfresult = df.withColumn("jsonColumn", explode(
                                      from_json($"activeGroup", schema)))
                   .select($"id", $"name", $"jsonColumn.*")

Result:

+---+----+-------+-----+------+
| id|name|groupId| role|status|
+---+----+-------+-----+------+
|  1| abc|     5d|admin|     A|
|  1| abc|     58|admin|     A|
+---+----+-------+-----+------+