
How to convert all columns of a DataFrame to numeric in Spark Scala?

Updated: 2023-11-18 23:19:04

Given this DataFrame as an example:

val df = sqlContext.createDataFrame(Seq(("0", 0),("1", 1),("2", 0))).toDF("id", "c0")

with the following schema:

StructType(
    StructField(id,StringType,true), 
    StructField(c0,IntegerType,false))
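
For a self-contained run, here is a minimal setup sketch (my assumption: Spark 2.x+ with a SparkSession entry point; the snippet above uses the older sqlContext instead):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Minimal local setup (assumption: Spark 2.x+ SparkSession entry point).
val spark = SparkSession.builder()
  .appName("CastAllColumns")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(("0", 0), ("1", 1), ("2", 0)).toDF("id", "c0")
df.printSchema()
// root
//  |-- id: string (nullable = true)
//  |-- c0: integer (nullable = false)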

You can loop over the DataFrame's columns with the .columns method:

val castedDF = df.columns.foldLeft(df) { (current, c) =>
  current.withColumn(c, col(c).cast("float"))
}

So the new DF schema looks like this:

StructType(
    StructField(id,FloatType,true), 
    StructField(c0,FloatType,false))
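
As a side note (an alternative I am sketching here, not part of the original answer), the same casts can be done in a single select, which avoids stacking one projection per column on wide DataFrames:

// Alternative sketch: cast every column in one projection.
// Result is equivalent to the foldLeft/withColumn version above.
val castedDF2 = df.select(df.columns.map(c => col(c).cast("float")): _*)
castedDF2.printSchema()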

If you want to exclude some columns from the cast, you could do something like this (suppose we want to exclude the id column):

val exclude = Array("id")

val someCastedDF = (df.columns.toBuffer --= exclude).foldLeft(df) { (current, c) =>
  current.withColumn(c, col(c).cast("float"))
}

where exclude is an Array of all the columns we want to exclude from the cast.

So the schema of this new DF is:

StructType(
    StructField(id,StringType,true), 
    StructField(c0,FloatType,false))
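
As another sketch of the exclusion step (again my own alternative, not from the original answer), filterNot yields the same column set without mutating a buffer:

// Alternative sketch: filter out the excluded names, then fold as before.
val someCastedDF2 = df.columns
  .filterNot(exclude.contains)
  .foldLeft(df) { (current, c) =>
    current.withColumn(c, col(c).cast("float"))
  }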

Note that this may not be the best solution, but it can be a starting point.
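
As one possible next step from this starting point (a sketch under my own assumptions), you could cast only the columns that are not numeric yet, leaving already-numeric columns untouched:

import org.apache.spark.sql.types.{FloatType, StringType}

// Sketch: cast only StringType columns to float; others pass through.
val selectiveDF = df.schema.fields.foldLeft(df) { (current, field) =>
  if (field.dataType == StringType)
    current.withColumn(field.name, col(field.name).cast(FloatType))
  else current
}
// For the example df this casts only id, leaving c0 as an integer.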