How do I change column types in a Spark SQL DataFrame?

Updated: 2021-10-03 06:46:42

Newest version

Since spark 2.x you can use .withColumn. Check the docs here:
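Since Spark 2.x, `.withColumn` replaces an existing column when given the same name, so the cast can be done in place without a temporary column. A minimal sketch, assuming a DataFrame `df` with a `year` column (names are illustrative):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// withColumn with an existing name overwrites that column in place
val df2 = df.withColumn("year", col("year").cast(IntegerType))
```

The string form `col("year").cast("int")` is equivalent.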


Since Spark version 1.4 you can apply the cast method with DataType on the column:

import org.apache.spark.sql.types.IntegerType

// Cast "year" into a temporary column, drop the original, then rename back
val df2 = df.withColumn("yearTmp", df("year").cast(IntegerType))
    .drop("year")
    .withColumnRenamed("yearTmp", "year")


If you are using sql expressions you can also do:

val df2 = df.selectExpr("cast(year as int) year", 
                        "make", 
                        "model", 
                        "comment", 
                        "blank")
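If you prefer writing the cast as a full SQL statement, the same result can be had through a temporary view. A sketch, assuming a `SparkSession` named `spark` and the same `df` (the view name `cars` is illustrative):

```scala
// Register the DataFrame as a view, then cast via a SQL query
df.createOrReplaceTempView("cars")
val df2 = spark.sql(
  "SELECT CAST(year AS INT) AS year, make, model, comment, blank FROM cars")
```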


For more info check the docs: http://spark.apache.org/docs/1.6.0/api/scala/#org.apache.spark.sql.DataFrame