How do I merge multiple feature vectors in a DataFrame?

Updated: 2023-02-26 14:36:25

You can use VectorAssembler:

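import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

// Placeholder: replace ??? with your own DataFrame.
// Evaluating ??? as written throws NotImplementedError.
val df: DataFrame = ???

// Concatenate the three feature columns, in the order listed,
// into a single vector column named "features".
val assembler = new VectorAssembler()
  .setInputCols(Array("text_features", "color_features", "type_features"))
  .setOutputCol("features")

// transform returns a new DataFrame with the "features" column appended.
val transformed = assembler.transform(df)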
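To try the snippet end to end, here is a minimal sketch that fills in the `???` placeholder with toy data. The object name `AssembleFeaturesExample`, the local `SparkSession` settings, and the sample vectors are all assumptions made for illustration; only the column names and the `VectorAssembler` configuration come from the answer above.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, not part of the original answer.
object AssembleFeaturesExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for a quick experiment.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("AssembleFeaturesExample")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: each row carries three already-encoded feature vectors.
    val df = Seq(
      (Vectors.dense(1.0, 0.0), Vectors.dense(0.5), Vectors.dense(0.0, 1.0, 0.0)),
      (Vectors.dense(0.0, 1.0), Vectors.dense(0.2), Vectors.dense(1.0, 0.0, 0.0))
    ).toDF("text_features", "color_features", "type_features")

    val assembler = new VectorAssembler()
      .setInputCols(Array("text_features", "color_features", "type_features"))
      .setOutputCol("features")

    val assembled = assembler.transform(df)

    // Each assembled vector concatenates the inputs in order: 2 + 1 + 3 = 6 entries.
    val sizes = assembled.select("features").collect().map(_.getAs[Vector](0).size)
    assert(sizes.forall(_ == 6))

    assembled.show(false)
    spark.stop()
  }
}
```

The assembled vector may be stored in sparse or dense form depending on its contents, so the check above looks at the vector size instead of a particular printed representation.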

For a PySpark example, see: Encode and assemble multiple features in PySpark