Updated: 2023-02-18 19:58:50
In general, Structured Streaming cannot (yet, as of Spark 2.2) be used to train Spark ML models.
Some operations are not supported on streaming Datasets, and one of them is converting a Dataset to its rdd representation.
Word2Vec is a case in point: its fit method needs to go down to the rdd level.
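A minimal sketch of the failure described above, assuming Spark 3.x in local mode. The object name FitOnStreamFails and method attemptFit are hypothetical, and the built-in "rate" source is used only to get a self-contained streaming DataFrame. Calling Word2Vec.fit on it is expected to fail because fit reads its input through the rdd view, which Spark rejects for streaming Datasets.

```scala
import org.apache.spark.ml.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col}

import scala.util.{Failure, Success, Try}

// Hypothetical demo object: not part of the original answer.
object FitOnStreamFails {

  /** Tries to fit Word2Vec directly on a streaming DataFrame and returns the outcome. */
  def attemptFit(spark: SparkSession): Try[Word2VecModel] = {
    // The built-in "rate" source emits (timestamp, value) rows. Wrapping the value in
    // a one-element array just gives Word2Vec the array<string> column it expects.
    val streamingWords = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "1")
      .load()
      .select(array(col("value").cast("string")).as("words"))

    val word2Vec = new Word2Vec()
      .setInputCol("words")
      .setOutputCol("features")
      .setVectorSize(3)
      .setMinCount(0)

    // fit reads its input through the Dataset's rdd, which Spark refuses for a
    // streaming Dataset, so this is expected to end in an AnalysisException.
    Try(word2Vec.fit(streamingWords))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("fit-on-stream-fails")
      .getOrCreate()

    attemptFit(spark) match {
      case Failure(e) => println(s"fit failed: ${e.getClass.getSimpleName}: ${e.getMessage}")
      case Success(_) => println("unexpected: fit succeeded on a streaming Dataset")
    }
    spark.stop()
  }
}
```

The error message typically says that queries with streaming sources must be executed with writeStream.start(), which is Spark's way of saying the batch-only rdd path is not available here.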
Nevertheless, it's possible to train the model on a static dataset and apply its predictions to the streaming data. The transform operation is usable on a streaming Dataset, as in the question's code: val result = model.transform(removestopdf)
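The train-static, transform-streaming pattern can be sketched as follows, assuming Spark 3.x in local mode. The toy training sentences, the object name TrainStaticApplyStreaming, the query name word_vectors and the input directory are all hypothetical. A text file source stands in for the real stream, and Trigger.Once with a memory sink is used only so the sketch runs one batch and stops; a long-running job would use its own trigger and sink.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}
import org.apache.spark.sql.streaming.Trigger

// Hypothetical demo object: not part of the original answer.
object TrainStaticApplyStreaming {

  /** Fits Word2Vec on static data, applies it to a stream, returns the number of rows written. */
  def run(spark: SparkSession, inputDir: Path): Long = {
    import spark.implicits._

    // 1) Static (batch) training data: fit is allowed here.
    val training = Seq("spark structured streaming", "spark ml word2vec", "streaming word2vec model")
      .toDF("value")
      .select(split(col("value"), " ").as("words"))

    val model = new Word2Vec()
      .setInputCol("words")
      .setOutputCol("features")
      .setVectorSize(3)
      .setMinCount(0)
      .fit(training)

    // 2) Streaming data: a text file source watching inputDir, tokenized the same way.
    val streamingWords = spark.readStream
      .text(inputDir.toString)
      .select(split(col("value"), " ").as("words"))

    // 3) transform works on the streaming Dataset; only fit needed static data.
    val result = model.transform(streamingWords)

    val query = result.writeStream
      .format("memory")
      .queryName("word_vectors")
      .outputMode("append")
      .trigger(Trigger.Once())
      .start()
    query.awaitTermination()

    spark.table("word_vectors").count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("static-fit-stream-transform")
      .getOrCreate()
    val dir = Files.createTempDirectory("w2v-stream")
    Files.write(dir.resolve("batch-1.txt"), "spark streaming\nword2vec model\n".getBytes(StandardCharsets.UTF_8))
    println(s"rows transformed: ${run(spark, dir)}")
    spark.stop()
  }
}
```

The design point is that Word2VecModel.transform only adds a column computed row by row from the already learned vectors, so it needs nothing that a streaming plan forbids.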
In a nutshell, we need to fit the model on a static dataset. The resulting transformer can be applied to a streaming Dataset.
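Since the fitted result is just a transformer, a common way to organize this is to fit (and persist) it in a batch job and load it in the streaming job. The sketch below assumes Spark 3.x and uses a Pipeline of Tokenizer, StopWordsRemover and Word2Vec as a stand-in for whatever preprocessing produced removestopdf in the question; the object name FitOfflineServeStreaming, its methods, the sample sentences and the model path are hypothetical.

```scala
import java.nio.file.Files

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.{StopWordsRemover, Tokenizer, Word2Vec}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object: not part of the original answer.
object FitOfflineServeStreaming {

  /** Batch job: fit the whole pipeline on static text and save the fitted PipelineModel. */
  def trainAndSave(spark: SparkSession, modelPath: String): Unit = {
    import spark.implicits._
    val training = Seq("the spark streaming job", "a word2vec model in spark ml", "streaming data and the model")
      .toDF("text")

    val pipeline = new Pipeline().setStages(Array(
      new Tokenizer().setInputCol("text").setOutputCol("tokens"),
      new StopWordsRemover().setInputCol("tokens").setOutputCol("words"),
      new Word2Vec().setInputCol("words").setOutputCol("features").setVectorSize(3).setMinCount(0)
    ))
    // fit runs on static data; the result is a Transformer that can be saved.
    pipeline.fit(training).write.overwrite().save(modelPath)
  }

  /** Streaming job: load the fitted transformer and apply it to a streaming DataFrame. */
  def applyToStream(spark: SparkSession, modelPath: String, inputDir: String): DataFrame = {
    val model = PipelineModel.load(modelPath)
    val stream = spark.readStream.text(inputDir).withColumnRenamed("value", "text")
    model.transform(stream) // still a streaming DataFrame; start it with writeStream
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("fit-offline-serve-streaming")
      .getOrCreate()
    val modelPath = Files.createTempDirectory("w2v-model").resolve("pipeline").toString
    val inputDir = Files.createTempDirectory("w2v-in").toString
    trainAndSave(spark, modelPath)
    applyToStream(spark, modelPath, inputDir).printSchema()
    spark.stop()
  }
}
```

Saving the whole PipelineModel, rather than only the Word2VecModel, keeps the streaming side's tokenization and stop-word removal identical to what the model was trained on.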