且构网

Queries with streaming sources must be executed with writeStream.start();

Updated: 2023-02-18 19:58:50

In general, Structured Streaming cannot yet (as of Spark 2.2) be used to train Spark ML models, because some operations are not supported on a streaming Dataset. One of them is converting the Dataset to its RDD representation. Word2Vec in particular has to go down to the RDD level to implement fit, so calling fit on a streaming Dataset fails.

Nevertheless, it is possible to train the model on a static dataset and apply its predictions to the streaming data. The transform operation is usable on a streaming Dataset, as in the question: val result = model.transform(removestopdf)

In a nutshell, we need to fit the model on a static dataset. The resulting transformer can then be applied to a streaming Dataset.
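
The approach described above could look roughly like the following sketch. It is not the asker's actual pipeline: the training sentences, the column names "filtered" and "features", the object name, and the use of the built-in rate source as a stand-in for the real stream are all assumptions made for illustration. It also assumes a Spark version that ships the rate source, which is newer than the 2.2 release mentioned above.

```scala
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit}

// Hypothetical example object; names and data are illustrative only.
object StaticFitStreamingTransform {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("static-fit-streaming-transform")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Static (batch) training data: made-up, already tokenised sentences.
    val trainingDf = Seq(
      "spark structured streaming".split(" "),
      "spark ml word2vec".split(" "),
      "streaming word2vec model".split(" ")
    ).map(Tuple1.apply).toDF("filtered")

    // fit() runs on the static Dataset, where going down to the RDD level is allowed.
    val model = new Word2Vec()
      .setInputCol("filtered")
      .setOutputCol("features")
      .setVectorSize(8)
      .setMinCount(1)
      .fit(trainingDf)

    // Streaming Dataset: the "rate" source stands in for the real input (assumption).
    // Every generated row is mapped to the same fixed array of tokens.
    val removestopdf = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "1")
      .load()
      .select(array(lit("spark"), lit("streaming")).as("filtered"))

    // transform() is supported on a streaming Dataset.
    val result = model.transform(removestopdf)

    // A streaming query only executes once it is started through writeStream.
    val query = result.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination(10000) // wait up to 10 seconds, then shut down
    query.stop()
    spark.stop()
  }
}
```

Calling an action such as show() or collect() on result, instead of going through writeStream.start(), is what produces the error quoted in the title.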