Queries with streaming sources must be executed with writeStream.start();

Updated: 2023-02-18 19:58:50

In general, Structured Streaming cannot (yet, as of Spark 2.2) be used to train Spark ML models. Some operations are not supported in Structured Streaming, and one of them is converting a Dataset to its RDD representation. Word2Vec in particular has to go down to the RDD level to implement fit, so calling fit on a streaming Dataset fails.

Nevertheless, it is possible to train the model on a static dataset and apply its predictions to the streaming data. The transform operation can be used on a streaming Dataset, as in the line from the question: val result = model.transform(removestopdf)

In a nutshell, the model has to be fitted on a static dataset. The resulting transformer can then be applied to a streaming Dataset.
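The pattern described above could look roughly like the following sketch. It is not the asker's code: the object name, the tiny in-memory training corpus, the column names words and features, and the socket source on localhost:9999 are all assumptions chosen for illustration. The point is only that fit runs on a batch DataFrame, while transform and writeStream.start() run on the streaming one.

```scala
import org.apache.spark.ml.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object StaticFitStreamingTransform {

  // Fit Word2Vec on a static (batch) DataFrame.
  // The training sentences here are placeholder data (assumption).
  def trainModel(spark: SparkSession): Word2VecModel = {
    import spark.implicits._
    val training = Seq(
      "structured streaming processes unbounded data",
      "word2vec learns vector representations of words",
      "fit the model on static data first"
    ).map(_.split(" ")).map(Tuple1.apply).toDF("words")

    new Word2Vec()
      .setInputCol("words")
      .setOutputCol("features")
      .setVectorSize(8)
      .setMinCount(0)
      .fit(training)
  }

  // Build a streaming DataFrame and apply the already fitted model to it.
  // Host and port of the socket source are assumptions.
  def applyToStream(spark: SparkSession, model: Word2VecModel): DataFrame = {
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
    // The socket source yields one string column named "value".
    val words = lines.select(split(col("value"), " ").as("words"))
    model.transform(words)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StaticFitStreamingTransform")
      .master("local[2]")
      .getOrCreate()

    val model = trainModel(spark)
    val result = applyToStream(spark, model)

    // A streaming query only runs once it is started through writeStream.
    val query = result.writeStream
      .format("console")
      .outputMode("append")
      .start()
    query.awaitTermination()
  }
}
```

To try it locally, something like nc -lk 9999 can feed lines of text into the socket before the job is started. In a real job the model would more likely be trained on a representative corpus, or loaded from disk, instead of being fitted on three sentences.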
