Apache Beam with Dataflow - NullPointerException when reading from BigQuery

Updated: 2023-11-21 23:11:52

I ended up filing a bug in the Google Issue Tracker. After a longer conversation with a Google employee and an investigation on their side, it turned out that it doesn't make sense to use templates with Dataflow batch jobs that read from BigQuery, because such a template can only be executed once.

To quote: "for BigQuery batch pipelines, templates can only be executed once, as the BigQuery job ID is set at template creation time. This restriction will be removed in a future release for the SDK 2, but when I cannot say. Creating Templates: https://cloud.google.com/dataflow/docs/templates/creating-templates#pipeline-io-and-runtime-parameters"
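
To make the "only once" consequence concrete, here is a toy model in plain Python. It is not Beam or Dataflow code: `FakeBigQuery`, `TemplatedRead`, `DirectRead` and `count_successful_runs` are hypothetical names invented for illustration, and the sketch assumes the only relevant behaviour is that a job ID cannot be reused. The real failure showed up as a NullPointerException, which this sketch does not try to reproduce.

```python
import uuid


class FakeBigQuery:
    """Hypothetical stand-in for a service that rejects a reused job ID."""

    def __init__(self):
        self.seen_job_ids = set()

    def run_job(self, job_id):
        if job_id in self.seen_job_ids:
            raise RuntimeError(f"job {job_id} already exists")
        self.seen_job_ids.add(job_id)
        return f"rows for {job_id}"


class TemplatedRead:
    """Models a template: the job ID is chosen once, at creation time."""

    def __init__(self):
        self.job_id = uuid.uuid4().hex  # frozen into the "template"

    def execute(self, service):
        return service.run_job(self.job_id)


class DirectRead:
    """Models a directly launched pipeline: a fresh job ID on every run."""

    def execute(self, service):
        return service.run_job(uuid.uuid4().hex)


def count_successful_runs(read, service, attempts):
    """Run the read several times and count how many attempts succeed."""
    ok = 0
    for _ in range(attempts):
        try:
            read.execute(service)
            ok += 1
        except RuntimeError:
            pass
    return ok
```

In this model, `count_successful_runs(TemplatedRead(), FakeBigQuery(), 3)` succeeds only on the first attempt, while the same call with `DirectRead()` succeeds every time. That mirrors the practical takeaway: for this kind of job, launch the pipeline directly for each run instead of reusing a template.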

It would still be better if the error were clearer than a NullPointerException.

Anyway, I hope this helps someone in the future.

If anyone is interested in the whole conversation, here is the issue: https://issuetracker.google.com/issues/63124894