且构网


Apache Flink: write each GroupedDataSet to its own CSV file

Updated: 2023-10-18 14:22:46

What you need is a bucketing sink, but that is currently supported only for streaming jobs, not batch jobs. Flink 1.12 unified batch and streaming (a bounded DataStream job can now run in BATCH execution mode), so in theory that might work for you. I implemented my own bucketing sink for batch jobs, but it seems to have issues with recent versions of Hadoop, which I still need to debug.
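As a rough illustration of what a bucketing sink does, here is a plain-Java sketch that does not use Flink at all. Each record is assigned to a bucket by its group key, and each bucket is written to its own CSV file. It assumes the data fits in memory and that the keys are safe to use as file names. The names `BucketingSketch`, `bucketByKey` and `writeBuckets` are hypothetical and are not part of Flink's API. In Flink's own file sinks the equivalent mapping is done by a `BucketAssigner`, and each bucket becomes a directory of part files rather than a single file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical, in-memory illustration of bucketing: one CSV file per group key. */
public class BucketingSketch {

    /** Quotes a field if it contains a comma, quote or line break (RFC 4180 style). */
    static String csvField(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Turns one record into a single CSV line. */
    static String toCsvLine(String[] record) {
        List<String> fields = new ArrayList<>();
        for (String f : record) {
            fields.add(csvField(f));
        }
        return String.join(",", fields);
    }

    /** Plays the role of a bucket assigner: bucket ID = value of the key field, input order kept. */
    static Map<String, List<String>> bucketByKey(List<String[]> records, int keyField) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (String[] record : records) {
            buckets.computeIfAbsent(record[keyField], k -> new ArrayList<>())
                   .add(toCsvLine(record));
        }
        return buckets;
    }

    /** Plays the role of the sink: writes each bucket to outDir/<key>.csv (keys assumed file-name safe). */
    static void writeBuckets(Path outDir, Map<String, List<String>> buckets) throws IOException {
        Files.createDirectories(outDir);
        for (Map.Entry<String, List<String>> e : buckets.entrySet()) {
            Files.write(outDir.resolve(e.getKey() + ".csv"), e.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String[]> records = List.of(
                new String[] {"a", "1", "x"},
                new String[] {"b", "2", "y,z"},
                new String[] {"a", "3", "w"});
        Path outDir = Files.createTempDirectory("buckets");
        writeBuckets(outDir, bucketByKey(records, 0));
        // Prints a.csv, then b.csv
        try (Stream<Path> files = Files.list(outDir)) {
            files.sorted().forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```

Running `main` creates `a.csv` (two rows) and `b.csv` (one row, with the `y,z` field quoted) in a temporary directory.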