且构网


Apache Beam TextIO.read with line numbers

Updated: 2023-11-21 23:21:04


You can use FileIO to read the file manually. Because you read the ReadableFile line by line yourself, you can keep track of the line number as you go.


A simple solution can look as follows:

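import static org.apache.beam.sdk.values.TypeDescriptors.strings;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.FlatMapElements;

// p is an existing Pipeline, e.g. created with Pipeline.create(options)
p
    .apply(FileIO.match().filepattern("/file.csv"))   // find the matching file(s)
    .apply(FileIO.readMatches())                      // one ReadableFile per match
    .apply(FlatMapElements
            .into(strings())
            .via((FileIO.ReadableFile f) -> {
                // Note: all numbered lines of the file are collected in memory before being emitted
                List<String> result = new ArrayList<>();
                // The whole file is read sequentially inside this one call, so the counter is exact
                try (BufferedReader br = new BufferedReader(Channels.newReader(f.open(), "UTF-8"))) {
                    long lineNr = 1;
                    String line = br.readLine();
                    while (line != null) {
                        result.add(lineNr + "," + line);   // prepend "<lineNr>,"
                        line = br.readLine();
                        lineNr++;
                    }
                } catch (IOException e) {
                    throw new RuntimeException("Error while reading", e);
                }
                return result;
            }));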


The solution above simply prepends the line number and a comma to each input line. Numbering starts again at 1 for every file matched by the file pattern.
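The body of the FlatMapElements lambda is ordinary Java, so one way to check the numbering without running a pipeline is to pull that loop into a helper that accepts any java.io.Reader. This is a minimal sketch: the class name LineNumbering and the method numberLines are hypothetical, not part of Beam. Inside the via(...) lambda you would pass it Channels.newReader(f.open(), "UTF-8").

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineNumbering {

    // Hypothetical helper: the same loop as the pipeline lambda, but taking any Reader
    // so it can be exercised without a Beam pipeline.
    static List<String> numberLines(Reader in) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            long lineNr = 1;  // long, so very large files cannot overflow the counter
            String line;
            while ((line = br.readLine()) != null) {
                // Prepend "<lineNr>," exactly like the pipeline snippet does
                result.add(lineNr + "," + line);
                lineNr++;
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> out = numberLines(new StringReader("a,b\nc,d\n"));
        System.out.println(out);  // prints [1,a,b, 2,c,d]
    }
}
```

As in the pipeline version, every line of a file is numbered within a single call, which is what makes the numbers exact; the trade-off is that one large file is not split across workers, and its numbered lines are held in a list before being returned.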