Getting the file name/file data as the Map key/value input when running a Hadoop MapReduce job

Updated: 2022-12-06 10:41:17

Put the following code in your CustomRecordReader class.

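// Needs org.apache.hadoop.io.{LongWritable, Text} and
// org.apache.hadoop.mapred.{FileSplit, JobConf, LineRecordReader}.
private LineRecordReader lineReader;

// Scratch key for the byte offset that LineRecordReader produces.
private LongWritable lineKey;

private String fileName;

public CustomRecordReader(JobConf job, FileSplit split) throws IOException {
    lineReader = new LineRecordReader(job, split);
    lineKey = lineReader.createKey();
    fileName = split.getPath().getName();
}

public boolean next(Text key, Text value) throws IOException {
    // Get the next line. LineRecordReader is a RecordReader<LongWritable, Text>,
    // so it takes a LongWritable key; the line is written straight into value.
    if (!lineReader.next(lineKey, value)) {
        return false;
    }

    // Emit the file name as the key; value already holds the line.
    key.set(fileName);

    return true;
}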

public Text createKey() {
    return new Text("");
}

public Text createValue() {
    return new Text("");
}

Remove the SPDRecordReader constructor (it is an error).

And put this code in your CustomFileInputFormat class:

public RecordReader<Text, Text> getRecordReader(
  InputSplit input, JobConf job, Reporter reporter)
  throws IOException {

    reporter.setStatus(input.toString());
    return new CustomRecordReader(job, (FileSplit)input);
}
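
To make the effect of this wrapping concrete, here is a minimal Hadoop-free sketch of the same delegation pattern: an inner reader supplies lines, and the outer reader overwrites the key with the file name. Everything in it (FileNameKeyDemo, Holder, LineSource, FileNameKeyReader, readAll, and the sample file name) is a hypothetical stand-in invented for illustration, not part of Hadoop or of the original answer. It only shows the shape of the records a mapper would presumably receive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hadoop-free illustration only; none of these classes are Hadoop APIs.
public class FileNameKeyDemo {

    /** Mutable holder, standing in for Hadoop's reusable Text objects. */
    static final class Holder {
        String s = "";
    }

    /** Stand-in for LineRecordReader: fills `line`, returns false at end of input. */
    static final class LineSource {
        private final BufferedReader in;

        LineSource(String contents) {
            this.in = new BufferedReader(new StringReader(contents));
        }

        boolean next(Holder line) throws IOException {
            String l = in.readLine();
            if (l == null) {
                return false;
            }
            line.s = l;
            return true;
        }
    }

    /** Mirrors CustomRecordReader: delegate the read, then overwrite the key. */
    static final class FileNameKeyReader {
        private final LineSource lines;
        private final String fileName;

        FileNameKeyReader(String fileName, String contents) {
            this.lines = new LineSource(contents);
            this.fileName = fileName;
        }

        boolean next(Holder key, Holder value) throws IOException {
            if (!lines.next(value)) {
                return false;
            }
            key.s = fileName; // value already holds the line
            return true;
        }
    }

    /** Drains the reader into "key<TAB>value" strings. */
    public static List<String> readAll(String fileName, String contents) throws IOException {
        FileNameKeyReader reader = new FileNameKeyReader(fileName, contents);
        Holder key = new Holder();
        Holder value = new Holder();
        List<String> records = new ArrayList<>();
        while (reader.next(key, value)) {
            records.add(key.s + "\t" + value.s);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Sample file name and contents are made up for the demo.
        for (String record : readAll("sample.txt", "first line\nsecond line\n")) {
            System.out.println(record);
        }
    }
}
```

Every record carries the same key, the file name, while the value changes line by line. In the real job, the map function would therefore be declared with Text for both its input key and input value types.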