且构网

Standard practice for logging in MapReduce jobs

Updated: 2023-11-17 11:08:52

You can use log4j, which is the default logging framework that Hadoop uses. From your MapReduce application you can do something like this:

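import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class SampleMapper extends Mapper<LongWritable, Text, Text, Text> {
    // One logger per class, shared by all instances of the mapper.
    private static final Logger logger = Logger.getLogger(SampleMapper.class);

    @Override
    protected void setup(Context context) {
        logger.info("Initializing NoSQL Connection.");
        try {
            // logic for connecting to NoSQL - omitted
        } catch (Exception ex) {
            // Pass the exception itself so the stack trace reaches the task log.
            logger.error("Failed to initialize NoSQL connection", ex);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // mapper code omitted
    }
}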

This sample code uses a log4j logger to record events from the mapper. Each log event is written to the log of the task that produced it. You can view the task logs from the JobTracker (MRv1) or ResourceManager (MRv2) web page.

If you are using YARN, you can access the application logs from the command line with the following command:

yarn logs -applicationId <application_id>
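If you want to call that command from a Java tool, the argument list can be assembled as in the rough sketch below. The class name YarnLogsCommand and the sample application ID are hypothetical; the only parts taken from the answer are the `yarn logs -applicationId` command and its argument. The pattern check assumes the usual `application_<clusterTimestamp>_<sequence>` form of YARN application IDs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: builds the `yarn logs` command line.
class YarnLogsCommand {
    // YARN application IDs have the form application_<clusterTimestamp>_<sequence>.
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    /** Returns the argument list for `yarn logs -applicationId <application_id>`. */
    static List<String> build(String applicationId) {
        if (applicationId == null || !APP_ID.matcher(applicationId).matches()) {
            throw new IllegalArgumentException("Not a YARN application ID: " + applicationId);
        }
        return Arrays.asList("yarn", "logs", "-applicationId", applicationId);
    }

    public static void main(String[] args) {
        // Made-up ID for illustration; use the one reported for your own job.
        String appId = args.length > 0 ? args[0] : "application_1410450250506_0003";
        List<String> command = build(appId);
        System.out.println(String.join(" ", command));
        // To actually run it (requires the yarn CLI on the PATH):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Validating the ID before building the command keeps a mistyped job ID (for example a `job_...` ID) from being passed to the CLI by accident.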

If you are using MapReduce v1, there is no single point of access from the command line, so you have to log into each TaskTracker and look in the configured path, ${hadoop.log.dir}/userlogs. This is generally /var/log/hadoop/userlogs/attempt_<job_id>/syslog, and that syslog file contains the log4j output.
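To make the MRv1 layout concrete, here is a minimal sketch that assembles the syslog path for one task attempt from the pieces named above. The class TaskLogPath, the /var/log/hadoop value for hadoop.log.dir, and the attempt directory name are assumptions for illustration only; check your own TaskTracker configuration for the real values.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: locates a task attempt's syslog file.
class TaskLogPath {
    /**
     * Joins <hadoopLogDir>/userlogs/<attemptDir>/syslog, following the layout
     * described in the answer. The layout on your cluster may differ.
     */
    static Path syslogFor(String hadoopLogDir, String attemptDir) {
        return Paths.get(hadoopLogDir, "userlogs", attemptDir, "syslog");
    }

    public static void main(String[] args) {
        // Both values are made up for illustration.
        Path syslog = syslogFor("/var/log/hadoop", "attempt_201311171108_0001_m_000000_0");
        System.out.println(syslog);
    }
}
```

Because each TaskTracker keeps only the logs of the attempts it ran, the same lookup has to be repeated on every node that executed part of the job.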