Tuesday, April 23, 2019

How to customize FileOutputCommitter for a MapReduce job by overriding the Output Format Class

Goal:

This article explains how to customize FileOutputCommitter for a MapReduce job by overriding the Output Format Class.
This technique can be used to change the output directory, customize the output file names, and so on.

Env:

MapR 6.1
Hadoop 2.7.0

Solution:

Here is some sample code, written by modifying the wordcount sample MapReduce job.
In this example, we only change the job output directory by adding a sub-directory named "mysubdir".
This sample code has only 2 Java files:
  • WordCount.java -- Job driver class.
  • myOutputFormat.java -- Output Format Class defined by us
1. WordCount.java
Most of the code is the same as in the sample WordCount job; the only change is to set our own Output Format Class:
job.setOutputFormatClass(myOutputFormat.class);

2. myOutputFormat.java
We customized the method "getOutputCommitter" as below:
  // Field assumed here; its declaration is not shown in the original snippet.
  private OutputCommitter myCommitter = null;

  // Create the committer once, pointing it at the customized output directory.
  @Override
  public synchronized OutputCommitter getOutputCommitter(TaskAttemptContext context)
    throws IOException
  {
    if (this.myCommitter == null)
    {
      Path output = new Path(getOutputDir(context));
      this.myCommitter = new FileOutputCommitter(output, context);
    }
    return this.myCommitter;
  }

  // Return the job output path with "/mysubdir" appended.
  protected static String getOutputDir(TaskAttemptContext context)
  {
    int taskID = context.getTaskAttemptID().getTaskID().getId();
    String taskType = context.getTaskAttemptID().getTaskID().getTaskType().toString();
    // Debug output only; it ends up in the stderr log of each task.
    System.err.println("MyDebug: task id is: " + taskID + " and task type is: " + taskType);
    String outputBaseDir = getOutputPath(context).toString() + "/mysubdir";
    return outputBaseDir;
  }
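
The getOutputCommitter method above creates the committer lazily and then reuses it, so every caller sees the same committer and the same output path. The pattern can be tried outside Hadoop with a minimal stand-alone sketch; LazyHolder is a hypothetical name used only for illustration and is not part of Hadoop or of the sample job.

```java
import java.util.function.Supplier;

// Stand-alone illustration of the "create once, then reuse" pattern.
// LazyHolder is a hypothetical class, not a Hadoop API.
final class LazyHolder<T> {
    private final Supplier<T> factory;
    private T instance; // stays null until the first call to get()

    LazyHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    // synchronized so that concurrent callers cannot create two instances
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        LazyHolder<String> holder =
            new LazyHolder<>(() -> "/hao/wordfinal" + "/mysubdir");
        // Both calls return the same cached object.
        System.out.println(holder.get());
        System.out.println(holder.get() == holder.get());
    }
}
```

In the real Output Format Class the factory step is the "new FileOutputCommitter(output, context)" call, and the cached value is the myCommitter field.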

This is a simple demo that changes only the job output directory.
You can also customize the output file names or the compression type by overriding the getRecordWriter method.

After running this MapReduce job, the output files are placed under the "mysubdir" sub-directory of the job output path:
# hadoop fs -ls -R /hao/wordfinal
drwxr-xr-x   - mapr mapr          4 2019-04-23 13:29 /hao/wordfinal/mysubdir
-rwxr-xr-x   3 mapr mapr          0 2019-04-23 13:29 /hao/wordfinal/mysubdir/_SUCCESS
-rwxr-xr-x   3 mapr mapr          0 2019-04-23 13:29 /hao/wordfinal/mysubdir/part-r-00000
-rwxr-xr-x   3 mapr mapr          0 2019-04-23 13:29 /hao/wordfinal/mysubdir/part-r-00001
-rwxr-xr-x   3 mapr mapr         51 2019-04-23 13:29 /hao/wordfinal/mysubdir/part-r-00002
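
The paths in this listing combine two things: the sub-directory appended in getOutputDir, and the default part file names. The following stand-alone sketch reproduces that layout without any Hadoop dependency. It assumes the usual naming scheme of a base name, "m" or "r" for the task type, and a 5-digit partition number. OutputLayoutSketch and its methods are hypothetical names used only for illustration.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Stand-alone illustration of the output layout; not Hadoop code.
// Class and method names are hypothetical.
final class OutputLayoutSketch {

    // Mirrors getOutputDir: append a fixed sub-directory to the job output path.
    static String outputDir(String jobOutputPath, String subDir) {
        return jobOutputPath + "/" + subDir;
    }

    // Builds a name of the form <base>-<m|r>-<5-digit partition><extension>.
    static String partFileName(String baseName, boolean isMapTask,
                               int partition, String extension) {
        NumberFormat fmt = NumberFormat.getInstance(Locale.ROOT);
        fmt.setMinimumIntegerDigits(5); // pad with leading zeros
        fmt.setGroupingUsed(false);     // no thousands separators
        return baseName + "-" + (isMapTask ? "m" : "r") + "-"
            + fmt.format(partition) + extension;
    }

    public static void main(String[] args) {
        String dir = outputDir("/hao/wordfinal", "mysubdir");
        // Three reducers, as in the listing above.
        for (int partition = 0; partition < 3; partition++) {
            System.out.println(dir + "/" + partFileName("part", false, partition, ""));
        }
    }
}
```

A customized getRecordWriter could build its file names in a similar way, for example with a different base name or an added extension.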

