Tuesday, April 23, 2019

How to customize FileOutputCommitter for a MapReduce job by overriding the Output Format Class

Goal:

This article explains how to customize FileOutputCommitter for a MapReduce job by overriding the Output Format Class.
This can be used to change the output directory, customize the file names, etc.

Env:

MapR 6.1
Hadoop 2.7.0

Solution:

Here is sample code based on the wordcount sample MapReduce job.
In this example, we only change the job output directory to add a sub-directory named "mysubdir".
The sample code has only 2 Java files:
  • WordCount.java -- Job driver class.
  • myOutputFormat.java -- Output Format Class defined by us.

1. WordCount.java
Most of the code is the same as the sample WordCount job; the only change is that we set our own Output Format Class:
job.setOutputFormatClass(myOutputFormat.class);

2. myOutputFormat.java
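We override the method "getOutputCommitter" as below. The class must extend FileOutputFormat (or a subclass such as TextOutputFormat), because getOutputPath is a static helper inherited from it:
  // Field declaration is not shown in the original post; it is assumed here so the snippet compiles.
  private FileOutputCommitter myCommitter = null;

  @Override
  public synchronized OutputCommitter getOutputCommitter(TaskAttemptContext context)
    throws IOException
  {
    // Create the committer once per task and reuse it on later calls.
    if (this.myCommitter == null)
    {
      // Commit to <job output dir>/mysubdir instead of the job output dir itself.
      Path output = new Path(getOutputDir(context));
      this.myCommitter = new FileOutputCommitter(output, context);
    }
    return this.myCommitter;
  }

  protected static String getOutputDir(TaskAttemptContext context)
  {
    int taskID = context.getTaskAttemptID().getTaskID().getId();
    String taskType = context.getTaskAttemptID().getTaskID().getTaskType().toString();
    // Debug output only; it shows up in the stderr log of each task.
    System.err.println("MyDebug: task id is: " + taskID + " and tasktype is: " + taskType);
    // getOutputPath returns the output directory configured for the job.
    String outputBaseDir = getOutputPath(context).toString() + "/mysubdir";
    return outputBaseDir;
  }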

This is a simple demo that changes the job output directory.
You can also customize the file names or the compression type by overriding the getRecordWriter method.

After running this MapReduce job, the output files are placed under the "mysubdir" sub-directory:
# hadoop fs -ls -R /hao/wordfinal
drwxr-xr-x   - mapr mapr          4 2019-04-23 13:29 /hao/wordfinal/mysubdir
-rwxr-xr-x   3 mapr mapr          0 2019-04-23 13:29 /hao/wordfinal/mysubdir/_SUCCESS
-rwxr-xr-x   3 mapr mapr          0 2019-04-23 13:29 /hao/wordfinal/mysubdir/part-r-00000
-rwxr-xr-x   3 mapr mapr          0 2019-04-23 13:29 /hao/wordfinal/mysubdir/part-r-00001
-rwxr-xr-x   3 mapr mapr         51 2019-04-23 13:29 /hao/wordfinal/mysubdir/part-r-00002
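
The part-r-0000N names in this listing follow the default naming scheme of the file-based output formats: a base name, then "m" or "r" for the task type, then the partition number padded to 5 digits, then an optional extension. Before customizing file names, it helps to see that scheme in isolation. The sketch below is plain Java with no Hadoop dependency; the class PartFileNames and the method uniqueFile are hypothetical names made up for this illustration, and it only imitates the naming pattern rather than calling Hadoop itself.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Hadoop.
public class PartFileNames {

    // Builds a name shaped like <base>-<m|r>-<5-digit partition><extension>.
    static String uniqueFile(String baseName, boolean isReduce, int partition, String extension) {
        String taskChar = isReduce ? "r" : "m";
        // Locale.ROOT keeps the digits ASCII regardless of the JVM default locale.
        String padded = String.format(Locale.ROOT, "%05d", partition);
        return baseName + "-" + taskChar + "-" + padded + extension;
    }

    public static void main(String[] args) {
        // Default-style name, as seen in the listing above.
        System.out.println(uniqueFile("part", true, 2, ""));
        // A customized base name and extension (both values are made up).
        System.out.println(uniqueFile("wordcount", true, 2, ".txt"));
    }
}
```

In a real job, a change like this would go into the custom Output Format Class, for example where getRecordWriter decides which file to open; the base name and extension above are only placeholders.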

