Tuesday, July 12, 2016

Memory allocation for Oozie Launcher job

Goal:

This article explains the memory configuration parameters for the Oozie Launcher job.

Env:

MapR 5.1
Oozie 4.2.0

Solution:

1. Oozie Launcher Job architecture

The Oozie Launcher job is a map-only job that starts the jobs which do the real work, e.g. Hive, MR, Pig, etc.
Take an Oozie Hive job for example.
Compared to a normal Hive query submitted through the Hive CLI, the Oozie Launcher job contains the Hive CLI command.
Say a Hive query runs fine only after increasing the Hive CLI Java heap size (-Xmx) to 16GB.
To migrate this Hive query to an Oozie Hive job, we should also increase the YARN container size for the Oozie Launcher job to 16GB.

2. How to increase the YARN container size for AM or Mapper of Oozie Hive job?

It is controlled by the four parameters below, set in workflow.xml for each Oozie job.
  • oozie.launcher.mapreduce.map.memory.mb
  • oozie.launcher.mapreduce.map.java.opts
  • oozie.launcher.yarn.app.mapreduce.am.resource.mb
  • oozie.launcher.yarn.app.mapreduce.am.command-opts
The algorithm is in the Oozie source code:
core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
            // memory.mb
            int launcherMapMemoryMB = launcherConf.getInt(HADOOP_MAP_MEMORY_MB, 1536);
            int amMemoryMB = launcherConf.getInt(YARN_AM_RESOURCE_MB, 1536);
            // YARN_MEMORY_MB_MIN to provide buffer.
            // suppose launcher map aggressively use high memory, need some
            // headroom for AM
            int memoryMB = Math.max(launcherMapMemoryMB, amMemoryMB) + YARN_MEMORY_MB_MIN;
            // limit to 4096 in case of 32 bit
            if (launcherMapMemoryMB < 4096 && amMemoryMB < 4096 && memoryMB > 4096) {
                memoryMB = 4096;
            }
            launcherConf.setInt(YARN_AM_RESOURCE_MB, memoryMB);

            // We already made mapred.child.java.opts and
            // mapreduce.map.java.opts equal, so just start with one of them
            String launcherMapOpts = launcherConf.get(HADOOP_MAP_JAVA_OPTS, "");
            String amChildOpts = launcherConf.get(YARN_AM_COMMAND_OPTS);
            StringBuilder optsStr = new StringBuilder();
            int heapSizeForMap = extractHeapSizeMB(launcherMapOpts);
            int heapSizeForAm = extractHeapSizeMB(amChildOpts);
            int heapSize = Math.max(heapSizeForMap, heapSizeForAm) + YARN_MEMORY_MB_MIN;
            // limit to 3584 in case of 32 bit
            if (heapSizeForMap < 4096 && heapSizeForAm < 4096 && heapSize > 3584) {
                heapSize = 3584;
            }
            if (amChildOpts != null) {
                optsStr.append(amChildOpts);
            }
            optsStr.append(" ").append(launcherMapOpts.trim());
            if (heapSize > 0) {
                // append calculated total heap size to the end
                optsStr.append(" ").append("-Xmx").append(heapSize).append("m");
            }
            launcherConf.set(YARN_AM_COMMAND_OPTS, optsStr.toString().trim());
In the above code, YARN_MEMORY_MB_MIN=512.
For memory.mb:
max(oozie.launcher.mapreduce.map.memory.mb, oozie.launcher.yarn.app.mapreduce.am.resource.mb) + 512
For the Java heap size (-Xmx):
max(-Xmx in oozie.launcher.mapreduce.map.java.opts, -Xmx in oozie.launcher.yarn.app.mapreduce.am.command-opts) + 512
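
The two formulas above, including the 32-bit caps from the quoted snippet, can be sketched as a small standalone program. This is only an illustration of the arithmetic: the class and method names are hypothetical and not part of Oozie, and the heap values are assumed to have already been extracted from the -Xmx flags.

```java
// Hypothetical standalone sketch; it mirrors the arithmetic of the quoted
// JavaActionExecutor snippet but is not Oozie code.
public class LauncherMemorySketch {
    static final int YARN_MEMORY_MB_MIN = 512;

    // memory.mb branch: max of the two requests plus 512 MB of headroom,
    // capped at 4096 when both inputs are below 4096.
    static int effectiveAmMemoryMb(int launcherMapMemoryMb, int amMemoryMb) {
        int memoryMb = Math.max(launcherMapMemoryMb, amMemoryMb) + YARN_MEMORY_MB_MIN;
        if (launcherMapMemoryMb < 4096 && amMemoryMb < 4096 && memoryMb > 4096) {
            memoryMb = 4096;
        }
        return memoryMb;
    }

    // Heap branch: max of the two -Xmx values (in MB) plus 512 MB,
    // capped at 3584 when both inputs are below 4096.
    static int effectiveHeapMb(int heapForMapMb, int heapForAmMb) {
        int heapMb = Math.max(heapForMapMb, heapForAmMb) + YARN_MEMORY_MB_MIN;
        if (heapForMapMb < 4096 && heapForAmMb < 4096 && heapMb > 3584) {
            heapMb = 3584;
        }
        return heapMb;
    }

    public static void main(String[] args) {
        System.out.println(effectiveAmMemoryMb(1024, 2048)); // prints 2560
        System.out.println(effectiveHeapMb(777, 1111));      // prints 1623
    }
}
```

Note that the value computed for memory.mb is only the request sent to YARN; the scheduler may still round it up when it allocates the container.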

Examples:
1. Set below in workflow.xml:
             <property>
                <name>oozie.launcher.mapreduce.map.memory.mb</name>
                <value>1024</value>
            </property>
            <property>
                <name>oozie.launcher.mapreduce.map.java.opts</name>
                <value>-Xmx777m</value>
            </property>

             <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
                <value>2048</value>
            </property>
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.command-opts</name>
                <value>-Xmx1111m</value>
            </property>
The actual container size for the Oozie Launcher job is: (3072mb,-Xmx1623m).
memory.mb=3072 because max(1024,2048)+512=2560, which is rounded up to 3072 because yarn.scheduler.minimum-allocation-mb=1024.
The heap is -Xmx1623m because max(777,1111)+512=1623.
2. Set below in workflow.xml:
             <property>
                <name>oozie.launcher.mapreduce.map.memory.mb</name>
                <value>3072</value>
            </property>
            <property>
                <name>oozie.launcher.mapreduce.map.java.opts</name>
                <value>-Xmx777m</value>
            </property>

             <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
                <value>2048</value>
            </property>
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.command-opts</name>
                <value>-Xmx1111m</value>
            </property>
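The actual container size for the Oozie Launcher job is: (4096mb,-Xmx1623m).
memory.mb=4096 because max(3072,2048)+512=3584, which is rounded up to 4096 because yarn.scheduler.minimum-allocation-mb=1024.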
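
The rounding step in the two examples can be reproduced with a few lines of arithmetic. The sketch below assumes, as on the cluster described here, that the request is rounded up to the next multiple of yarn.scheduler.minimum-allocation-mb; the exact behavior depends on the scheduler and its configuration, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the rounding observed in the examples; not YARN code.
public class ContainerRoundingSketch {
    // Round the requested memory up to the next multiple of the minimum allocation.
    static int roundUpToMultiple(int requestedMb, int minimumAllocationMb) {
        return ((requestedMb + minimumAllocationMb - 1) / minimumAllocationMb) * minimumAllocationMb;
    }

    public static void main(String[] args) {
        int minimumAllocationMb = 1024; // yarn.scheduler.minimum-allocation-mb in the examples
        // Example 1: max(1024, 2048) + 512 = 2560
        System.out.println(roundUpToMultiple(2560, minimumAllocationMb)); // prints 3072
        // Example 2: max(3072, 2048) + 512 = 3584
        System.out.println(roundUpToMultiple(3584, minimumAllocationMb)); // prints 4096
    }
}
```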

3. How to verify the Oozie Launcher Container Size?

Do not blindly trust the configuration page, because multiple sources can control the same setting.
Take example #2 above:
To check the actual memory.mb, start with the RM log:
Assigned container container_e04_1468279966583_0020_01_000001 of capacity <memory:4096, vCores:1, disks:0.0>
To check the actual Java opts, run "ps -ef" on the NM node while the Oozie Launcher job is running:
v7: mapr     18959 18948 99 19:36 ?        00:00:04 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-1.b14.el6.x86_64/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1468279966583_0020/container_e04_1468279966583_0020_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m -Xmx200m -Xmx1111m -Xmx1623m -Djava.io.tmpdir=./tmp org.apache.hadoop.mapreduce.v2.app.MRAppMaster
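
The command line above carries several -Xmx flags. On HotSpot JVMs the last occurrence of a repeated flag generally takes effect, which here is the calculated -Xmx1623m appended at the end. A minimal sketch for picking out that last flag from a captured "ps -ef" line follows; the class and method names are hypothetical, and the sample string is a shortened version of the output above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting a captured process command line.
public class LastXmxSketch {
    private static final Pattern XMX = Pattern.compile("-Xmx\\d+[kKmMgG]?");

    // Returns the last -Xmx flag found in the command line, or null if there is none.
    static String lastXmx(String commandLine) {
        Matcher matcher = XMX.matcher(commandLine);
        String last = null;
        while (matcher.find()) {
            last = matcher.group();
        }
        return last;
    }

    public static void main(String[] args) {
        String psLine = "java -Dhadoop.root.logfile=syslog -Xmx1024m -Xmx200m -Xmx1111m -Xmx1623m "
                + "-Djava.io.tmpdir=./tmp org.apache.hadoop.mapreduce.v2.app.MRAppMaster";
        System.out.println(lastXmx(psLine)); // prints -Xmx1623m
    }
}
```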

Key Takeaways:

1. When an Oozie job fails with "OutOfMemory", figure out whether the failure is in the Oozie Launcher job or in the MR job spawned by the Hadoop component.
2. Know how to verify the memory.mb and Java opts of the Oozie Launcher job at runtime.
