Tuesday, May 23, 2017

Hive on Tez : How to identify the reused YARN containers

Goal:

Hive on Tez : How to identify the reused YARN containers

Env:

Hive 2.1
Tez 0.8

Solution:

Tez can reuse YARN containers to improve the performance, because it saves the time to allocate a new YARN container.
Take a simple Hive on Tez query for example:
hive> set hive.execution.engine=tez;
hive> select count(*) from passwords;
Query ID = mapr_20170523131710_be8b1c97-6607-4c2e-8fb2-8e1c152047c3
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1475192050844_0021)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 8.41 s
----------------------------------------------------------------------------------------------
OK
29
Time taken: 18.663 seconds, Fetched: 1 row(s)
Quick question: How many YARN containers are spawned for above query?
You may answer "3" based on common understanding of a YARN job: one Application Master container, one Mapper container and one Reducer container.

But the actual answer in above example is 2.
Based on below ResourceManager log:
# tail -100f yarn-mapr-resourcemanager-s1.poc.com.log |grep -i assign
2017-05-23 13:17:14,964 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e02_1475192050844_0021_01_000001 of capacity <memory:1024, vCores:1, disks:0.0> on host s4.poc.com:57626, which has 1 containers, <memory:1024, vCores:1, disks:0.0> used and <memory:16897, vCores:1, disks:1.33> available after allocation
2017-05-23 13:17:21,972 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e02_1475192050844_0021_01_000002 of capacity <memory:1024, vCores:1, disks:0.0> on host s3.poc.com:44818, which has 1 containers, <memory:1024, vCores:1, disks:0.0> used and <memory:5383, vCores:1, disks:1.33> available after allocation
Only 2 YARN containers allocated for application application_1475192050844_0017:
  1. container_e02_1475192050844_0021_01_000001
  2. container_e02_1475192050844_0021_01_000002
Now let's go to each container log directory to take a look:

1. container_e02_1475192050844_0021_01_000001

This is the DAG Application Master container.
Evidence is in the container log "syslog" file:
[root@s4 container_e02_1475192050844_0021_01_000001]# grep "Creating DAGAppMaster" syslog
2017-05-23 13:17:17,638 [INFO] [main] |app.DAGAppMaster|: Creating DAGAppMaster for applicationId=application_1475192050844_0021, attemptNum=1, AMContainerId=container_e02_1475192050844_0021_01_000001, jvmPid=15496, userFromEnv=mapr, cliSessionOption=true, pwd=/tmp/hadoop-mapr/nm-local-dir/usercache/mapr/appcache/application_1475192050844_0021/container_e02_1475192050844_0021_01_000001, localDirs=/tmp/hadoop-mapr/nm-local-dir/usercache/mapr/appcache/application_1475192050844_0021, logDirs=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1475192050844_0021/container_e02_1475192050844_0021_01_000001

As we know, this DAG Application Master may run multiple Hive queries(DAGs), so for this Hive query, the DAG ID can be obtained from above "syslog" as well:
[root@s4 container_e02_1475192050844_0021_01_000001]# grep dagId syslog
2017-05-23 13:17:20,152 [INFO] [IPC Server handler 0 on 39921] |app.DAGAppMaster|: Generating DAG graphviz file, dagId=dag_1475192050844_0021_1, filePath=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1475192050844_0021/container_e02_1475192050844_0021_01_000001/dag_1475192050844_0021_1.dot

Then for that Hive query(dagId=dag_1475192050844_0021_1), the details are actually in its DAG log which is syslog_dag_1475192050844_0021_1.
For example, the other map and reduce containers:
[root@s4 container_e02_1475192050844_0021_01_000001]# grep -i assign syslog_dag_1475192050844_0021_1 |grep -i container
2017-05-23 13:17:22,775 [INFO] [DelayedContainerManager] |rm.YarnTaskSchedulerService|: Assigning container to task: containerId=container_e02_1475192050844_0021_01_000002, task=attempt_1475192050844_0021_1_00_000000_0, containerHost=s3.poc.com:44818, containerPriority= 5, containerResources=<memory:1024, vCores:1, disks:0.0>, localityMatchType=RackLocal, matchedLocation=/default-rack, honorLocalityFlags=false, reusedContainer=false, delayedContainers=0
2017-05-23 13:17:29,054 [INFO] [DelayedContainerManager] |rm.YarnTaskSchedulerService|: Assigning container to task: containerId=container_e02_1475192050844_0021_01_000002, task=attempt_1475192050844_0021_1_01_000000_0, containerHost=s3.poc.com:44818, containerPriority= 5, containerResources=<memory:1024, vCores:1, disks:0.0>, localityMatchType=NodeLocal, matchedLocation=s3.poc.com, honorLocalityFlags=true, reusedContainer=true, delayedContainers=1
So we got to know that the Mapper and Reducer tasks are sharing the same container -- container_e02_1475192050844_0021_01_000002.
Here  the keyword is "reusedContainer=true" which means the Reducer reused the container which was used for Mapper.

Simply saying, in the DAG Application Master container log directory, the "syslog" can tell you which DAG ID is for which query.
The corresponding "DAG syslog" can tell you what are other Mapper and Reducer containers.

2. container_e02_1475192050844_0021_01_000002

Actually the "syslog" does not have much information about the 2 tasks.Because the log files are redirected to below "attempt syslog" files:
syslog_attempt_1475192050844_0021_1_00_000000_0 -- for Mapper
syslog_attempt_1475192050844_0021_1_01_000000_0 -- for Reducer

Key Takeaways:

The methodology is:
  • From ResourceManager log to find out which containers are spawned for this Hive on Tez job.
  • From DAG Application Master "syslog" to find out what is the DAG ID for this Hive query.
  • From DAG Application Master "DAG syslog" to find out what are other Mapper or Reducer containers and also which container is reused.

No comments:

Post a Comment