Monday, July 11, 2016

How to troubleshoot an Oozie Hive action

Goal:

This article explains, for newcomers, a methodology for troubleshooting an Oozie Hive action.

Solution:

Just as when troubleshooting a normal Hive query run from the Hive CLI, we first check the Hive log, which shows each stage and its related MR job.
Then, if a specific MR job fails, we check that MR job's logs.

For an Oozie Hive action, the first question is:
Where is the Hive log?
It is in the stdout of the "Oozie Launcher map-reduce job".
This is the starting point for troubleshooting.

Below, I use a simple Oozie Hive action as an example.
Its script.q is:
DROP TABLE test;
CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;
As we know, only the INSERT query above spawns MR job(s).
In this case, since the test table is very small, it spawns only 1 MR job.

1. Submit the Oozie Hive action

$ oozie job -config /home/mapr/myoozie/job.properties -run
job: 0000002-160711193233753-oozie-mapr-W
From the Oozie job ID, figure out which MR job is the "Oozie Launcher map-reduce job". It is shown in the "Ext ID" column of the Hive action (job_1468279966583_0005 in this example):
$ oozie job -info 0000002-160711193233753-oozie-mapr-W
Job ID : 0000002-160711193233753-oozie-mapr-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive-wf
App Path      : maprfs:/user/mapr/examples/apps/hive
Status        : SUCCEEDED
Run           : 0
User          : mapr
Group         : -
Created       : 2016-07-11 23:50 GMT
Started       : 2016-07-11 23:50 GMT
Last Modified : 2016-07-11 23:51 GMT
Ended         : 2016-07-11 23:51 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000002-160711193233753-oozie-mapr-W@:start:                                  OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------
0000002-160711193233753-oozie-mapr-W@hive-node                                OK        job_1468279966583_0005 SUCCEEDED  -
------------------------------------------------------------------------------------------------------------------------------------
0000002-160711193233753-oozie-mapr-W@end                                      OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------
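The two look-ups in this step can be scripted. Below is a minimal POSIX shell sketch, assuming the `oozie` CLI output looks exactly like the sample above: a `job: <id>` line from `-run`, and the tabular Actions section from `-info`. The function names `workflow_id` and `launcher_job_id` are hypothetical, and `hive-node` is simply the action name used in this example's workflow.

```shell
#!/bin/sh
# Hypothetical helpers for step 1; both read oozie CLI output on stdin.

# Extract the workflow ID from the "job: <id>" line printed by `oozie job ... -run`.
workflow_id() {
  sed -n 's/^job: *//p'
}

# Print the Ext ID (the "Oozie Launcher map-reduce job") of one action
# from the Actions table printed by `oozie job -info <workflow-id>`.
# $1 is the action name after '@' in the ID column (defaults to hive-node).
launcher_job_id() {
  suffix="@${1:-hive-node}"
  awk -v suffix="$suffix" '
    # Column 1 is the full action ID, column 3 is the Ext ID.
    substr($1, length($1) - length(suffix) + 1) == suffix && $3 ~ /^job_/ { print $3 }
  '
}
```

For the output above, `oozie job -info 0000002-160711193233753-oozie-mapr-W | launcher_job_id hive-node` should print job_1468279966583_0005. Matching on the action name rather than on the row position keeps the helper usable for workflows with more actions.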

2. The Hive log is in the stdout of the "Oozie Launcher map-reduce job"

In the RM log, search for the ID of the "Oozie Launcher map-reduce job" (1468279966583_0005):
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e04_1468279966583_0005_01_000001 of capacity <memory:3072, vCores:1, disks:0.0> on host v5.poc.com:55513, which has 1 containers, <memory:3072, vCores:1, disks:0.0> used and <memory:2048, vCores:1, disks:1.33> available after allocation
The line above shows that the "Oozie Launcher map-reduce job" has only one container, and that it was allocated on node "v5.poc.com".
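Finding the launcher's container and host in the RM log can be sketched the same way. This assumes the RM log line format shown above, and that container IDs embed the application's cluster timestamp and sequence number (as in container_e04_1468279966583_0005_01_000001). The function names and the `$RM_LOG` variable are hypothetical, since the RM log location depends on your installation.

```shell
#!/bin/sh
# Hypothetical helpers for step 2.

# Map an MR job ID to its YARN application ID:
# job_1468279966583_0005 -> application_1468279966583_0005
job_to_app_id() {
  printf '%s\n' "$1" | sed 's/^job_/application_/'
}

# Read ResourceManager log lines on stdin and print "<container-id> <host>"
# for each container assigned to the given job or application ID.
containers_for_job() {
  # Keep only the "<clusterTimestamp>_<sequence>" part, e.g. 1468279966583_0005.
  id=$(printf '%s\n' "$1" | sed 's/^job_//; s/^application_//')
  grep 'Assigned container ' |
    grep "_${id}_" |
    sed -n 's/.*Assigned container \(container_[^ ]*\) of capacity .* on host \([^:]*\):.*/\1 \2/p'
}
```

For the RM log line above, `containers_for_job job_1468279966583_0005 < "$RM_LOG"` should print `container_e04_1468279966583_0005_01_000001 v5.poc.com`, which tells you which node to log in to.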

In the container log of this "Oozie Launcher map-reduce job" on v5.poc.com, we can find the complete Hive log.
For example, an excerpt of /opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1468279966583_0005/container_e04_1468279966583_0005_01_000001/stdout:
5434 [uber-SubtaskRunner] INFO  hive.ql.parse.ParseDriver  - Parsing command: DROP TABLE test
...
...
6885 [uber-SubtaskRunner] INFO  hive.ql.parse.ParseDriver  - Parsing command:
CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '/user/mapr/input-data/table'
...
...
7254 [uber-SubtaskRunner] INFO  hive.ql.parse.ParseDriver  - Parsing command:
INSERT OVERWRITE DIRECTORY '/user/mapr/output-data/hive' SELECT * FROM test
...
...
9109 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.exec.Task  - Starting Job = job_1468279966583_0006, Tracking URL = http://v6.poc.com:8088/proxy/application_1468279966583_0006/
...
...
31023 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.Driver  - MapReduce Jobs Launched:
31023 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.Driver  - Stage-Stage-1: Map: 1   Cumulative CPU: 1.07 sec   MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
31023 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.Driver  - Total MapReduce CPU Time Spent: 1 seconds 70 msec
31023 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.Driver  - OK
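Locating the launcher's stdout file and pulling the spawned MR job IDs out of the Hive log can be sketched as follows. This assumes the userlogs layout of the example path in this step (`<userlogs>/<application-id>/<container-id>/stdout`) and the "Starting Job =" line format shown in the excerpt. The function names are hypothetical, and the default userlogs directory is only the one from this MapR example.

```shell
#!/bin/sh
# Hypothetical helpers for reading the launcher's stdout.

# Build the stdout path of a container under the given userlogs directory.
# $1: container ID, $2: userlogs directory (defaults to this example's path).
container_stdout_path() {
  container="$1"
  userlogs="${2:-/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs}"
  # Drop "container_" and an optional epoch such as "e04_", keep "<ts>_<seq>".
  app=$(printf '%s\n' "$container" |
    sed 's/^container_\(e[0-9]*_\)\{0,1\}\([0-9]*_[0-9]*\)_.*/application_\2/')
  printf '%s/%s/%s/stdout\n' "$userlogs" "$app" "$container"
}

# Read the Hive log (launcher stdout) on stdin and print each MR job ID
# that Hive reports in a "Starting Job = job_..." line.
spawned_mr_jobs() {
  sed -n 's/.*Starting Job = \(job_[0-9]*_[0-9]*\).*/\1/p'
}
```

On v5.poc.com, `spawned_mr_jobs < "$(container_stdout_path container_e04_1468279966583_0005_01_000001)"` should print job_1468279966583_0006 for the log above.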

So in this example, 2 MR jobs are related to this Oozie Hive action:
job_1468279966583_0005 is the "Oozie Launcher map-reduce job";
job_1468279966583_0006 is the MR job spawned by the Hive query.

3. Then troubleshoot the MR job(s) spawned by Hive

Refer to this blog:
How to troubleshoot Yarn job failure issue
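Before opening a spawned job's own logs, it can help to see which Hive stage did not succeed. A minimal sketch, assuming the "MapReduce Jobs Launched" summary format shown in step 2, where each `Stage-Stage-N:` line ends with its status (SUCCESS in the example); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for step 3.

# Read the Hive log (launcher stdout) on stdin and print every
# "Stage-Stage-N:" summary line whose last field is not SUCCESS.
# Exit status is 1 if such a stage was found, 0 otherwise.
unsuccessful_stages() {
  awk '
    /Stage-Stage-[0-9]+:/ && $NF != "SUCCESS" { print; found = 1 }
    END { exit found }
  '
}
```

Run it against the launcher's stdout file from step 2; a non-zero exit status means at least one stage needs the MR-level troubleshooting described in the blog post above.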

Key takeaways

Figure out which MR job is the "Oozie Launcher map-reduce job".
Check the Hive log in the container log (stdout) of the "Oozie Launcher map-reduce job".
Finally, check the MR job(s) spawned by Hive.
