Tuesday, July 15, 2014

Table created by Oozie hive actions but invisible to Hive metastore

Symptom:

Table is created by Oozie hive actions, however table is not shown in hive shell which means it is not in Hive metastore.
The Oozie job is just to execute below SQLs to create a hive table and load some data into it.
create table oozie_passwords (col0 string, col1 string, col2 string, col3 string, col4 string,
col5 string, col6 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ":";
load data local inpath "${INPUT}" into table oozie_passwords;

1. Oozie job succeeded from "oozie job -info <jobID>"

[root@admin ~]# oozie job -info 0000016-140711123346649-oozie-oozi-W
Job ID : 0000016-140711123346649-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive-wf
App Path      : hdfs://nameservice1/user/root/examples/apps/hive_new
Status        : SUCCEEDED
Run           : 0
User          : root
Group         : -
Created       : 2014-07-15 20:35 GMT
Started       : 2014-07-15 20:35 GMT
Last Modified : 2014-07-15 20:36 GMT
Ended         : 2014-07-15 20:36 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000016-140711123346649-oozie-oozi-W@:start:                                  OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------
0000016-140711123346649-oozie-oozi-W@hive-node                                OK        job_1404818506021_0036 SUCCEEDED  -
------------------------------------------------------------------------------------------------------------------------------------
0000016-140711123346649-oozie-oozi-W@end                                      OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------

2. Table data is put in the right place on HDFS by Oozie job.

[root@admin ~]# hdfs dfs -ls /user/hive/warehouse/oozie_passwords
Found 6 items
-rwxrwxrwt   3 root supergroup       2800 2014-07-15 11:01 /user/hive/warehouse/oozie_passwords/passwd
-rwxrwxrwt   3 root supergroup       2800 2014-07-15 12:51 /user/hive/warehouse/oozie_passwords/passwd_copy_1
-rwxrwxrwt   3 root supergroup       2830 2014-07-15 13:09 /user/hive/warehouse/oozie_passwords/passwd_copy_2
-rwxrwxrwt   3 root supergroup       2800 2014-07-15 13:20 /user/hive/warehouse/oozie_passwords/passwd_copy_3
-rwxrwxrwt   3 root supergroup       2800 2014-07-15 13:36 /user/hive/warehouse/oozie_passwords/passwd_copy_4
-rwxrwxrwt   3 root supergroup       2800 2014-07-15 14:20 /user/hive/warehouse/oozie_passwords/passwd_copy_5

3. However the table is not visible in Hive shell.

hive> show tables;
OK

Env:

CDH 5.0 managed by CM5.
Oozie 4.0.0

Troubleshoot:

1. Identify the mapreduce job id of the Oozie Job using "oozie job -info <jobID>".

job_1404818506021_0036

2. Identify the attempt ID using "mapred job -list-attempt-ids".

In this case, this job only has 1 map attempt.
[root@admin hive_new]# mapred job -list-attempt-ids job_1404818506021_0036 MAP completed
attempt_1404818506021_0036_m_000000_0
[root@admin hive_new]# mapred job -list-attempt-ids job_1404818506021_0036 REDUCE completed

3. Check the logs of each attempt using "mapred job -logs".

If we grep -i "metastore", we can see below messages:
# mapred job -logs job_1404818506021_0036 attempt_1404818506021_0036_m_000000_0 |grep -i metastore|more
14/07/15 13:36:23 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/07/15 13:36:23 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
Seems the map job did not use correct Hive metastore which is PostgreSQL in this case.
After fixing the issue, compared to below logs from a good job:
# mapred job -logs job_1404818506021_0041 attempt_1404818506021_0041_m_000000_0|grep -i metastore|more
14/07/15 16:31:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://admin.xxx.com:9083
14/07/15 16:31:27 INFO hive.metastore: Connected to metastore.

Root Cause: 

"<job-xml>hive-site.xml</job-xml>" entry is missing in workflow.xml.
And also forgot to copy hive-site.xml in the same directory with workflow.xml on HDFS.

Solution:

1. Copy hive-site.xml to the same directory as workflow.xml on HDFS.
2. Add "<job-xml>hive-site.xml</job-xml>" into workflow.xml.
Sample workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>script.q</script>
            <param>INPUT=/etc/passwd</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

No comments:

Post a Comment

Popular Posts