Tuesday, December 30, 2014

Oozie job log errors "Call From xx.xx.xx.xx to 0.0.0.0:10020 failed on connection exception"

Env:

MapR 4.0.1 + Oozie 4.0.1

Symptom:

1. The Oozie MapReduce job hangs with status "Running" in the Oozie console.
You can also check the status of the Oozie job with the command below:
oozie job -info <oozie job id>
2. The Oozie job log, retrieved with "oozie job -log <oozie job id>", shows the error below:
Caused by: java.net.ConnectException: Call From xx.xx.xx.xx to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused;

Root Cause:

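10020 is the default port of JobHistoryServer. The error message shows that Oozie cannot find the correct address of JobHistoryServer, so it falls back to the default value of mapreduce.jobhistory.address, which is "0.0.0.0:10020". Nothing is listening at that address on the calling host, so the connection is refused.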
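
The following minimal Python sketch illustrates why the address ends up as 0.0.0.0:10020 when the property is not set. It is not Oozie's or Hadoop's actual code, only an imitation of the lookup behavior: if mapreduce.jobhistory.address is missing from the configuration, the stock Hadoop default is used. The names get_jobhistory_address and DEFAULT_JOBHISTORY_ADDRESS are made up for this example.

```python
import xml.etree.ElementTree as ET

# Stock Hadoop default for mapreduce.jobhistory.address (from mapred-default.xml).
DEFAULT_JOBHISTORY_ADDRESS = "0.0.0.0:10020"


def get_jobhistory_address(config_xml: str) -> str:
    """Return mapreduce.jobhistory.address from a Hadoop-style XML config.

    Falls back to the default when the property is missing or empty.
    (Hypothetical helper for illustration only.)
    """
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "mapreduce.jobhistory.address":
            value = prop.findtext("value", default="").strip()
            if value:
                return value
    return DEFAULT_JOBHISTORY_ADDRESS


if __name__ == "__main__":
    # A config without the property, like the Oozie hadoop-conf in this case.
    print(get_jobhistory_address("<configuration></configuration>"))
```

With an empty configuration the sketch prints 0.0.0.0:10020, which matches the address seen in the error message.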

Solution:

1. Find out which server is running the JobHistoryServer service.
For example:
[root@mapr4-1 ~]# clush -a jps -m|grep -i JobHistoryServer
mapr4-3: 32295 JobHistoryServer
The result above shows that server "mapr4-3" is running JobHistoryServer.
2. Confirm the port used by JobHistoryServer.
On that host, check mapred-site.xml to get the port used by JobHistoryServer.
For MR1, mapred-site.xml is located at /opt/mapr/hadoop/hadoop-<version>/conf;
For MR2/YARN, mapred-site.xml is located at /opt/mapr/hadoop/hadoop-<version>/etc/hadoop.
Find the parameter below:
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>xx.xx.xx.xx:10020</value>
  </property>
After getting the port, you can confirm that JobHistoryServer is listening on it with the command below:
# lsof -i:10020
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
java    32295 mapr  212u  IPv4 34350382      0t0  TCP mapr4-3:10020 (LISTEN)
3. On the Oozie host, add the parameter below to /opt/mapr/oozie/oozie-<version>/conf/hadoop-conf/core-site.xml.
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hostname:Port</value>
  </property>
"hostname" and "Port" are those of the JobHistoryServer service found in steps 1 and 2.
4. Re-run the job and monitor the Oozie job to make sure the error no longer appears.
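
Before re-running the job, it can help to check from the Oozie host that the JobHistoryServer port is actually reachable. Below is a small Python sketch that uses only the standard library. The function name can_connect is hypothetical, and "mapr4-3" and 10020 are just the example values from the steps above, so replace them with your own hostname and port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A refused connection, timeout, or name resolution failure returns False.
    (Hypothetical helper for illustration only.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example values from this post; adjust to your cluster.
    host, port = "mapr4-3", 10020
    if can_connect(host, port):
        print(f"{host}:{port} is reachable")
    else:
        print(f"cannot connect to {host}:{port}")
```

This only confirms that something accepts TCP connections on that port. It does not prove the listener is JobHistoryServer, which is what the lsof check in step 2 is for.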
