Saturday, June 14, 2014

Secondary namenode fails to get fsimage from namenode in kerberized environment

Env:

Hadoop 2.0

Symptom:

1. The secondary namenode log shows the error "Only Namenode, Secondary Namenode, and administrators may access this servlet".

ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at http://hdm.xxx.com:50070/getimage?getimage=1&txid=8464&storageInfo=-40:1006297165:0:CID-ff2db014-06e6-4c70-a33f-bfdd69063735 failed with status code 403
Response message:
Only Namenode, Secondary Namenode, and administrators may access this servlet
 at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.doGetUrl(TransferFsImage.java:245)
 at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:222)
 at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.downloadImageToStorage(TransferFsImage.java:86)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:430)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:416)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:415)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:515)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:367)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:333)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:356)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1458)
 at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:454)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:329)
 at java.lang.Thread.run(Thread.java:745)

2. The namenode log shows the error "Received non-NN/SNN/administrator request for image or edits".

INFO org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet rejecting: hdfs/hdw3.xxx.com@OPENKBINFO.COM
WARN org.apache.hadoop.hdfs.server.namenode.GetImageServlet: Received non-NN/SNN/administrator request for image or edits from hdfs/hdw3.xxx.com@OPENKBINFO.COM at 192.168.192.104
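When the namenode log is long, it can help to pull out which principal and address were rejected. The following is a minimal Python sketch, assuming the WARN line has exactly the wording quoted above; the function name rejected_requestors is hypothetical, and other Hadoop versions may phrase the message differently.

```python
import re

# Pattern modeled on the WARN line quoted above (an assumption about its
# exact wording; adjust it if your Hadoop version logs it differently).
REJECT_RE = re.compile(
    r"Received non-NN/SNN/administrator request for image or edits "
    r"from (?P<principal>\S+) at (?P<address>\S+)"
)


def rejected_requestors(log_lines):
    """Return (principal, address) pairs for each rejected image request."""
    found = []
    for line in log_lines:
        match = REJECT_RE.search(line)
        if match:
            found.append((match.group("principal"), match.group("address")))
    return found
```

Passing the lines of the namenode log to rejected_requestors returns one pair per rejected request, which shows which principal the namenode does not recognize.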

Root Cause:

The namenode is not aware of the Kerberos authentication information for the secondary namenode, so it rejects the secondary namenode's principal as an unauthorized requestor.
Checking hdfs-site.xml on the namenode shows that the entries below are missing:
<!-- (optional) secondary name node secure configuration info -->
<property>
 <name>dfs.secondary.namenode.keytab.file</name>
 <value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>

<property>
 <name>dfs.secondary.namenode.kerberos.principal</name>
 <value>hdfs/_HOST@OPENKBINFO.COM</value>
</property>

<property>
 <name>dfs.secondary.namenode.kerberos.http.principal</name>
 <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>

<property>
 <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
 <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>
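To check a given hdfs-site.xml for these entries without reading it by hand, a small script can list the ones that are absent. This is only a sketch in Python: the property names come from the entries above, while the function name missing_properties and the idea of passing the file contents as a string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Property names taken from the entries listed above.
REQUIRED_KEYS = [
    "dfs.secondary.namenode.keytab.file",
    "dfs.secondary.namenode.kerberos.principal",
    "dfs.secondary.namenode.kerberos.http.principal",
    "dfs.secondary.namenode.kerberos.internal.spnego.principal",
]


def missing_properties(xml_text, required=REQUIRED_KEYS):
    """Return the required property names absent from a Hadoop config XML string."""
    root = ET.fromstring(xml_text)
    present = set()
    # Each setting is a <property> element with a <name> child.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            present.add(name.strip())
    return [key for key in required if key not in present]
```

Calling missing_properties with the contents of hdfs-site.xml from the namenode returns an empty list when all four entries are present. It only checks that the names exist; it does not validate the keytab path or the principal values.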

Fix: 

After adding the above entries back to hdfs-site.xml on the namenode, restart the HDFS cluster.
The secondary namenode then logs entries like the following:
INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://hdm.xxx.com:50070/getimage?putimage=1&txid=11104&port=50090&storageInfo=-40:1006297165:0:CID-ff2db014-06e6-4c70-a33f-bfdd69063735
INFO org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet allowing checkpointer: hdfs/hdm.xxx.com@OPENKBINFO.COM
INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 0.42s at 0.00 KB/s
INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11104 to namenode at hdm.xxx.com:50070
