Tuesday, June 10, 2014

Common Issues when configuring Kerberos on PivotalHD

This article lists the issues I ran into when configuring Kerberos for HDFS and YARN.

1. Namenode fails with error "Login failure for hdfs/hdm.xxx.com@OPENKBINFO.COM from keytab /etc/security/phd/keytab/hdfs.service.keytab".

Error message in namenode log:

************************************************************/
2014-06-07 16:49:35,421 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-06-07 16:49:35,460 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-06-07 16:49:35,460 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2014-06-07 16:49:36,024 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Login failure for hdfs/hdm.xxx.com@OPENKBINFO.COM from keytab /etc/security/phd/keytab/hdfs.service.keytab
 at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:835)
 at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:283)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:423)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:434)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:594)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)
Caused by: javax.security.auth.login.LoginException: java.lang.ExceptionInInitializerError
 at javax.crypto.SunJCE_h.<clinit>(DashoA13*..)
 at javax.crypto.Cipher.c(DashoA13*..)
 at javax.crypto.Cipher.getMaxAllowedKeyLength(DashoA13*..)
 at sun.security.krb5.internal.crypto.EType.getBuiltInDefaults(EType.java:179)
 at sun.security.krb5.internal.crypto.EType.isSupported(EType.java:261)
 at sun.security.krb5.internal.ktab.KeyTab.readServiceKeys(KeyTab.java:263)
 at sun.security.krb5.EncryptionKey.acquireSecretKeys(EncryptionKey.java:140)
 at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:635)
 at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:826)
 at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:283)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:423)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:434)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:594)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)
Caused by: java.lang.SecurityException: Cannot set up certs for trusted CAs
 at javax.crypto.SunJCE_b.<clinit>(DashoA13*..)
 ... 27 more
Caused by: java.lang.SecurityException: Jurisdiction policy files are not signed by trusted signers!
 at javax.crypto.SunJCE_b.a(DashoA13*..)
 at javax.crypto.SunJCE_b.i(DashoA13*..)
 at javax.crypto.SunJCE_b.g(DashoA13*..)
 at javax.crypto.SunJCE_b$1.run(DashoA13*..)
 at java.security.AccessController.doPrivileged(Native Method)
 ... 28 more

 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:872)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:826)
 ... 7 more
2014-06-07 16:49:36,029 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-06-07 16:49:36,031 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hdm.xxx.com/192.168.192.101
************************************************************/

Cause:

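All hosts were running JDK 1.6. As the root exception in the trace shows, the JRE rejects its JCE jurisdiction policy files ("not signed by trusted signers"), so the Kerberos login cannot read the keys from the keytab.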

Fix:

Shut down the cluster, then remove the old JDK 1.6 and install JDK 1.7 on all hosts:
rpm -e jdk-1.6.0
yum install java-1.7.0-openjdk
alternatives --config java
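
To confirm that every host really picked up the new JDK, one option is to parse the first line of `java -version`. The helper below is only a sketch: the function name `java_major` is my own, and it assumes the usual `version "x.y.z"` banner format.

```shell
# Print the major Java version (6, 7, 8, 11, ...) from a version banner.
# Input: the first line printed by `java -version`,
#        e.g.  java version "1.7.0_55"
java_major() {
    # Pull out the quoted version string.
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) echo "$ver" | cut -d. -f2 ;;   # old scheme: 1.6.x, 1.7.x
        *)   echo "$ver" | cut -d. -f1 ;;   # new scheme: 9, 11.0.2, ...
    esac
}
```

Since `java -version` writes to stderr, a call on a host would look like `java_major "$(java -version 2>&1 | head -n 1)"`.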

2. Namenode fails with error "Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled".

Error message in namenode log:

2014-06-07 20:19:16,982 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: readAndProcess threw exception javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)] from client 192.168.192.101. Count of bytes read: 0
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]
 at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:177)
 at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1173)
 at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1350)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:726)
 at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:525)
 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:500)
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)
 at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788)
 at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
 at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
 at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:155)
 ... 5 more
Caused by: KrbException: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled
 at sun.security.krb5.EncryptionKey.findKey(EncryptionKey.java:552)
 at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:270)
 at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:144)
 at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
 at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771)
 ... 8 more 

Cause:

The JCE policy jar files, which AES-256 requires, are missing from the new JDK 1.7 that was installed to fix issue 1 above.

Fix:

Follow step "2. Install JCE on all Cluster Hosts" in "Installing the MIT Kerberos 5 KDC".

3. Namenode fails with error "javax.security.auth.login.LoginException: No key to store".

Error message in namenode log:

Caused by: javax.servlet.ServletException: javax.security.auth.login.LoginException: No key to store
 at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:185)
 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:146)
 at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:107)
 at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
 at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:707)
 at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:254)
 at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1240)
 at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:689)
 at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:482)
 at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
 at org.eclipse.jetty.server.handler.HandlerCollection.doStart(HandlerCollection.java:229)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:172)
 at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
 at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:95)
 at org.eclipse.jetty.server.Server.doStart(Server.java:279)
 at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
 at org.apache.hadoop.http.HttpServer.start(HttpServer.java:682)
 ... 8 more
Caused by: javax.security.auth.login.LoginException: No key to store
 at com.sun.security.auth.module.Krb5LoginModule.commit(Krb5LoginModule.java:1072)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:596)
 at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:169)
 ... 24 more
2014-06-07 21:11:33,511 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-06-07 21:11:33,512 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

Cause:

A possible cause is that the /tmp/krb* cache files were corrupted, or became incompatible, after I fixed issues 1 and 2 above.

Fix:

Remove the /tmp/krb* cache files on all hosts, then restart the namenode. After that it started successfully.
rm -f /tmp/krb*
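
Because the glob removes everything in /tmp whose name starts with "krb", it can be worth listing the matches before deleting them. A minimal sketch, assuming a `find` that supports `-maxdepth` (GNU and BSD both do); the function name `list_krb_caches` is hypothetical.

```shell
# List regular files named krb* directly inside a directory (default /tmp),
# so they can be reviewed before running rm.
list_krb_caches() {
    dir="${1:-/tmp}"
    # -maxdepth 1 keeps the search to the directory itself, no subdirectories.
    find "$dir" -maxdepth 1 -type f -name 'krb*'
}
```

Running `list_krb_caches` with no argument inspects /tmp; pass another directory to check a different location.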

4. Node manager fails with error "Couldn't setup connection for yarn/hdw1.xxx.com@OPENKBINFO.COM to null"

Error message in node manager log:

2014-06-08 12:41:02,890 INFO org.apache.hadoop.yarn.service.AbstractService: Service:httpshuffle is started.
2014-06-08 12:41:02,891 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices is started.
2014-06-08 12:41:02,891 INFO org.apache.hadoop.yarn.service.AbstractService: Service:containers-monitor is started.
2014-06-08 12:41:02,891 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is started.
2014-06-08 12:41:03,240 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) cause:java.io.IOException: Failed to specify server's Kerberos principal name
2014-06-08 12:41:03,326 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) cause:java.io.IOException: Failed to specify server's Kerberos principal name
2014-06-08 12:41:06,708 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) cause:java.io.IOException: Failed to specify server's Kerberos principal name
2014-06-08 12:41:07,635 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) cause:java.io.IOException: Failed to specify server's Kerberos principal name
2014-06-08 12:41:08,376 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) cause:java.io.IOException: Failed to specify server's Kerberos principal name
2014-06-08 12:41:08,697 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) cause:java.io.IOException: Failed to specify server's Kerberos principal name
2014-06-08 12:41:08,697 WARN org.apache.hadoop.ipc.Client: Couldn't setup connection for yarn/hdw1.xxx.com@OPENKBINFO.COM to null

Cause:

On the node managers, "yarn.resourcemanager.principal" and "yarn.resourcemanager.keytab" are missing from yarn-site.xml.

Fix:

Add the entries below to yarn-site.xml on all node managers.
<!-- resource manager secure configuration info -->
<property>
 <name>yarn.resourcemanager.principal</name>
 <value>yarn/_HOST@OPENKBINFO.COM</value>
</property>

<property>
 <name>yarn.resourcemanager.keytab</name>
 <value>/etc/security/phd/keytab/yarn.service.keytab</value>
</property>
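
To check quickly whether a given yarn-site.xml already carries both properties, something like the following could be run on each node manager. It is a rough sketch: `missing_rm_props` is a name I made up, and the check is a plain text match on the <name> element, so it does not notice a property that is commented out or has an empty value.

```shell
# Print each required ResourceManager security property that is absent
# from the given yarn-site.xml. Prints nothing when both are present.
missing_rm_props() {
    conf="$1"
    for prop in yarn.resourcemanager.principal yarn.resourcemanager.keytab; do
        # -F: match a fixed string, -q: no output, only the exit status.
        if ! grep -qF "<name>${prop}</name>" "$conf"; then
            echo "$prop"
        fi
    done
}
```

For example, `missing_rm_props /path/to/yarn-site.xml` prints nothing on a host that is configured correctly.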

5. YARN resource/node manager process is up but its log shows error "User yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is null"

Error message in resource/node manager log:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User yarn/hdw1.xxx.com@OPENKBINFO.COM (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is null
 at org.apache.hadoop.ipc.Client.call(Client.java:1235)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy26.registerNodeManager(Unknown Source)
 at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
 ... 6 more

Cause:

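This is a known issue, HADOOP-9444: the ${HADOOP_HDFS_USER} and ${HADOOP_YARN_USER} placeholders in hadoop-policy.xml are not expanded.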

Fix:

In hadoop-policy.xml, replace all occurrences of ${HADOOP_HDFS_USER} and ${HADOOP_YARN_USER} with *.
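
The replacement can be scripted with sed rather than done by hand. This is a sketch under assumptions: `open_policy_acls` is a hypothetical helper name, and `-i.bak` relies on the in-place option that GNU and BSD sed both provide.

```shell
# Replace the unexpanded placeholders in a hadoop-policy.xml with "*".
# The original file is kept next to it with a .bak suffix.
open_policy_acls() {
    policy="$1"
    # \$ matches a literal dollar sign; { and } are literal in basic regex.
    sed -i.bak \
        -e 's/\${HADOOP_HDFS_USER}/*/g' \
        -e 's/\${HADOOP_YARN_USER}/*/g' \
        "$policy"
}
```

Call it with the path to the file, for example `open_policy_acls /path/to/hadoop-policy.xml`. Keep in mind that an ACL of * allows any authenticated user to use those protocols.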
