Posted to yarn-dev@hadoop.apache.org by "Eric Yang (Jira)" <ji...@apache.org> on 2019/11/06 19:12:00 UTC

[jira] [Created] (YARN-9956) Improve connection error message for YARN ApiServerClient

Eric Yang created YARN-9956:
-------------------------------

             Summary: Improve connection error message for YARN ApiServerClient
                 Key: YARN-9956
                 URL: https://issues.apache.org/jira/browse/YARN-9956
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Eric Yang


In an HA environment, the yarn.resourcemanager.webapp.address configuration is optional, and ApiServiceClient may produce a confusing error message like this:

{code}
19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host1.example.com:8090
19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host2.example.com:8090
19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
	at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
	at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
	at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
	... 15 more
Caused by: KrbException: Identifier doesn't match expected value (906)
	at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
	at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
	at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
	... 21 more
19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: 
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
Caused by: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
	... 6 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:135)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	... 8 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
	... 12 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
	at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
	at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
	at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
	... 15 more
Caused by: KrbException: Identifier doesn't match expected value (906)
	at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
	at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
	at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
	... 21 more
{code}

When getRMWebAddress fails to connect to either ResourceManager host, it falls back to the yarn-default.xml value 0.0.0.0 and requests a Kerberos service ticket (TGS) for HTTP/0.0.0.0. No such principal exists, so the KDC answers "Server not found in Kerberos database", which is the error shown above. It would be better not to use yarn.resourcemanager.webapp.address as a fallback for RM host lookup in an HA-enabled cluster.
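A minimal sketch of the suggested lookup, under stated assumptions: a plain Map stands in for Hadoop's Configuration, and a hypothetical reachability predicate stands in for the real connection attempt (neither is the ApiServiceClient API). When yarn.resourcemanager.ha.enabled is true, only the per-RM keys yarn.resourcemanager.webapp.address.<rm-id> listed in yarn.resourcemanager.ha.rm-ids are tried, and if none responds the method fails with the addresses it tried instead of falling back to 0.0.0.0. The class name RmWebAddressResolver and the literal default 0.0.0.0:8088 (a simplified stand-in for the yarn-default.xml value) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RmWebAddressResolver {

    // Simplified stand-in for the yarn-default.xml default of
    // yarn.resourcemanager.webapp.address (assumption for illustration).
    static final String NON_HA_DEFAULT = "0.0.0.0:8088";

    /**
     * Resolves the RM web address. conf is a plain Map standing in for
     * Hadoop's Configuration; isReachable is a hypothetical probe.
     */
    public static String resolve(Map<String, String> conf, Predicate<String> isReachable) {
        boolean haEnabled = Boolean.parseBoolean(
            conf.getOrDefault("yarn.resourcemanager.ha.enabled", "false"));
        if (!haEnabled) {
            // Non-HA: the single configured address (or its default) is the only candidate.
            return conf.getOrDefault("yarn.resourcemanager.webapp.address", NON_HA_DEFAULT);
        }
        List<String> tried = new ArrayList<>();
        for (String rawId : conf.getOrDefault("yarn.resourcemanager.ha.rm-ids", "").split(",")) {
            String rmId = rawId.trim();
            String addr = conf.get("yarn.resourcemanager.webapp.address." + rmId);
            if (rmId.isEmpty() || addr == null) {
                continue; // skip blank ids and RMs without a web address configured
            }
            tried.add(addr);
            if (isReachable.test(addr)) {
                return addr;
            }
        }
        // HA: never fall back to the non-HA key (0.0.0.0 by default); report what was tried.
        throw new IllegalStateException(
            "Unable to reach any ResourceManager web address, tried: " + tried);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
            "yarn.resourcemanager.ha.enabled", "true",
            "yarn.resourcemanager.ha.rm-ids", "rm1,rm2",
            "yarn.resourcemanager.webapp.address.rm1", "host1.example.com:8088",
            "yarn.resourcemanager.webapp.address.rm2", "host2.example.com:8088");
        // Only host2 answers, so it is selected.
        System.out.println(resolve(conf, addr -> addr.startsWith("host2")));
        try {
            resolve(conf, addr -> false); // neither RM answers
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints host2.example.com:8088, then a failure message naming both host1.example.com:8088 and host2.example.com:8088. Failing fast in the HA branch means the Kerberos negotiation never sees HTTP/0.0.0.0.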

In this particular cluster, the connections to host1.example.com and host2.example.com both failed for the same reason: the self-signed server certificate could not be verified because no valid CA certificate was available to verify it against. That was the root cause of the failure. It would be better if the error message clearly identified this first error, instead of reporting an error from the fallback logic, which makes no sense to the user.
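One way to surface the first error, sketched under assumptions: each RM host is tried through a hypothetical Probe interface, every failure is kept, and if all hosts fail the thrown IOException lists each host with the root cause of its failure (for example a certificate validation error), with the original exceptions attached as suppressed exceptions so their full stack traces remain available. RmConnectDiagnostics, Probe and connectFirst are illustrative names, not part of ApiServiceClient.

```java
import java.io.IOException;
import java.security.cert.CertificateException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RmConnectDiagnostics {

    /** Hypothetical probe for one RM web address; throws if the connection fails. */
    @FunctionalInterface
    public interface Probe {
        void connect(String hostPort) throws Exception;
    }

    /** Follows the cause chain (bounded, in case of cycles) to the innermost exception. */
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; depth < 32 && cur.getCause() != null; depth++) {
            cur = cur.getCause();
        }
        return cur;
    }

    /**
     * Returns the first host that connects. If every host fails, throws an
     * IOException that names each host with the root cause of its failure and
     * carries the original exceptions as suppressed exceptions.
     */
    public static String connectFirst(List<String> hosts, Probe probe) throws IOException {
        Map<String, Exception> failures = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                probe.connect(host);
                return host;
            } catch (Exception e) {
                failures.put(host, e);
            }
        }
        StringBuilder msg = new StringBuilder("Failed to connect to any ResourceManager:");
        for (Map.Entry<String, Exception> f : failures.entrySet()) {
            Throwable root = rootCause(f.getValue());
            msg.append("\n  ").append(f.getKey()).append(": ")
               .append(root.getClass().getSimpleName()).append(": ").append(root.getMessage());
        }
        IOException error = new IOException(msg.toString());
        failures.values().forEach(error::addSuppressed);
        throw error;
    }

    public static void main(String[] args) {
        // Simulates the reported cluster: the client cannot verify either RM's
        // self-signed certificate, so both connection attempts fail.
        Probe untrusted = hostPort -> {
            throw new IOException("HTTPS connection failed",
                new CertificateException("unable to verify self-signed certificate"));
        };
        try {
            connectFirst(List.of("host1.example.com:8090", "host2.example.com:8090"), untrusted);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In main, both hosts fail with a certificate error wrapped in an IOException, so the printed message names CertificateException for each host rather than an unrelated Kerberos error raised by the fallback.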



--
This message was sent by Atlassian Jira
(v8.3.4#803005)