Posted to issues@ambari.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2017/10/13 15:13:00 UTC

[jira] [Commented] (AMBARI-22235) Druid service check failed during EU

    [ https://issues.apache.org/jira/browse/AMBARI-22235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203686#comment-16203686 ] 

slim bouguerra commented on AMBARI-22235:
-----------------------------------------

As per [~rlevas]'s explanation:
The issue is related to the default value of the Druid principal...
{code}
"value": "${druid-env/druid_user}@${realm}",
{code}
Though technically this is correct, by not adding some potentially unique value to the principal name we run the risk of invalidating the Druid keytab file if multiple clusters are configured to use the same KDC. This is because Ambari will change the password for a principal when it goes to create the relevant keytab file.

All headless (or user) Kerberos identities should include something like the cluster name in their principal names. Ambari supports this through the ${principal_suffix} variable. For example:
{code}
"value": "${druid-env/druid_user}${principal_suffix}@${realm}",
{code}
By default, ${principal_suffix} is set to a dash followed by the cluster's name. This may result in the following principal name:
{code}
druid-cl1@HWQE.HORTONWORKS.COM
{code}
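To make the collision concrete, here is a minimal sketch of how the two templates would resolve for two clusters sharing one KDC. It is not Ambari's actual variable-replacement code: the resolve() helper, the second cluster name "cl2", and the druid_user value "druid" are assumptions made for illustration only.

```python
import re

OLD_TEMPLATE = "${druid-env/druid_user}@${realm}"
NEW_TEMPLATE = "${druid-env/druid_user}${principal_suffix}@${realm}"


def resolve(template, variables):
    """Hypothetical helper (not Ambari's implementation): expand ${name}
    placeholders from a flat dict, leaving unknown placeholders untouched."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )


def variables_for(cluster_name):
    """Assumed values: druid_user is 'druid' and the suffix is a dash
    followed by the cluster's name, as described above."""
    return {
        "druid-env/druid_user": "druid",
        "realm": "HWQE.HORTONWORKS.COM",
        "principal_suffix": "-" + cluster_name,
    }


cl1 = variables_for("cl1")
cl2 = variables_for("cl2")  # hypothetical second cluster on the same KDC

# Old default: both clusters resolve to the same principal, so whichever
# cluster creates its keytab last invalidates the other cluster's keytab.
print(resolve(OLD_TEMPLATE, cl1))  # druid@HWQE.HORTONWORKS.COM
print(resolve(OLD_TEMPLATE, cl2))  # druid@HWQE.HORTONWORKS.COM

# With the suffix: each cluster gets its own principal.
print(resolve(NEW_TEMPLATE, cl1))  # druid-cl1@HWQE.HORTONWORKS.COM
print(resolve(NEW_TEMPLATE, cl2))  # druid-cl2@HWQE.HORTONWORKS.COM
```

With the suffix in place, a password change made while creating one cluster's keytab only affects that cluster's principal.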


> Druid service check failed during EU
> ------------------------------------
>
>                 Key: AMBARI-22235
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22235
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: slim bouguerra
>
> Observed this issue on two clusters.
> The Druid service check failed during EU:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/DRUID/0.10.1/package/scripts/service_check.py", line 44, in <module>
>     ServiceCheck().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 367, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/DRUID/0.10.1/package/scripts/service_check.py", line 30, in service_check
>     self.checkComponent(params, "druid_coordinator", "druid-coordinator")
>   File "/var/lib/ambari-agent/cache/common-services/DRUID/0.10.1/package/scripts/service_check.py", line 40, in checkComponent
>     logoutput=True)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k ctr-e134-1499953498516-217002-01-000010.hwx.site:8081/status | grep 200' returned 1.
> {code}
> Here is the failure stack trace from the Druid logs:
> {code} 
>  
> Caused by: sun.security.krb5.Asn1Exception: Identifier doesn't match expected value (906)
>         at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140) ~[?:1.7.0_95]
>         at sun.security.krb5.internal.ASRep.init(ASRep.java:64) ~[?:1.7.0_95]
>         at sun.security.krb5.internal.ASRep.<init>(ASRep.java:59) ~[?:1.7.0_95]
>         at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:60) ~[?:1.7.0_95]
>         at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316) ~[?:1.7.0_95]
>         at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361) ~[?:1.7.0_95]
>         at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:735) ~[?:1.7.0_95]
>         at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584) ~[?:1.7.0_95]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_95]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_95]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_95]
>         at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_95]
>         at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762) ~[?:1.7.0_95]
>         at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) ~[?:1.7.0_95]
>         at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690) ~[?:1.7.0_95]
>         at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688) ~[?:1.7.0_95]
>         at java.security.AccessController.doPrivileged(Native Method) ~[?:1.7.0_95]
>         at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687) ~[?:1.7.0_95]
>         at javax.security.auth.login.LoginContext.login(LoginContext.java:595) ~[?:1.7.0_95]
>         at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1089) ~[?:?]
>         at io.druid.storage.hdfs.HdfsStorageAuthentication.authenticate(HdfsStorageAuthentication.java:65) ~[?:?]
>         ... 10 more
> {code}


