Posted to issues@ambari.apache.org by "Robert Levas (JIRA)" <ji...@apache.org> on 2018/05/16 23:08:00 UTC

[jira] [Comment Edited] (AMBARI-23866) Kerberos Service Check failure due to kinit failure on random node

    [ https://issues.apache.org/jira/browse/AMBARI-23866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478229#comment-16478229 ] 

Robert Levas edited comment on AMBARI-23866 at 5/16/18 11:07 PM:
-----------------------------------------------------------------

[~quirogadf]...  This is a known issue related to replicated KDCs.  I thought that setting the {{master_kdc}} value in the {{krb5.conf}} file via the Ambari property {{kerberos-env/master_kdc}} (found in the UI under Advanced Kerberos-env) would help with this issue.  Have you tried that?

From https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html
{quote}
master_kdc
Identifies the master KDC(s). Currently, this tag is used in only one case: If an attempt to get credentials fails because of an invalid password, the client software will attempt to contact the master KDC, in case the user’s password has just been changed, and the updated database has not been propagated to the slave servers yet.
{quote}

If that does not work, I am not sure what a good solution would be.  Automatically retrying might help, but that depends on the latency of the replication process.
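
To make the retry idea concrete, here is a minimal sketch and not Ambari's actual service check code. The names {{kinit_with_retry}} and {{run_kinit}}, the attempt count, and the delay are all hypothetical, and it assumes the test principal is authenticated with a keytab via {{kinit -kt}}. It retries only when the output contains the "not found in Kerberos database" text, so other failures still surface immediately.

```python
import subprocess
import time

# Text kinit prints when the principal is unknown to the KDC it contacted.
NOT_FOUND_MARKER = "not found in Kerberos database"


def run_kinit(principal, keytab):
    """Run kinit with a keytab; return (exit_code, combined_output)."""
    proc = subprocess.run(
        ["kinit", "-kt", keytab, principal],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def kinit_with_retry(principal, keytab, attempts=5, delay=2.0,
                     runner=run_kinit, sleep=time.sleep):
    """Retry kinit only while the failure looks like a propagation delay.

    attempts and delay are illustrative defaults (assumptions), not tuned values.
    Returns True on success, False if the principal never became visible.
    """
    for attempt in range(1, attempts + 1):
        code, output = runner(principal, keytab)
        if code == 0:
            return True
        if NOT_FOUND_MARKER not in output:
            # Some other failure (bad keytab, unreachable KDC, ...): do not mask it.
            raise RuntimeError(output.strip())
        if attempt < attempts:
            sleep(delay * attempt)  # linear backoff between attempts
    return False
```

Whether this helps still depends on replication latency: if propagation takes longer than the total wait across all attempts, the check fails anyway.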


> Kerberos Service Check failure due to kinit failure on random node
> ------------------------------------------------------------------
>
>                 Key: AMBARI-23866
>                 URL: https://issues.apache.org/jira/browse/AMBARI-23866
>             Project: Ambari
>          Issue Type: Improvement
>    Affects Versions: 2.5.2
>         Environment: Multiple Kerberos Domain Controllers across multiple data centers for single realm.
>            Reporter: David F. Quiroga
>            Assignee: David F. Quiroga
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We were seeing Kerberos Service Check failures in Ambari. Specifically, the check would fail on the first run of the day, succeed on the second, then fail on the next but succeed if run again, and so forth.
> Reviewing the operation log showed kinit failures from random node(s):
>  {{kinit: Client XXXX not found in Kerberos database while getting initial credentials}}
> Since AMBARI-9852:
> {quote}The service check must perform the following steps:
>    1. Create a unique principal in the relevant KDC (server)
>    2. Test that the principal can be used to authenticate via kinit (agent)
>    3. Destroy the principal (server)
> {quote}
> This is a very good check of services.
> So what is happening...
> In our environment we have multiple Kerberos Domain Controllers across multiple data centers all providing the same realm.
> The creation of a unique principal occurs at a single KDC and is propagated to the others.
> The agents were testing the principal at a different KDC, i.e. before it had a chance to propagate. This is why the second service check would succeed.
>  


