
Posted to issues@kudu.apache.org by "Adar Dembo (JIRA)" <ji...@apache.org> on 2018/12/14 00:26:00 UTC

[jira] [Commented] (KUDU-2387) exportAuthenticationCredentials does not retry connectToCluster

    [ https://issues.apache.org/jira/browse/KUDU-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720726#comment-16720726 ] 

Adar Dembo commented on KUDU-2387:
----------------------------------

This issue causes flakiness in every test that calls exportAuthenticationCredentials() without using the same hack as TestSecurity. At the time of writing, this includes:
{noformat}
TestKuduClient.testGetAuthnToken
TestKuduClient.testCloseShortlyAfterOpen
TestKuduClient.testNoLogSpewOnConnectionRefused
{noformat}

Additionally, there are calls in KuduTableMapReduceUtil (kudu-mapreduce) and KuduContext (kudu-spark) that not only cause all associated tests to be flaky, but are also vulnerabilities in the product itself: if someone calls exportAuthenticationCredentials() on a fresh KuduClient during a master leader election, it's liable to fail and not retry.

Finally, getHiveMetastoreConfig() (from the new HMS integration code) is structured like exportAuthenticationCredentials(), so it (and its dependents) is equally vulnerable.
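
Until the client itself retries, one caller-side stopgap is to wrap the call in a retry loop. The following is only a minimal sketch of that idea, not the fix proposed for this issue and not code from the Kudu client: RecoverableSketchException and RetrySketch are hypothetical names standing in for whatever recoverable error (such as no leader master being found) the caller can detect, and the attempt count and backoff values are arbitrary.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a recoverable client error (e.g. no leader master found). */
class RecoverableSketchException extends Exception {
    RecoverableSketchException(String message) {
        super(message);
    }
}

/** Hypothetical helper; not part of the Kudu client. */
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Calls op until it succeeds or maxAttempts is reached, sleeping with
     * exponential backoff between attempts. Only RecoverableSketchException
     * triggers a retry; any other exception propagates immediately.
     */
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        RecoverableSketchException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (RecoverableSketchException e) {
                last = e;
                // Do not sleep after the final failed attempt.
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;
                }
            }
        }
        // All attempts failed with a recoverable error; surface the last one.
        throw last;
    }
}
```

A caller would pass a lambda that invokes exportAuthenticationCredentials() and translates whichever failure it considers recoverable into the sketch's exception type. How to recognize such a failure from outside the client is an assumption left open here, which is one reason the retry belongs inside the client rather than in every caller.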


> exportAuthenticationCredentials does not retry connectToCluster
> ---------------------------------------------------------------
>
>                 Key: KUDU-2387
>                 URL: https://issues.apache.org/jira/browse/KUDU-2387
>             Project: Kudu
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> TestSecurity has the following TODO:
> {code}
>     // TODO(todd): it seems that exportAuthenticationCredentials() doesn't properly retry
>     // in the case that there is no leader, even though NoLeaderFoundException is a RecoverableException.
>     // So, we have to use a hack of calling listTabletServers, which _does_ properly retry,
>     // in order to wait for the masters to elect a leader.
> {code}
> It seems like this causes occasional failures of tests like KuduRDDTest -- I saw a case where the client failed to connect due to a negotiation timeout, and then didn't retry at all. It's not clear why the 3-second negotiation timeout was insufficient in this test case but likely just machine load or somesuch.


