Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2019/01/24 00:53:24 UTC

[kudu-CR] [java] deflake tests that use KuduTestHarness.findLeaderMasterServer

Hello Alexey Serbin, Grant Henke,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12263

to review the following change.


Change subject: [java] deflake tests that use KuduTestHarness.findLeaderMasterServer
......................................................................

[java] deflake tests that use KuduTestHarness.findLeaderMasterServer

From time to time I'd see test failures like these:

  10:10:16.018 [INFO - Test worker] (KuduTestHarness.java:147) Creating a new Kudu client...
  ...
  10:10:16.036 [WARN - New I/O worker #158] (ConnectToCluster.java:278) None of the provided masters 127.6.239.254:42291,127.6.239.252:41769,127.6.239.253:41053 is a leader; will retry
  ...
  10:10:16.060 [ERROR - Test worker] (RetryRule.java:80) testExportAuthenticationCredentialsDuringLeaderElection(org.apache.kudu.client.TestKuduClient): failed attempt 1
  org.apache.kudu.client.NoLeaderFoundException: Master config (127.6.239.254:42291,127.6.239.252:41769,127.6.239.253:41053) has no leader.
    at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:279)
    at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:47)
    at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:323)
    at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:312)
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
    at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
    at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
    at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:247)
    at org.apache.kudu.client.KuduRpc.callback(KuduRpc.java:294)
    at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:269)
    at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59)
    at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:131)
    at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:127)
    at org.apache.kudu.client.Connection.messageReceived(Connection.java:391)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.apache.kudu.client.Connection.handleUpstream(Connection.java:243)
    <more netty stack frames>

I'd look through the code and wonder how this could happen if KUDU-2387 was
indeed fixed. Today I finally noticed that
KuduTestHarness.findLeaderMasterServer calls getMasterTableLocationsPB
directly, and I remembered that without applying the same logic as in
KUDU-2387, such calls will not retry. After adding a catch block to
findLeaderMasterServer and transforming the thrown exception, I got a useful
stack trace confirming the problem:

  10:51:53.627 [ERROR - Test worker] (RetryRule.java:80) testExportAuthenticationCredentialsDuringLeaderElection(org.apache.kudu.client.TestKuduClient): failed attempt 1
  org.apache.kudu.client.NoLeaderFoundException: Master config (127.11.27.62:40985,127.11.27.60:37593,127.11.27.61:37931) has no leader.
    at org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
    at org.apache.kudu.test.KuduTestHarness.findLeaderMasterServer(KuduTestHarness.java:281)
    at org.apache.kudu.test.KuduTestHarness.restartLeaderMaster(KuduTestHarness.java:329)
    at org.apache.kudu.client.TestKuduClient.runTestCallDuringLeaderElection(TestKuduClient.java:1124)
    at org.apache.kudu.client.TestKuduClient.testExportAuthenticationCredentialsDuringLeaderElection(TestKuduClient.java:1150)
    ...
    Suppressed: org.apache.kudu.client.KuduException$OriginalException: Original asynchronous stack trace
        at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:279)
        at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:47)
        at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:323)
        at org.apache.kudu.client.ConnectToCluster$ConnectToMasterCB.call(ConnectToCluster.java:312)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
        at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:247)
        <...netty>
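
The diagnostic step described above (catching the asynchronous failure and
transforming it so that the calling thread's frames show up) can be sketched
roughly as follows. This is only an illustration of the general technique, not
the actual KuduException.transformException code: the class, the method names
(transform, lookUpLeader, asyncCallback) and the OriginalTrace type are all
hypothetical.

```java
final class TransformExceptionSketch {
  /** Hypothetical carrier for the frames recorded on the asynchronous thread. */
  static final class OriginalTrace extends Exception {
    OriginalTrace(StackTraceElement[] frames) {
      super("Original asynchronous stack trace");
      setStackTrace(frames);
    }
  }

  /** Builds a new exception on the calling thread and attaches the original frames. */
  static IllegalStateException transform(Exception asyncFailure) {
    IllegalStateException onCaller = new IllegalStateException(asyncFailure.getMessage());
    onCaller.addSuppressed(new OriginalTrace(asyncFailure.getStackTrace()));
    return onCaller;
  }

  /** Stands in for a callback that fails on an I/O thread. */
  static Exception asyncCallback() {
    return new Exception("Master config has no leader.");
  }

  /** Runs the callback on another thread and hands its failure back. */
  static Exception failOnWorkerThread() {
    final Exception[] holder = new Exception[1];
    Thread worker = new Thread(() -> holder[0] = asyncCallback(), "io-worker");
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return holder[0];
  }

  /** Hypothetical caller: the catch-and-transform step puts this frame in the trace. */
  static void lookUpLeader() {
    try {
      throw failOnWorkerThread();
    } catch (Exception e) {
      throw transform(e);
    }
  }

  public static void main(String[] args) {
    try {
      lookUpLeader();
    } catch (IllegalStateException e) {
      e.printStackTrace();
    }
  }
}
```

The new exception is constructed on the calling thread, so its trace names the
caller, while the frames from the worker thread survive as a suppressed
exception, which mirrors the shape of the trace shown above.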

This patch fixes the flakiness by providing an alternate way to find the
leader master: if the leader is not yet known, make some call that will only
succeed once the leader master is known, then look it up again.
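
A minimal sketch of that lookup pattern is below. It does not reproduce the
patch: the Client interface, cachedLeader, callRequiringLeader, and the
FakeClient with its made-up address are all hypothetical stand-ins, used only
to show the "check, force a leader-dependent call, check again" flow.

```java
import java.util.Optional;

final class LeaderLookupSketch {
  /** Hypothetical client view; not the real Kudu client API. */
  interface Client {
    /** Returns the leader if it is already known, without any retrying. */
    Optional<String> cachedLeader();

    /** Assumed to retry internally and return only once a leader is known. */
    void callRequiringLeader();
  }

  /** If the leader is unknown, force a call that needs it, then look again. */
  static String findLeader(Client client) {
    Optional<String> leader = client.cachedLeader();
    if (!leader.isPresent()) {
      client.callRequiringLeader();
      leader = client.cachedLeader();
    }
    return leader.orElseThrow(
        () -> new IllegalStateException("leader still unknown after the call"));
  }

  /** Fake client whose leader becomes known after a fixed number of attempts. */
  static final class FakeClient implements Client {
    private final int attemptsUntilElected;
    private int attempts;
    private String leader;

    FakeClient(int attemptsUntilElected) {
      this.attemptsUntilElected = attemptsUntilElected;
    }

    @Override
    public Optional<String> cachedLeader() {
      return Optional.ofNullable(leader);
    }

    @Override
    public void callRequiringLeader() {
      // Simulates the retry loop that a direct lookup would skip.
      while (leader == null) {
        attempts++;
        if (attempts >= attemptsUntilElected) {
          leader = "127.0.0.1:7051"; // made-up address for the sketch
        }
      }
    }

    int attempts() {
      return attempts;
    }
  }

  public static void main(String[] args) {
    FakeClient client = new FakeClient(3);
    System.out.println(findLeader(client) + " after " + client.attempts() + " attempts");
  }
}
```

The point of the design is that the retry logic lives in the forced call
rather than in the harness, so the second lookup only runs once a leader has
been found.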

Without the fix, 29/1000 runs of TestKuduClient failed with this error,
either in testExportAuthenticationCredentialsDuringLeaderElection or in
testGetHiveMetastoreConfigDuringLeaderElection.

With the fix, 0/1000 runs of TestKuduClient failed.

Change-Id: I5612619d1b9e30df7d627f2370d60ce2aa812713
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
4 files changed, 38 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/12263/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5612619d1b9e30df7d627f2370d60ce2aa812713
Gerrit-Change-Number: 12263
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>

[kudu-CR] [java] deflake tests that use KuduTestHarness.findLeaderMasterServer

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12263 )

Change subject: [java] deflake tests that use KuduTestHarness.findLeaderMasterServer
......................................................................


Patch Set 1: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5612619d1b9e30df7d627f2370d60ce2aa812713
Gerrit-Change-Number: 12263
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 24 Jan 2019 01:27:41 +0000
Gerrit-HasComments: No

[kudu-CR] [java] deflake tests that use KuduTestHarness.findLeaderMasterServer

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12263 )

Change subject: [java] deflake tests that use KuduTestHarness.findLeaderMasterServer
......................................................................


Patch Set 1: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5612619d1b9e30df7d627f2370d60ce2aa812713
Gerrit-Change-Number: 12263
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 25 Jan 2019 00:55:26 +0000
Gerrit-HasComments: No

[kudu-CR] [java] deflake tests that use KuduTestHarness.findLeaderMasterServer

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12263 )

Change subject: [java] deflake tests that use KuduTestHarness.findLeaderMasterServer
......................................................................

[java] deflake tests that use KuduTestHarness.findLeaderMasterServer

Change-Id: I5612619d1b9e30df7d627f2370d60ce2aa812713
Reviewed-on: http://gerrit.cloudera.org:8080/12263
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <gr...@apache.org>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
4 files changed, 38 insertions(+), 21 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Grant Henke: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5612619d1b9e30df7d627f2370d60ce2aa812713
Gerrit-Change-Number: 12263
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)