You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2017/05/19 16:46:27 UTC

[kudu-CR] KUDU-2021 retry master RPC if negotiation times out

Alexey Serbin has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6927

Change subject: KUDU-2021 retry master RPC if negotiation times out
......................................................................

KUDU-2021 retry master RPC if negotiation times out

This patch fixes KUDU-2021, i.e. with this patch, in case of a
multi-master Kudu cluster, the Kudu C++ client retries an RPC with other
master if the connection negotiation with leader master timed out.

Added new integration test to cover the updated client's behavior.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/client_failover-itest.cc
2 files changed, 46 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 8:

(1 comment)

I hope we can get rid of the port bind errors by adding a resource lock. Would that be sufficient or do you think something else is going on here i.e. TIME_WAIT?

http://gerrit.cloudera.org:8080/#/c/6927/8/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

PS8, Line 553: FLAGS_rpc_negotiation_timeout_ms
> It's a good point.
Ah, yeah it's there in the constructor. Yeah, I guess only doing it on the client is sufficient.

Ok let's pull these tests into their own resource locked test file.


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6927

to look at the new patch set (#11).

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................

KUDU-2021 test for negotiation timeout on Master RPCs

This patch adds integration tests to verify that the Kudu C++ client
behaves as described in KUDU-2021: in case of a multi-master Kudu
cluster, the client retries an RPC with other master if the connection
negotiation with leader master times out.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/client-negotiation-failover-itest.cc
M src/kudu/integration-tests/client_failover-itest.cc
3 files changed, 276 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has uploaded a new patch set (#9).

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................

KUDU-2021 test for negotiation timeout on Master RPCs

This patch adds integration tests to verify that the Kudu C++ client
behaves as described in KUDU-2021: in case of a multi-master Kudu
cluster, the client retries an RPC with other master if the connection
negotiation with leader master times out.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/client-negotiation-failover-itest.cc
M src/kudu/integration-tests/client_failover-itest.cc
3 files changed, 269 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6927/10/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

Line 257:       SleepFor(MonoDelta::FromMilliseconds(FLAGS_rpc_negotiation_timeout_ms));
> can we add a few ms to this sleep? say 50ms
We could, but what is the idea?

The rpc timeout is set to twice that number.


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 9:

> (1 comment)
 > 
 > I hope we can get rid of the port bind errors by adding a resource
 > lock. Would that be sufficient or do you think something else is
 > going on here i.e. TIME_WAIT?

Yep, it seems much better after I separated the multi-master tests and added the RESOURCE_LOCK attribute for 'master-rpc-ports':

  http://dist-test.cloudera.org//job?job_id=aserbin.1495746339.30656

-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


KUDU-2021 test for negotiation timeout on Master RPCs

This patch adds integration tests to verify that the Kudu C++ client
behaves as described in KUDU-2021: in case of a multi-master Kudu
cluster, the client retries an RPC with other master if the connection
negotiation with leader master times out.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Reviewed-on: http://gerrit.cloudera.org:8080/6927
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/client-negotiation-failover-itest.cc
M src/kudu/integration-tests/client_failover-itest.cc
3 files changed, 276 insertions(+), 130 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6927/6//COMMIT_MSG
Commit Message:

Line 7: KUDU-2021 retry master RPC if negotiation times out
> How about:
That looks better!

I would shorted it a bit, though:

KUDU-2021 test for negotiation timeout on Master RPCs


http://gerrit.cloudera.org:8080/#/c/6927/6/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

Line 484:   static const int kTimeoutMs = 1 * 60 * 1000;
> nit: how about const MonoDelta kTimeout = MonoDelta::FromSeconds(60)
yep, that's much better.


Line 488:   cluster_opts_.master_rpc_ports = { 37030, 37031, 37032 };
> huh. do we have to pick these manually?
Yep, that's the ugly piece of all multi-master tests.  The crux is that all masters should know other master's end-points  when starting up.

The fix would be: make probing on ports, somehow reserve them, and then start the masters with the reserved ports.


Line 540:   ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", "500"));
> It's not obvious to me why a 500ms negotiation delay is sufficient to cause
It seems this went this state after some iterations.

It supposed to be something like the following:

  ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms",
                              Substitute("$0", FLAGS_rpc_negotiation_timeout_ms)));
  thread pause_thread([&]() {
      SleepFor(MonoDelta::FromMilliseconds(FLAGS_rpc_negotiation_timeout_ms / 2));
      CHECK_OK(m->Pause())
    });


Line 542:       SleepFor(MonoDelta::FromSeconds(3));
> What's the purpose of this sleep? Is it so this executes after ListTables()
Yep, it seem that was a typo.  I added corresponding comment into the code regarding purpose of this sleep.  Basically, we want the former leader master to lose its leadership role, but only after the client has connected and started connection negotiation.


Line 548:       ScopedResumeExternalDaemon resumer(m);
> CHECK_OK(m->Resume()); ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6927

to look at the new patch set (#8).

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................

KUDU-2021 test for negotiation timeout on Master RPCs

This patch adds integration tests to verify that the Kudu C++ client
behaves as described in KUDU-2021: in case of a multi-master Kudu
cluster, the client retries an RPC with other master if the connection
negotiation with leader master times out.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/integration-tests/client_failover-itest.cc
1 file changed, 96 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2021 retry master RPC if negotiation times out

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2021 retry master RPC if negotiation times out
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6927/6//COMMIT_MSG
Commit Message:

Line 7: KUDU-2021 retry master RPC if negotiation times out
How about:

KUDU-2021 Add regression test for negotiation timeout on Master RPCs


http://gerrit.cloudera.org:8080/#/c/6927/6/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

Line 484:   static const int kTimeoutMs = 1 * 60 * 1000;
nit: how about const MonoDelta kTimeout = MonoDelta::FromSeconds(60)


Line 488:   cluster_opts_.master_rpc_ports = { 37030, 37031, 37032 };
huh. do we have to pick these manually?


Line 540:   ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", "500"));
It's not obvious to me why a 500ms negotiation delay is sufficient to cause a timeout in this test


Line 542:       SleepFor(MonoDelta::FromSeconds(3));
What's the purpose of this sleep? Is it so this executes after ListTables() is executed? Is that required for this test to pass? It seems like we should be pausing the LeaderMaster then doing the ListTables immediately.


Line 548:       ScopedResumeExternalDaemon resumer(m);
CHECK_OK(m->Resume()); ?


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 8: Verified+1

Flakes:

1. Port conflict while running ClientFailoverOnNegotiationTimeoutITest.Kudu2021NegotiateWithMaster

2. Absolutely unrelated python symbols error.

-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 8:

(4 comments)

Would you mind running 1000 iters on this test on dist-test just to make sure it's not flaky?

http://gerrit.cloudera.org:8080/#/c/6927/8/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

PS8, Line 539:  
start the negotiation process


PS8, Line 540: k cl
the client's


PS8, Line 544:  ne
the negotiation process


PS8, Line 553: FLAGS_rpc_negotiation_timeout_ms
Is this reliable? Should it be the rpc timeout * 2 instead just to make sure?

Also, it looks like the default rpc negotiation timeout is 3 sec; should we shorten it to make this test complete faster? If we do that, we would need to set it that way in both the client environment and on the cluster.


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2021 retry master RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6927

to look at the new patch set (#3).

Change subject: KUDU-2021 retry master RPC if negotiation times out
......................................................................

KUDU-2021 retry master RPC if negotiation times out

This patch adds integration tests to verify that the Kudu C++ client
behaves as described in KUDU-2021: in case of a multi-master Kudu
cluster, the client retries an RPC with other master if the connection
negotiation with leader master times out.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/integration-tests/client_failover-itest.cc
1 file changed, 80 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 8:

(4 comments)

> (4 comments)
 > 
 > Would you mind running 1000 iters on this test on dist-test just to
 > make sure it's not flaky?

I did that already several times and did that once more for the DEBUG build:
  http://dist-test.cloudera.org//job?job_id=aserbin.1495742228.16502

The failures you can see there are attributed only to the port bind errors (e.g. something like 'Address already in use (error 98)')

I think your idea to separate the set of multi-master tests into a separate test binary and running it with appropriate RESOURCE_LOCK is a good idea.

http://gerrit.cloudera.org:8080/#/c/6927/8/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

PS8, Line 539:  
> start the negotiation process
Done


PS8, Line 540: k cl
> the client's
Done


PS8, Line 544:  ne
> the negotiation process
Done


PS8, Line 553: FLAGS_rpc_negotiation_timeout_ms
> Is this reliable? Should it be the rpc timeout * 2 instead just to make sur
It's a good point.

I think it is reliable -- that's because there are some extra steps that takes some time at the server side, so the total is usually tens of ms more than the injected delay at least in DEBUG builds.  However, I think it's better to bump it a little as suggested to increase trust in this test and to be on the safe side :)

As for shortening the default RPC negotation timeout for the client: it's already so. Check the constructor of the ClientFailoverOnNegotiationTimeoutITest class -- it's set the re to just 1 second.

I'm not sure whether we want to have that for the server-side components.  My thinking is the following: the only affected component here would be the leader master, and the Raft-related timeouts seem to be shorter anyway in that regard.  Let me know if you think I'm missing something.


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 10: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6927/10/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

Line 257:       SleepFor(MonoDelta::FromMilliseconds(FLAGS_rpc_negotiation_timeout_ms));
can we add a few ms to this sleep? say 50ms


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 8:

BTW the port conflict was what I was worried about with choosing the ports manually. We don't want that to cause flakiness on parallel tests.

I know I suggested client_failover-itest but in retrospect I think we should create a brand new test file for this (perhaps client_master_failover-itest) and add it to CMakeLists.txt with RESOURCE_LOCK "master-rpc-ports" added to it.

-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2021 retry master RPC if negotiation times out

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6927

to look at the new patch set (#2).

Change subject: KUDU-2021 retry master RPC if negotiation times out
......................................................................

KUDU-2021 retry master RPC if negotiation times out

This patch fixes KUDU-2021, i.e. with this patch, in case of a
multi-master Kudu cluster, the Kudu C++ client retries an RPC with other
master if the connection negotiation with leader master timed out.

Added new integration test to cover the updated client's behavior.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/master_migration-itest.cc
3 files changed, 110 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 11: Code-Review+2

carrying Mike's +2 from PS10

-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6927

to look at the new patch set (#10).

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................

KUDU-2021 test for negotiation timeout on Master RPCs

This patch adds integration tests to verify that the Kudu C++ client
behaves as described in KUDU-2021: in case of a multi-master Kudu
cluster, the client retries an RPC with other master if the connection
negotiation with leader master times out.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/client-negotiation-failover-itest.cc
M src/kudu/integration-tests/client_failover-itest.cc
3 files changed, 276 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2021 test for negotiation timeout on Master RPCs

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 10: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6927/10/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

Line 257:       SleepFor(MonoDelta::FromMilliseconds(FLAGS_rpc_negotiation_timeout_ms));
> We could, but what is the idea?
oops, you are right. Nevermind.


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes