Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/12/27 06:31:13 UTC

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14953 )


Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................

[tests] address flakiness in raft_consensus_election-itest

A few test scenarios of the raft_consensus_election-itest suite
involving churny elections were flaky when run in slow mode with
--stress_cpu_threads=16.  The common root of the problem was the
writer test thread failing due to a timeout.

This patch addresses the issue by increasing the Raft heartbeat
interval from 1 to 2 milliseconds.  With this change, the
above-mentioned tests become more stable and no longer fail with the
timeout error.  To confirm that, I ran raft_consensus_election-itest
built in DEBUG mode in multiple 1K batches.

Even with this patch, the above-mentioned test scenarios sometimes fail
due to the DCHECK_GE assert in PeerMessageQueue::CheckMonotonicTerms().
That issue will be addressed separately.
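
For readers unfamiliar with this kind of check, below is a minimal,
self-contained sketch of a "terms must not go backwards" invariant.
It is not the code of PeerMessageQueue::CheckMonotonicTerms(); the
TermTracker class and its methods are hypothetical names made up for
illustration, and the sketch returns a bool where a debug-only
assertion such as DCHECK_GE would abort the process instead.

```cpp
#include <cstdint>

// Hypothetical illustration only, not Kudu code: remembers the highest
// term observed so far and rejects any term that goes backwards.
class TermTracker {
 public:
  // Returns true and records 'term' if it is >= the last recorded term.
  // Returns false and leaves the state untouched otherwise. A debug
  // assertion would crash here instead of returning false.
  bool Observe(std::int64_t term) {
    if (term < last_term_) {
      return false;
    }
    last_term_ = term;
    return true;
  }

  std::int64_t last_term() const { return last_term_; }

 private:
  std::int64_t last_term_ = 0;
};
```

Repeating the same term is accepted in this sketch, since the
invariant is "greater than or equal", not "strictly greater".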

Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
3 files changed, 9 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/14953/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14953

to look at the new patch set (#2).

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................

Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
3 files changed, 18 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/14953/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14953 )

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................


Patch Set 4: Code-Review+2


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 06 Jan 2020 21:11:15 +0000
Gerrit-HasComments: No

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14953 )

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................

Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Reviewed-on: http://gerrit.cloudera.org:8080/14953
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
3 files changed, 19 insertions(+), 20 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14953

to look at the new patch set (#4).

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................

Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
3 files changed, 19 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/14953/4

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14953 )

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14953/1/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14953/1/src/kudu/integration-tests/raft_consensus_election-itest.cc@129
PS1, Line 129:   workload->set_write_timeout_millis((AllowSlowTests() ? 120 : 60) * 1000);
Hmm, why condition this on slow tests and not something like build type? AFAICT max_rows_to_insert is the only other factor that changes in slow tests; is that directly correlated with these timeouts?
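
For illustration, conditioning the timeout on build type rather than
on slow-test mode might look roughly like the sketch below. It assumes
that release builds define NDEBUG and debug builds do not; the names
kIsDebugBuild, WriteTimeoutMillis() and DefaultWriteTimeoutMillis()
are hypothetical and not part of the patch. The 120 and 60 second
values are the ones from the line under review.

```cpp
#include <cstdint>

// Assumption: NDEBUG is defined for release builds only.
#ifdef NDEBUG
constexpr bool kIsDebugBuild = false;
#else
constexpr bool kIsDebugBuild = true;
#endif

// Hypothetical helper: picks the write timeout from the build type,
// keeping the 120 s / 60 s values of the reviewed line.
constexpr std::int64_t WriteTimeoutMillis(bool is_debug_build) {
  return (is_debug_build ? 120 : 60) * 1000;
}

// Timeout for the build currently being compiled.
constexpr std::int64_t DefaultWriteTimeoutMillis() {
  return WriteTimeoutMillis(kIsDebugBuild);
}
```

A test could then pass DefaultWriteTimeoutMillis() to
set_write_timeout_millis() in place of the AllowSlowTests() ternary.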



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 29 Dec 2019 18:20:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14953

to look at the new patch set (#3).

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................

Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
3 files changed, 20 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/14953/3

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14953 )

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14953/1/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14953/1/src/kudu/integration-tests/raft_consensus_election-itest.cc@129
PS1, Line 129:   workload->set_write_timeout_millis((AllowSlowTests() ? 120 : 60) * 1000);
> Hmm, why condition this on slow tests and not something like build type? AF
Yes, I agree -- this looks strange.  Actually, this mirrors the logic of the call sites passing 'max_rows_to_insert', which depends on AllowSlowTests().  I think I'd better change the signature of the DoTestChurnyElections() function to make it more consistent.
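
As a rough sketch of that idea (not the signature that ended up in the
patch), the row limit and the write timeout could be chosen together at
the call site and passed down as one unit. ChurnyElectionsOptions and
MakeChurnyElectionsOptions() are hypothetical names; the
'allow_slow_tests' argument stands in for the result of
AllowSlowTests(), and the row limits are left to the caller because
the actual values are not quoted in this thread.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical bundle of the two knobs that vary with slow-test mode.
struct ChurnyElectionsOptions {
  std::int64_t max_rows_to_insert;
  std::chrono::milliseconds write_timeout;
};

// Hypothetical call-site helper: both values follow the same condition,
// so they cannot drift apart. 'fast_rows' and 'slow_rows' are supplied
// by the caller.
inline ChurnyElectionsOptions MakeChurnyElectionsOptions(
    bool allow_slow_tests, std::int64_t fast_rows, std::int64_t slow_rows) {
  ChurnyElectionsOptions opts;
  opts.max_rows_to_insert = allow_slow_tests ? slow_rows : fast_rows;
  // 120 s in slow mode, 60 s otherwise, as in the PS1 line above.
  opts.write_timeout = std::chrono::seconds(allow_slow_tests ? 120 : 60);
  return opts;
}
```

The test body would then read opts.write_timeout.count() when calling
set_write_timeout_millis(), with no AllowSlowTests() check of its own.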



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 30 Dec 2019 08:17:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14953 )

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14953/3/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14953/3/src/kudu/integration-tests/raft_consensus_election-itest.cc@131
PS3, Line 131:   //workload->set_write_timeout_millis((AllowSlowTests() ? 120 : 60) * 1000);
> Remove this commented out line?
Done



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 06 Jan 2020 18:47:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14953 )

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14953/2/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14953/2/src/kudu/integration-tests/raft_consensus_election-itest.cc@217
PS2, Line 217: TEST_F(RaftConsensusElectionITest, ChurnyElections_WithNotificationLatency) {
> warning: avoid using "_" in test name "ChurnyElections_WithNotificationLate
Done


http://gerrit.cloudera.org:8080/#/c/14953/2/src/kudu/integration-tests/raft_consensus_election-itest.cc@230
PS2, Line 230: TEST_F(RaftConsensusElectionITest, ChurnyElections_WithDuplicateKeys) {
> warning: avoid using "_" in test name "ChurnyElections_WithDuplicateKeys" a
Done



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 30 Dec 2019 16:52:57 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] address flakiness in raft_consensus_election-itest

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14953 )

Change subject: [tests] address flakiness in raft_consensus_election-itest
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14953/3/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14953/3/src/kudu/integration-tests/raft_consensus_election-itest.cc@131
PS3, Line 131:   //workload->set_write_timeout_millis((AllowSlowTests() ? 120 : 60) * 1000);
Remove this commented out line?



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6f54643c9c066b31a74e1082260225e60324e4e
Gerrit-Change-Number: 14953
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 30 Dec 2019 18:41:59 +0000
Gerrit-HasComments: Yes