Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2018/07/06 23:20:06 UTC
[kudu-CR] [raft conensus-itest] fix TestElectionMetrics flake
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_conensus-itest] fix TestElectionMetrics flake
......................................................................
[raft_conensus-itest] fix TestElectionMetrics flake
This patch fixes flakiness in the RaftConsensusITest.TestElectionMetrics
scenario. The original TestElectionMetrics scenario was split into
two parts TestElectionMetricsPart[1,2].
Prior to this patch, the TestElectionMetrics scenario could fail if a
leader election happen inadvertently.
Before (1 out of 12 failed):
http://dist-test.cloudera.org/job?job_id=aserbin.1530681412.7635
After (not a single failure in 1K run):
http://dist-test.cloudera.org/job?job_id=aserbin.1530918429.120803
Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
2 files changed, 122 insertions(+), 27 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10887/1
--
To view, visit http://gerrit.cloudera.org:8080/10887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................
[raft_consensus-itest] fix TestElectionMetrics flake
This patch fixes flakiness in the RaftConsensusITest.TestElectionMetrics
scenario. The original TestElectionMetrics scenario was split into
two parts TestElectionMetricsPart[1,2] to separate the tests that assume
no leader election happens from those that depend on the leader failure
detection mechanism, respectively.
Prior to this patch, the TestElectionMetrics scenario could fail if a
leader election happened inadvertently.
Before (1 out of 12 failed):
http://dist-test.cloudera.org/job?job_id=aserbin.1530681412.7635
After (not a single failure in 1K run):
http://dist-test.cloudera.org/job?job_id=aserbin.1530918429.120803
Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Reviewed-on: http://gerrit.cloudera.org:8080/10887
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Will Berkeley <wd...@gmail.com>
Reviewed-by: Attila Bukor <ab...@cloudera.com>
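As a rough sanity check on the before/after numbers quoted in the commit message above, the following sketch computes how likely a clean 1K run would be if the flake were still present. It assumes the runs are independent and that 1 in 12 is the true per-run failure rate, which is only an estimate from a 12-run sample; the function name is hypothetical and not part of Kudu.

```cpp
#include <cassert>
#include <cmath>

// Probability that `runs` independent runs all pass, given a per-run
// failure probability `p_fail`. Independence between runs is an
// assumption of this sketch, not something the dist-test jobs guarantee.
double ProbabilityAllPass(double p_fail, int runs) {
  return std::pow(1.0 - p_fail, runs);
}
```

With p_fail = 1.0 / 12.0 and runs = 1000 the result is far below 1e-30, so under these assumptions a 1K run without a single failure would be extremely unlikely if the original flake rate still applied.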
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
2 files changed, 121 insertions(+), 27 deletions(-)
Approvals:
Alexey Serbin: Verified
Will Berkeley: Looks good to me, approved
Attila Bukor: Looks good to me, but someone else must approve
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................
Patch Set 1:
(14 comments)
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@7
PS1, Line 7: conensus
> consensus
Done
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@10
PS1, Line 10: The original TestElectionMetrics scenario was split into
: two parts TestElectionMetricsPart[1,2]
> Why?
I added an explanation.
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@14
PS1, Line 14: happen
> happened
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: Resetting
> Reset
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: since
: // that's the
> now that we've accepted an
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1323
PS1, Line 1323: it might be
: // a race between setting the leader UUID in the consensus meta and
: // VOTE_DENIED response sent for a vote request sent earlier. If the
: // VOTE_DENIED response processed _after_ the call to UpdateReplica(),
: // the above-mentioned 'failed_elections_since_stable_leader' metric would
: // stuck with non-zero value if not resetting it here as well.
> I'd rephrase this a bit to "because there is a potential race between reset
That sounds clearer, thanks.
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2673
PS1, Line 2673: the failed leader detection
> leader failure detection
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: that's to avoid
> this avoids
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: be racing
> race
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2675
PS1, Line 2675: assertion of appropriate constraints on
> tests of
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2676
PS1, Line 2676: Part1
> TestElectionMetrics_FailureDetectionDisabled perhaps?
Renamed into TestElectionMetricsFailureDetectionDisabled
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2745
PS1, Line 2745: the failed leader detection
> leader failure detection
Done
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2746
PS1, Line 2746: Part2
> TestElectionMetrics_FailureDetectionEnabled perhaps?
Renamed into TestElectionMetricsFailureDetectionEnabled
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2796
PS1, Line 2796: Start back
> Restart
Done
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 23:42:06 +0000
Gerrit-HasComments: Yes
[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................
Removed reviewer Kudu Jenkins with the following votes:
* Verified-1 by Kudu Jenkins (120)
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................
Patch Set 2: Verified+1
Unrelated flake in the RaftConsensusParamReplicationModesITest.Test_KUDU_1735/1 scenario. Probably, that's a good candidate to fix next time.
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 07 Jul 2018 00:07:16 +0000
Gerrit-HasComments: No
[kudu-CR] [raft conensus-itest] fix TestElectionMetrics flake
Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_conensus-itest] fix TestElectionMetrics flake
......................................................................
Patch Set 1:
(14 comments)
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@7
PS1, Line 7: conensus
consensus
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@10
PS1, Line 10: The original TestElectionMetrics scenario was split into
: two parts TestElectionMetricsPart[1,2]
Why?
http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@14
PS1, Line 14: happen
happened
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: Resetting
Reset
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: since
: // that's the
now that we've accepted an
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1323
PS1, Line 1323: it might be
: // a race between setting the leader UUID in the consensus meta and
: // VOTE_DENIED response sent for a vote request sent earlier. If the
: // VOTE_DENIED response processed _after_ the call to UpdateReplica(),
: // the above-mentioned 'failed_elections_since_stable_leader' metric would
: // stuck with non-zero value if not resetting it here as well.
I'd rephrase this a bit to "because there is a potential race between resetting the failed elections count in SetLeaderUuidUnlocked() and incrementing after a failed election if another replica was elected leader in an election concurrent with the one called by this replica."
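The interleaving described in the comment above can be replayed deterministically in a small, single-threaded sketch. Everything here is a hypothetical stand-in: the class, its methods, and the `reset_on_update` switch are illustrative names and are not Kudu's actual code. Only the counter name and the order of events follow the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical, simplified model of a replica's election bookkeeping.
class ReplicaSketch {
 public:
  // Models learning about a new leader: record its UUID and reset the
  // failed-elections counter.
  void SetLeaderUuid(std::string uuid) {
    leader_uuid_ = std::move(uuid);
    failed_elections_since_stable_leader_ = 0;
  }

  // Models processing a VOTE_DENIED response for an election this
  // replica called earlier.
  void OnVoteDenied() { ++failed_elections_since_stable_leader_; }

  // Models accepting a later update from the established leader.
  // With `reset_on_update` set, the counter is reset here as well.
  void OnUpdateAccepted(bool reset_on_update) {
    if (reset_on_update) {
      failed_elections_since_stable_leader_ = 0;
    }
  }

  int64_t failed_elections_since_stable_leader() const {
    return failed_elections_since_stable_leader_;
  }

 private:
  std::string leader_uuid_;
  int64_t failed_elections_since_stable_leader_ = 0;
};

// Replays the racy ordering: the new leader is recorded first, the
// stale VOTE_DENIED arrives afterwards, then another update is accepted.
int64_t ReplayRace(bool reset_on_update) {
  ReplicaSketch replica;
  replica.SetLeaderUuid("other-replica");     // concurrent election won elsewhere
  replica.OnVoteDenied();                     // late response to our own election
  replica.OnUpdateAccepted(reset_on_update);  // next update from the leader
  return replica.failed_elections_since_stable_leader();
}
```

In this model ReplayRace(false) leaves the counter at a non-zero value even though a stable leader exists, while ReplayRace(true) brings it back to zero, which is the behavior the extra reset is meant to provide.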
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2673
PS1, Line 2673: the failed leader detection
leader failure detection
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: that's to avoid
this avoids
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: be racing
race
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2675
PS1, Line 2675: assertion of appropriate constraints on
tests of
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2676
PS1, Line 2676: Part1
TestElectionMetrics_FailureDetectionDisabled perhaps?
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2745
PS1, Line 2745: the failed leader detection
leader failure detection
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2746
PS1, Line 2746: Part2
TestElectionMetrics_FailureDetectionEnabled perhaps?
http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2796
PS1, Line 2796: Start back
Restart
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 23:31:48 +0000
Gerrit-HasComments: Yes
[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake
Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................
Patch Set 2: Code-Review+2
(1 comment)
http://gerrit.cloudera.org:8080/#/c/10887/2/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:
http://gerrit.cloudera.org:8080/#/c/10887/2/src/kudu/integration-tests/raft_consensus-itest.cc@2767
PS2, Line 2767: in the majority
You are actually checking all replicas, which I think is fine.
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 11 Jul 2018 19:19:49 +0000
Gerrit-HasComments: Yes
[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake
Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )
Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................
Patch Set 2: Code-Review+1
Thanks Alexey for fixing this, looks good to me. I finally had time to look at this today and noticed that I have an incoming review already fixing it :)
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 11 Jul 2018 19:38:19 +0000
Gerrit-HasComments: No
[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/10887
to look at the new patch set (#2).
Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................
[raft_consensus-itest] fix TestElectionMetrics flake
This patch fixes flakiness in the RaftConsensusITest.TestElectionMetrics
scenario. The original TestElectionMetrics scenario was split into
two parts TestElectionMetricsPart[1,2] to separate the tests that assume
no leader election happens from those that depend on the leader failure
detection mechanism, respectively.
Prior to this patch, the TestElectionMetrics scenario could fail if a
leader election happened inadvertently.
Before (1 out of 12 failed):
http://dist-test.cloudera.org/job?job_id=aserbin.1530681412.7635
After (not a single failure in 1K run):
http://dist-test.cloudera.org/job?job_id=aserbin.1530918429.120803
Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
2 files changed, 121 insertions(+), 27 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10887/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>