Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2018/07/06 23:20:06 UTC

[kudu-CR] [raft conensus-itest] fix TestElectionMetrics flake

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10887 )


Change subject: [raft_conensus-itest] fix TestElectionMetrics flake
......................................................................

[raft_conensus-itest] fix TestElectionMetrics flake

This patch fixes flakiness in the RaftConsensusITest.TestElectionMetrics
scenario.  The original TestElectionMetrics scenario was split into
two parts TestElectionMetricsPart[1,2].

Prior to this patch, the TestElectionMetrics scenario could fail if a
leader election happen inadvertently.

Before (1 out of 12 failed):
  http://dist-test.cloudera.org/job?job_id=aserbin.1530681412.7635

After (not a single failure in 1K runs):
  http://dist-test.cloudera.org/job?job_id=aserbin.1530918429.120803

Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
2 files changed, 122 insertions(+), 27 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10887/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10887 )

Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................

[raft_consensus-itest] fix TestElectionMetrics flake

This patch fixes flakiness in the RaftConsensusITest.TestElectionMetrics
scenario. The original TestElectionMetrics scenario was split into
two parts, TestElectionMetricsPart[1,2], to separate the tests that
assume no leader election happens from the tests that depend on the
leader failure detection mechanism.

Prior to this patch, the TestElectionMetrics scenario could fail if a
leader election happened inadvertently.

Before (1 out of 12 failed):
  http://dist-test.cloudera.org/job?job_id=aserbin.1530681412.7635

After (not a single failure in 1K runs):
  http://dist-test.cloudera.org/job?job_id=aserbin.1530918429.120803

Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Reviewed-on: http://gerrit.cloudera.org:8080/10887
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Will Berkeley <wd...@gmail.com>
Reviewed-by: Attila Bukor <ab...@cloudera.com>
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
2 files changed, 121 insertions(+), 27 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Will Berkeley: Looks good to me, approved
  Attila Bukor: Looks good to me, but someone else must approve


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )

Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@7
PS1, Line 7: conensus
> consensus
Done


http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@10
PS1, Line 10: The original TestElectionMetrics scenario was split into
            : two parts TestElectionMetricsPart[1,2]
> Why?
I added an explanation.


http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@14
PS1, Line 14: happen
> happened
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: Resetting
> Reset
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: since
              :     // that's the
> now that we've accepted an
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1323
PS1, Line 1323: it might be
              :     // a race between setting the leader UUID in the consensus meta and
              :     // VOTE_DENIED response sent for a vote request sent earlier. If the
              :     // VOTE_DENIED response processed _after_ the call to UpdateReplica(),
              :     // the above-mentioned 'failed_elections_since_stable_leader' metric would
              :     // stuck with non-zero value if not resetting it here as well.
> I'd rephrase this a bit to "because there is a potential race between reset
That sounds clearer, thanks.


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2673
PS1, Line 2673: the failed leader detection
> leader failure detection
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: that's to avoid
> this avoids
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: be racing
> race
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2675
PS1, Line 2675: assertion of appropriate constraints on
> tests of
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2676
PS1, Line 2676: Part1
> TestElectionMetrics_FailureDetectionDisabled perhaps?
Renamed to TestElectionMetricsFailureDetectionDisabled


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2745
PS1, Line 2745: the failed leader detection
> leader failure detection
Done


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2746
PS1, Line 2746: Part2
> TestElectionMetrics_FailureDetectionEnabled perhaps?
Renamed to TestElectionMetricsFailureDetectionEnabled


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2796
PS1, Line 2796: Start back
> Restart
Done




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 23:42:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/10887 )

Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )

Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................


Patch Set 2: Verified+1

Unrelated flake in the RaftConsensusParamReplicationModesITest.Test_KUDU_1735/1 scenario; that's probably a good candidate to fix next.



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 07 Jul 2018 00:07:16 +0000
Gerrit-HasComments: No

[kudu-CR] [raft conensus-itest] fix TestElectionMetrics flake

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )

Change subject: [raft_conensus-itest] fix TestElectionMetrics flake
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@7
PS1, Line 7: conensus
consensus


http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@10
PS1, Line 10: The original TestElectionMetrics scenario was split into
            : two parts TestElectionMetricsPart[1,2]
Why?


http://gerrit.cloudera.org:8080/#/c/10887/1//COMMIT_MSG@14
PS1, Line 14: happen
happened


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: Resetting
Reset


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1321
PS1, Line 1321: since
              :     // that's the
now that we've accepted an


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/consensus/raft_consensus.cc@1323
PS1, Line 1323: it might be
              :     // a race between setting the leader UUID in the consensus meta and
              :     // VOTE_DENIED response sent for a vote request sent earlier. If the
              :     // VOTE_DENIED response processed _after_ the call to UpdateReplica(),
              :     // the above-mentioned 'failed_elections_since_stable_leader' metric would
              :     // stuck with non-zero value if not resetting it here as well.
I'd rephrase this a bit to "because there is a potential race between resetting the failed elections count in SetLeaderUuidUnlocked() and incrementing after a failed election if another replica was elected leader in an election concurrent with the one called by this replica."


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2673
PS1, Line 2673: the failed leader detection
leader failure detection


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: that's to avoid
this avoids


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2674
PS1, Line 2674: be racing
race


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2675
PS1, Line 2675: assertion of appropriate constraints on
tests of


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2676
PS1, Line 2676: Part1
TestElectionMetrics_FailureDetectionDisabled perhaps?


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2745
PS1, Line 2745: the failed leader detection
leader failure detection


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2746
PS1, Line 2746: Part2
TestElectionMetrics_FailureDetectionEnabled perhaps?


http://gerrit.cloudera.org:8080/#/c/10887/1/src/kudu/integration-tests/raft_consensus-itest.cc@2796
PS1, Line 2796: Start back
Restart




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 23:31:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )

Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10887/2/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10887/2/src/kudu/integration-tests/raft_consensus-itest.cc@2767
PS2, Line 2767: in the majority
You are actually checking all replicas, which I think is fine.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 11 Jul 2018 19:19:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10887 )

Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................


Patch Set 2: Code-Review+1

Thanks, Alexey, for fixing this; looks good to me. I finally had time to look at this today and noticed that I already had an incoming review fixing it :)



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 11 Jul 2018 19:38:19 +0000
Gerrit-HasComments: No

[kudu-CR] [raft consensus-itest] fix TestElectionMetrics flake

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10887

to look at the new patch set (#2).

Change subject: [raft_consensus-itest] fix TestElectionMetrics flake
......................................................................

[raft_consensus-itest] fix TestElectionMetrics flake

This patch fixes flakiness in the RaftConsensusITest.TestElectionMetrics
scenario. The original TestElectionMetrics scenario was split into
two parts, TestElectionMetricsPart[1,2], to separate the tests that
assume no leader election happens from the tests that depend on the
leader failure detection mechanism.

Prior to this patch, the TestElectionMetrics scenario could fail if a
leader election happened inadvertently.

Before (1 out of 12 failed):
  http://dist-test.cloudera.org/job?job_id=aserbin.1530681412.7635

After (not a single failure in 1K runs):
  http://dist-test.cloudera.org/job?job_id=aserbin.1530918429.120803

Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
2 files changed, 121 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10887/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I073c9989a6d5d5dc1eb104120a89d38cfce2ac6e
Gerrit-Change-Number: 10887
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>