You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/09/20 08:18:35 UTC

[kudu-CR] WIP: consensus: refactor tracking of received OpIds out of ReplicaState

Hello David Ribeiro Alves, Mike Percy,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/4476

to review the following change.

Change subject: WIP: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................

WIP: consensus: refactor tracking of received OpIds out of ReplicaState

The PeerMessageQueue class was already tracking the last appended OpId, so
tracking it in ReplicaState was redundant and confusing. This removes a
bunch of stuff from ReplicaState and adds just a little bit of new
functionality to PeerMessageQueue:

- TruncateOpsAfter() now takes an index instead of an OpId. The queue
  can already map the index to an OpId by asking the log.
- Added a getter to expose the last OpId in the log back to RaftConsensus
- Changed OpId generation to happen in PeerMessageQueue. This was easy
  because it already knows the previous OpId and the current term.

The 'last_received_cur_leader' tracking was moved into RaftConsensus
itself, since it's just transient state tracking the RPC back-and-forths
between a leader and the follower.

This patch also removes raft_consensus-test, the mock-based testing for
RaftConsensus. I found that maintaining this test was very difficult, in
particular because now we rely on the fact that AppendOperations() is
reflected in GetLastOpIdInLog(). With a mock PeerMessageQueue, this
state update wasn't happening properly, and trying to reproduce that
behavior in the mocks themselves seemed like I was basically
re-implementing the actual production code for the queue. I looked over
the tests in this suite and I believe all of the cases are covered by
various other tests (randomized and otherwise).

I looped raft_consensus-itest 100 times[1], the Churny test case 1000
times[2], and exactly_once_writes-itest 1000 times[3]. Lastly, I was
able to re-enable TestChurnyElections_WithNotificationLatency and loop
it 500 times[4].

[1] http://dist-test.cloudera.org/job?job_id=todd.1474357631.30024
[2] http://dist-test.cloudera.org//job?job_id=todd.1474359004.2328
[3] http://dist-test.cloudera.org//job?job_id=todd.1474358436.31536
[4] http://dist-test.cloudera.org//job?job_id=todd.1474359250.4834

Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
D src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
10 files changed, 52 insertions(+), 795 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/4476/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4476/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 565:     // TODO(todd) count of number of ops failed due to consensus queue overflow.
> Yeah, I think I was one of the ones to vote to keep them, only realized tid
I could change the bot to only comment on changed lines (rather than changed lines + context) but I think it often makes sense to look for issues within a few lines of the change, since for example changing a method body might make one of the arguments unused or whatever.

How about a mass automated change that will make all of the current TODO: into TODO(author) where author is determined by git-blame?


-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4476/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 565:     // TODO(todd) count of number of ops failed due to consensus queue overflow.
> I could change the bot to only comment on changed lines (rather than change
I like the mass change idea, would be curious to look at the output to see it was mostly accurate though.


-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4476/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 565:     // TODO(todd) count of number of ops failed due to consensus queue overflow.
> side note: these warnings are helpful but problematic, newcomers to a code 
There was a mailing list thread about this a couple weeks ago and people seemed to be in favor of keeping them. My policy is just to use git blame to figure out who wrote the TODO and put their name there :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 1:

Would appreciate a review on this general direction before taking the time to finish the patch. Does this refactor seem positive to others?

-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 1:

I like the consolidation and, as I had said on the mailing list, I'm +1 on the removal of the test, so, in general, I think this patch is a good idea.

-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4476

to look at the new patch set (#2).

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................

consensus: refactor tracking of received OpIds out of ReplicaState

The PeerMessageQueue class was already tracking the last appended OpId, so
tracking it in ReplicaState was redundant and confusing. This removes a
bunch of stuff from ReplicaState and adds just a little bit of new
functionality to PeerMessageQueue:

- TruncateOpsAfter() now takes an index instead of an OpId. The queue
  can already map the index to an OpId by asking the log.
- Added a getter to expose the last OpId in the log back to RaftConsensus
- Changed OpId generation to happen in PeerMessageQueue. This was easy
  because it already knows the previous OpId and the current term.

The 'last_received_cur_leader' tracking was moved into RaftConsensus
itself, since it's just transient state tracking the RPC back-and-forths
between a leader and the follower.

This patch also removes raft_consensus-test, the mock-based testing for
RaftConsensus. I found that maintaining this test was very difficult, in
particular because now we rely on the fact that AppendOperations() is
reflected in GetLastOpIdInLog(). With a mock PeerMessageQueue, this
state update wasn't happening properly, and trying to reproduce that
behavior in the mocks themselves seemed like I was basically
re-implementing the actual production code for the queue. I looked over
the tests in this suite and I believe all of the cases are covered by
various other tests (randomized and otherwise).

I looped raft_consensus-itest 100 times[1], the Churny test case 1000
times[2], and exactly_once_writes-itest 1000 times[3]. Lastly, I was
able to re-enable TestChurnyElections_WithNotificationLatency and loop
it 500 times[4].

[1] http://dist-test.cloudera.org/job?job_id=todd.1474357631.30024
[2] http://dist-test.cloudera.org//job?job_id=todd.1474359004.2328
[3] http://dist-test.cloudera.org//job?job_id=todd.1474358436.31536
[4] http://dist-test.cloudera.org//job?job_id=todd.1474359250.4834

Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
D src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
10 files changed, 57 insertions(+), 801 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/4476/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 4: Code-Review+2

Just rebased

-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4476/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 565:     // TODO(todd) count of number of ops failed due to consensus queue overflow.
> There was a mailing list thread about this a couple weeks ago and people se
Yeah, I think I was one of the ones to vote to keep them, only realized tidy bot would make the suggestions on unchanged code later


-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: WIP: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 1: Code-Review+1

(3 comments)

looks like a good change

http://gerrit.cloudera.org:8080/#/c/4476/1/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS1, Line 193: queue
...assuming the term remains the same


http://gerrit.cloudera.org:8080/#/c/4476/1/src/kudu/consensus/raft_consensus_quorum-test.cc
File src/kudu/consensus/raft_consensus_quorum-test.cc:

Line 294:         if (OpIdCompare(peer->queue_->GetLastOpIdInLog(), to_wait_for) >= 0) {
I know this is just a test but it looks like there is no longer a need to take the replica state lock.


http://gerrit.cloudera.org:8080/#/c/4476/1/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS1, Line 656: ShortDebugString
nit: nicer to use OpIdToString()


-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


consensus: refactor tracking of received OpIds out of ReplicaState

The PeerMessageQueue class was already tracking the last appended OpId, so
tracking it in ReplicaState was redundant and confusing. This removes a
bunch of stuff from ReplicaState and adds just a little bit of new
functionality to PeerMessageQueue:

- TruncateOpsAfter() now takes an index instead of an OpId. The queue
  can already map the index to an OpId by asking the log.
- Added a getter to expose the last OpId in the log back to RaftConsensus
- Changed OpId generation to happen in PeerMessageQueue. This was easy
  because it already knows the previous OpId and the current term.

The 'last_received_cur_leader' tracking was moved into RaftConsensus
itself, since it's just transient state tracking the RPC back-and-forths
between a leader and the follower.

This patch also removes raft_consensus-test, the mock-based testing for
RaftConsensus. I found that maintaining this test was very difficult, in
particular because now we rely on the fact that AppendOperations() is
reflected in GetLastOpIdInLog(). With a mock PeerMessageQueue, this
state update wasn't happening properly, and trying to reproduce that
behavior in the mocks themselves seemed like I was basically
re-implementing the actual production code for the queue. I looked over
the tests in this suite and I believe all of the cases are covered by
various other tests (randomized and otherwise).

I looped raft_consensus-itest 100 times[1], the Churny test case 1000
times[2], and exactly_once_writes-itest 1000 times[3]. Lastly, I was
able to re-enable TestChurnyElections_WithNotificationLatency and loop
it 500 times[4].

[1] http://dist-test.cloudera.org/job?job_id=todd.1474357631.30024
[2] http://dist-test.cloudera.org//job?job_id=todd.1474359004.2328
[3] http://dist-test.cloudera.org//job?job_id=todd.1474358436.31536
[4] http://dist-test.cloudera.org//job?job_id=todd.1474359250.4834

Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Reviewed-on: http://gerrit.cloudera.org:8080/4476
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
D src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
10 files changed, 61 insertions(+), 801 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4476

to look at the new patch set (#4).

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................

consensus: refactor tracking of received OpIds out of ReplicaState

The PeerMessageQueue class was already tracking the last appended OpId, so
tracking it in ReplicaState was redundant and confusing. This removes a
bunch of stuff from ReplicaState and adds just a little bit of new
functionality to PeerMessageQueue:

- TruncateOpsAfter() now takes an index instead of an OpId. The queue
  can already map the index to an OpId by asking the log.
- Added a getter to expose the last OpId in the log back to RaftConsensus
- Changed OpId generation to happen in PeerMessageQueue. This was easy
  because it already knows the previous OpId and the current term.

The 'last_received_cur_leader' tracking was moved into RaftConsensus
itself, since it's just transient state tracking the RPC back-and-forths
between a leader and the follower.

This patch also removes raft_consensus-test, the mock-based testing for
RaftConsensus. I found that maintaining this test was very difficult, in
particular because now we rely on the fact that AppendOperations() is
reflected in GetLastOpIdInLog(). With a mock PeerMessageQueue, this
state update wasn't happening properly, and trying to reproduce that
behavior in the mocks themselves seemed like I was basically
re-implementing the actual production code for the queue. I looked over
the tests in this suite and I believe all of the cases are covered by
various other tests (randomized and otherwise).

I looped raft_consensus-itest 100 times[1], the Churny test case 1000
times[2], and exactly_once_writes-itest 1000 times[3]. Lastly, I was
able to re-enable TestChurnyElections_WithNotificationLatency and loop
it 500 times[4].

[1] http://dist-test.cloudera.org/job?job_id=todd.1474357631.30024
[2] http://dist-test.cloudera.org//job?job_id=todd.1474359004.2328
[3] http://dist-test.cloudera.org//job?job_id=todd.1474358436.31536
[4] http://dist-test.cloudera.org//job?job_id=todd.1474359250.4834

Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
D src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
10 files changed, 61 insertions(+), 801 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/4476/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4476/1/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS1, Line 193: queue
> ...assuming the term remains the same
Done


http://gerrit.cloudera.org:8080/#/c/4476/1/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

Line 466:   // TODO hack to serialize updates due to repeated/out-of-order messages
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/4476/1/src/kudu/consensus/raft_consensus_quorum-test.cc
File src/kudu/consensus/raft_consensus_quorum-test.cc:

Line 294:         if (OpIdCompare(peer->queue_->GetLastOpIdInLog(), to_wait_for) >= 0) {
> I know this is just a test but it looks like there is no longer a need to t
Done


http://gerrit.cloudera.org:8080/#/c/4476/1/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

PS1, Line 656: ShortDebugString
> nit: nicer to use OpIdToString()
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4476

to look at the new patch set (#3).

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................

consensus: refactor tracking of received OpIds out of ReplicaState

The PeerMessageQueue class was already tracking the last appended OpId, so
tracking it in ReplicaState was redundant and confusing. This removes a
bunch of stuff from ReplicaState and adds just a little bit of new
functionality to PeerMessageQueue:

- TruncateOpsAfter() now takes an index instead of an OpId. The queue
  can already map the index to an OpId by asking the log.
- Added a getter to expose the last OpId in the log back to RaftConsensus
- Changed OpId generation to happen in PeerMessageQueue. This was easy
  because it already knows the previous OpId and the current term.

The 'last_received_cur_leader' tracking was moved into RaftConsensus
itself, since it's just transient state tracking the RPC back-and-forths
between a leader and the follower.

This patch also removes raft_consensus-test, the mock-based testing for
RaftConsensus. I found that maintaining this test was very difficult, in
particular because now we rely on the fact that AppendOperations() is
reflected in GetLastOpIdInLog(). With a mock PeerMessageQueue, this
state update wasn't happening properly, and trying to reproduce that
behavior in the mocks themselves seemed like I was basically
re-implementing the actual production code for the queue. I looked over
the tests in this suite and I believe all of the cases are covered by
various other tests (randomized and otherwise).

I looped raft_consensus-itest 100 times[1], the Churny test case 1000
times[2], and exactly_once_writes-itest 1000 times[3]. Lastly, I was
able to re-enable TestChurnyElections_WithNotificationLatency and loop
it 500 times[4].

[1] http://dist-test.cloudera.org/job?job_id=todd.1474357631.30024
[2] http://dist-test.cloudera.org//job?job_id=todd.1474359004.2328
[3] http://dist-test.cloudera.org//job?job_id=todd.1474358436.31536
[4] http://dist-test.cloudera.org//job?job_id=todd.1474359250.4834

Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
---
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
D src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
10 files changed, 60 insertions(+), 801 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/4476/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] consensus: refactor tracking of received OpIds out of ReplicaState

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: consensus: refactor tracking of received OpIds out of ReplicaState
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4476/2/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 565:     // TODO(todd) count of number of ops failed due to consensus queue overflow.
side note: these warnings are helpful but problematic, newcomers to a code region have no idea who to attribute these to and tidy bot seems to like to point these out outside of the changed regions. Should we disable this warning?


http://gerrit.cloudera.org:8080/#/c/4476/2/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

Line 462:   OpId last_received_cur_leader_;
docs


-- 
To view, visit http://gerrit.cloudera.org:8080/4476
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81614d26328b0fbba37bf279f59717e05a07b816
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes