You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/09/25 18:34:50 UTC

[kudu-CR] [tools] Fix bug in CheckCompleteMove

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11508


Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................

[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we move to remove X from A. To prepare
   we gather consensus state using proxy, thinking we are talking to Z,
   but the proxy is pointed at X, causing a bad status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Z

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch, since #3 was unlikely with abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed
it would not hit the same problem.

This also reorganizes and re-comments CheckCompleteMove a bit, to try
and make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] Fix bug in CheckCompleteMove

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11508

to look at the new patch set (#2).

Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................

[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we move to remove X from A. To prepare
   we gather consensus state using proxy, thinking we are talking to Z,
   but the proxy is pointed at X, causing a bad status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Z

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch, since #3 was unlikely with abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed
it would not hit the same problem.

This also reorganizes and re-comments CheckCompleteMove a bit, to try
and make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] [tools] Fix bug in CheckCompleteMove

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@287
PS1, Line 287: nd the
             :   // replica-to-be-removed has stepped down as leader
That's not the only case we need to handle here right?  The replica-to-be-removed might be a follower replica at this point, no?


http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@291
PS1, Line 291: to_ts_uuid_is_a_voter
I think additional !from_ts_uuid_in_config condition is missing here.


If from the procedural perspective there is nothing to do in case of the 3-4-3 replica management scheme, this function should still report on successful completion only when the the source replica is gone from the tablet configuration.



-- 
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Tue, 25 Sep 2018 21:29:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] Fix bug in CheckCompleteMove

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 01 Oct 2018 18:49:51 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] Fix bug in CheckCompleteMove

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@287
PS1, Line 287: nd the
             :   // replica-to-be-removed has stepped down as leader
> That's not the only case we need to handle here right?  The replica-to-be-r
Done


http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@291
PS1, Line 291: to_ts_uuid_is_a_voter
> I think additional !from_ts_uuid_in_config condition is missing here.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 01 Oct 2018 18:34:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] Fix bug in CheckCompleteMove

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................

[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we move to remove X from A. To prepare
   we gather consensus state using proxy, thinking we are talking to Z,
   but the proxy is pointed at X, causing a bad status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Z

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch, since #3 was unlikely with abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed
it would not hit the same problem.

This also reorganizes and re-comments CheckCompleteMove a bit, to try
and make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Reviewed-on: http://gerrit.cloudera.org:8080/11508
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>