You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "honeyhexin (Code Review)" <ge...@cloudera.org> on 2019/08/12 16:33:19 UTC

[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

honeyhexin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14048


Change subject: KUDU-2915 Support to delete dead tservers from CLI
......................................................................

KUDU-2915 Support to delete dead tservers from CLI

The patch provides a command that can delete all dead
tablet servers from master. The command can be used as:
  kudu master delete_dead_tservers <master_addresses>

Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/master_proxy_rpc.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_master.cc
14 files changed, 177 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/14048/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin <ho...@sohu.com>

[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

Posted by "honeyhexin (Code Review)" <ge...@cloudera.org>.
honeyhexin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14048 )

Change subject: KUDU-2915 Support to delete dead tservers from CLI
......................................................................


Patch Set 1:

I agree that introducing `kudu cluster decommission` tool will be more reasonalbe. I will change some logic and resubmit the patch soon.


-- 
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin <ho...@sohu.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: honeyhexin <ho...@sohu.com>
Gerrit-Comment-Date: Mon, 12 Aug 2019 23:17:45 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14048 )

Change subject: KUDU-2915 Support to delete dead tservers from CLI
......................................................................


Patch Set 1:

(2 comments)

> Patch Set 1:
> 
> Have we considered alternatives here? eg if the cluster is fully recovered from an outage, maybe we shouldn't have KSCK show a downed tablet server as a "bad" status?
> 
> Another option that might be a bit better is to allow an option to forget a _specific_ dead TS -- otherwise it seems a bit heavy-handed to forget _all_ dead tablet servers.

Seems I misunderstood the original intent of KUDU-2915. I thought it was the second option that Todd mentioned, i.e. a way to forget about a single dead tablet server. See here https://docs.google.com/document/d/12BZqspGjHvQlc-o8XTDixoRol9Q36WJzXLJ6p15Zhf0/edit#heading=h.f7wecdcqsbe for my thoughts on how I think that would be useful in the context of tserver decommissioning.

I also chatted with Adar about this in person, and one thing worth considering is that this is basically the last step of a decommissioning, where the full steps for decommissioning would look something like:

1. Mark a tablet server as being decommissioning to avoid replica placement onto that tablet server.
2. Move all replicas away from the tablet server.
3. Once empty, (either automatically or with a tool) indicate that the tablet server has been successfully decommissioned by having the master forget about the decommissioned tablet server.

If we want to build towards this without introducing redundant tooling, instead of having a `kudu master delete_tserver` tool and then eventually introducing `kudu cluster decommission` tool, we could introduce the "delete single tserver" functionality as `kudu cluster decommission` which would, for now, only do #3. #1 and #2 should be left as future work to build on top of this tool. What do you think?

http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@638
PS1, Line 638: void MasterServiceImpl::DeleteDeadTServerInMaster(const DeleteDeadTabletServersRequestPB* req,
The registration of a tablet server isn't persisted to disk. Is it important to take the leader lock when performing this action?


http://gerrit.cloudera.org:8080/#/c/14048/1/src/kudu/master/master_service.cc@650
PS1, Line 650: void MasterServiceImpl::DeleteDeadTabletServers(const DeleteDeadTabletServersRequestPB* req,
I don't see the advantage of having the master relay these RPCs to each other, versus having the tool instantiate multiple master proxies and sending the requests to each master individually. Does "forgetting" a tablet server _need_ to go to the leader first?



-- 
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin <ho...@sohu.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 12 Aug 2019 18:53:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2915 Support to delete dead tservers from CLI

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14048 )

Change subject: KUDU-2915 Support to delete dead tservers from CLI
......................................................................


Patch Set 1:

Have we considered alternatives here? eg if the cluster is fully recovered from an outage, maybe we shouldn't have KSCK show a downed tablet server as a "bad" status?

Another option that might be a bit better is to allow an option to forget a _specific_ dead TS -- otherwise it seems a bit heavy-handed to forget _all_ dead tablet servers.


-- 
To view, visit http://gerrit.cloudera.org:8080/14048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I689cfb02a1ae44c4d941e83e3a8cf6e14c7911c7
Gerrit-Change-Number: 14048
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin <ho...@sohu.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 12 Aug 2019 17:34:46 +0000
Gerrit-HasComments: No