You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/02/22 09:39:31 UTC

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9393


Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet replace_tablet` that can be used
to replace a tablet with another having the same partition
information. Any data in the replaced tablet is lost. This tool
is useful when a tablet is has permanently lost all replicas
because it can repair the tablet's table. Before, the table would
have to be recreated to recover from a permanently lost tablet.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_tablet.cc
10 files changed, 312 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 7:

> Build Failed
 
Unrelated, pre-existing race. See KUDU-2058.


-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 21:45:28 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 19:06:56 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 10: Code-Review+1

I'd like Dan to take a look too.


-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Apr 2018 21:56:32 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4364
PS9, Line 4364:     new_tablet->metadata().ReadLock();
> Nit: use a new TabletMetadataLock for this.
Done


http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4371
PS9, Line 4371:   l_table.Commit();
> Actually, you don't need a TableMetadataLock at all; I misunderstood the co
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Apr 2018 21:31:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/integration-tests/replace_tablet-itest.cc
File src/kudu/integration-tests/replace_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/integration-tests/replace_tablet-itest.cc@55
PS6, Line 55:     ExternalMiniClusterITestBase(),
> warning: initializer for base class 'kudu::ExternalMiniClusterITestBase' is
Done


http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/integration-tests/replace_tablet-itest.cc@121
PS6, Line 121:   while (workload.rows_inserted() < 2 * kNumRows) {
> warning: either cast from 'int' to 'long' is ineffective, or there is loss 
Done


http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/tools/tool_action_common.cc@554
PS6, Line 554: LeaderMasterProxy::LeaderMasterProxy(const client::sp::shared_ptr<KuduClient>& client) :
> warning: pass by value and use std::move [modernize-pass-by-value]
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 16:38:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h@577
PS2, Line 577: , causing all of its data
             :   // to be permanently lost.
Would it be feasible to turn the replaced tablet into a new single-tablet table, so that if you eventually were able to repair it, you could at least get the data back out of it?


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc@4249
PS2, Line 4249:   replaced_tablet->mutable_metadata()->mutable_dirty()->set_state(SysTabletsEntryPB::DELETED,
if we make it DELETED it will also get removed off of the tablet servers. we might want to keep it there for some investigation or attempts to recover the data. Is there a way we can accomplish that?


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@792
PS2, Line 792:     option (kudu.rpc.authz_method) = "AuthorizeService";
we don't want AuthorizeSuperUser here?


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc@517
PS2, Line 517:   LOG(INFO) << "ReplaceTablet: received request to replace tablet " << req->tablet_id();
worth logging the requestor info too



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 23 Feb 2018 23:19:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 2:

(5 comments)

Mostly some nits and high-level points

http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@9
PS2, Line 9: replace_tablet
nit: how about unsafe_replace_tablet, given we're guaranteed "data loss"


http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@13
PS2, Line 13: Before, the table would
            : have to be recreated to recover from a permanently lost tablet
Maybe we should add a check somewhere that we don't actually have a live tablet somewhere. And if we do (or think we do), block the run with a `--force` flag or somesuch.


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@712
PS2, Line 712: there an e
nit: there is an


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2371
PS2, Line 2371:   const int kNumTservers = 3;
              :   const int kNumTablets = 3;
Maybe test with multiple masters too? This would still work, right?


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2392
PS2, Line 2392:   // Replace the tablet. This isn't the use case for the tool, but it should work.
Should it? Maybe we should safeguard against users shooting themselves in the foot.



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 23 Feb 2018 00:25:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG@23
PS4, Line 23: Note that, if the
            : tablet recovers quickly enough after being replaced,
> This procedure would amount to scanning the new table and dumping the rows 
Did you mean to comment on the previous sentence? If so, the answer is yup.


http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2405
PS4, Line 2405: replaced_tablet_id
> nit: could we use "old"/"new" instead of "replaced" here? I'm conflicted be
Done


http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445
PS4, Line 2445: replica
> nit: tablet
Done


http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445
PS4, Line 2445: plus one new one.
> This is referring to the new tablet count, not the number of replicas, righ
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 13 Mar 2018 06:23:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#4).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet is has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

The original tablet is moved to a new, specially-named table,
named like

replaced_<table name>_<table id>_<tablet id>

so that bad replicas are "quarantined" so they can be investigated,
and if a replica of the original tablet can be recovered, its data
can be scanned and recovered relatively easily. Note that, if the
tablet recovers quickly enough after being replaced, its possible
for clients with cache entries from before the replace to read
the rows from the replaced tablet, but eventually this will stop.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/client/meta_cache.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_tablet.cc
11 files changed, 367 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 18:59:23 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt
File src/kudu/integration-tests/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101
PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest)
> B-but the test always fails and is disabled, because of KUDU-2376. Can I ad
Sorry, I missed that the disabled test was the only test in the file.

Add the TODO near the test itself so it'll be more obvious when it is reenabled.


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376
PS7, Line 4376:   LOG(INFO) << "ReplaceTablet: tablet " << old_tablet->id()
              :             << " deleted and replaced by tablet " << new_tablet->id();
              :   resp->set_replaceme
> Does the following order work?
1. l_table doesn't actually need to be in WRITE mode; READ is sufficient. That should simplify this a bit by eliminating step #4.
2. We should be able to follow what ProcessPendingAssignments does, right? It also replaces tablets. Looks like it first commits the tablets lock, then adds/removes the tablets to the table, then finally updates the tablet map.

In general I'm not sure there's a deadlock free way to do this without exposing a window where GetTableLocations and  GetTabletLocations aren't totally consistent with one another. From the looks of it, ProcessPendingAssignments is vulnerable to the window you're describing.  Maybe it's able to get away with it because the table is expected to be generally unusable during this time (it's just after a CreateTable after all). Though, maybe the same code path is followed during an ALTER TABLE ADD PARTITION too?



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Apr 2018 04:09:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#6).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

Due to KUDU-2376, the new test in replace_tablet-itest fails. Since
this tool is meant to fix broken tables, I think it's still valuable
to add this tool in the meantime.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/master-stress-test.cc
A src/kudu/integration-tests/replace_tablet-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
14 files changed, 502 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/6
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 7:

(21 comments)

Are you comfortable merging this given the risk of KUDU-2376? The same segmentation fault will happen, and is captured by the disabled replace_tablet-itest, just like the test case I posted in the JIRA for alter table.

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc@97
PS7, Line 97: using master::ReplaceTabletRequestPB;
> warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc@98
PS7, Line 98: using master::ReplaceTabletResponsePB;
> warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using-
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt
File src/kudu/integration-tests/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101
PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest)
> Could you follow the approach Todd took in commit 1c1d3ba to decide what va
B-but the test always fails and is disabled, because of KUDU-2376. Can I add a TODO here instead?


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc
File src/kudu/integration-tests/master-stress-test.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@297
PS7, Line 297:         // The tablet is gone because it was already replaced or deleted.
> Should we still wait in this case?
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@468
PS7, Line 468:   LOG(INFO) << "Tables created: " << num_tables_created_.Load();
> What's the distribution of operations before and after your change? Curious
The numbers vary a lot, but it's roughly the same before and after. I see (with a debug build) ~70 tables created, ~45 deleted or altered, and about 45 tablets replaced. There's 5 master restarts or so.


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc
File src/kudu/integration-tests/replace_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@63
PS7, Line 63:     CHECK(tablet_id);
> Not needed?
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@76
PS7, Line 76:     CHECK(proxy);
> Not needed?
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4324
PS7, Line 4324:     new_tablet = scoped_refptr<TabletInfo>(new TabletInfo(table, GenerateId()));
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4346
PS7, Line 4346:   // Ensure we abort the mutations if persisting the changes fails.
> Don't the l_table and l_tablets destructors abort automatically?
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4358
PS7, Line 4358:   actions.tablets_to_delete.push_back(old_tablet);
> Hmm, this should be in tablets_to_update, no? We never actually delete entr
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376
PS7, Line 4376:     table->AddRemoveTablets({new_tablet}, {old_tablet});
              :     l_tablets.Commit();
              :     l_table.Commit();
> Are you sure these three should happen with the lock_ held? They acquire lo
Does the following order work?

1. table->AddRemoveTablets
2. l_tablets.Commit()
l(lock_) {
3. Insert to tablet map
}
4. l_table.Commit()

We need tablet + table locks when adding and removing from the table, and we want to commit the tablet state before  publishing the new tablet to the tablet map so uninitialized state won't be visible. However, we need to add the tablet to the tablet map before committing the changes to the table, since if we commit table changes before altering the tablet map there's a window where a GetTableLocations would see the new tablet as part of the table but no info about the new tablet would be available from GetTabletLocations. However, I'm worried about the interval between 2 + 4 where a GetTableLocations would see the old tablet as deleted. I wanted to ensure that GetTableLocations sees either a non-deleted old tablet or a new tablet.


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4385
PS7, Line 4385:     background_tasks_->Wake();
> This is for the newly created tablet, right? Can you update the comment jus
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h@39
PS7, Line 39: class faststring;
> warning: invalid case style for class 'faststring' [readability-identifier-
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h@155
PS7, Line 155:    LeaderMasterProxy() = default;
> Nit: indentation.
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@117
PS7, Line 117: using consensus::ConsensusServiceProxy;
> warning: using decl 'ConsensusServiceProxy' is unused [misc-unused-using-de
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@122
PS7, Line 122: using master::ListMastersRequestPB;
> warning: using decl 'ListMastersRequestPB' is unused [misc-unused-using-dec
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@123
PS7, Line 123: using master::ListMastersResponsePB;
> warning: using decl 'ListMastersResponsePB' is unused [misc-unused-using-de
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@124
PS7, Line 124: using master::ListTabletServersRequestPB;
> warning: using decl 'ListTabletServersRequestPB' is unused [misc-unused-usi
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@125
PS7, Line 125: using master::ListTabletServersResponsePB;
> warning: using decl 'ListTabletServersResponsePB' is unused [misc-unused-us
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@127
PS7, Line 127: using master::ReplaceTabletRequestPB;
> warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d
Ack


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@128
PS7, Line 128: using master::ReplaceTabletResponsePB;
> warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using-
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 02 Apr 2018 23:58:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#8).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

Due to KUDU-2376, the new test in replace_tablet-itest fails. Since
this tool is meant to fix broken tables, I think it's still valuable
to add this tool in the meantime.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/master-stress-test.cc
A src/kudu/integration-tests/replace_tablet-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
14 files changed, 490 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/8
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 2:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@9
PS2, Line 9: replace_tablet
> nit: how about unsafe_replace_tablet, given we're guaranteed "data loss"
Done


http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@13
PS2, Line 13: Before, the table would
            : have to be recreated to recover from a permanently lost tablet
> Maybe we should add a check somewhere that we don't actually have a live ta
I looked into this a bit and the master doesn't have that much information about how healthy a tablet is, and isn't the authority on it, especially if the tablet is unavailable but not unrecoverable. So I think having the scary "unsafe" in front of the name suffices here; adding a --force flag would mean in practice we'd always have to specify --force and it wouldn't be meaningful.


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc@97
PS2, Line 97: using master::ReplaceTabletRequestPB;
> warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d
Done


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc@98
PS2, Line 98: using master::ReplaceTabletResponsePB;
> warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using-
Done


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h@577
PS2, Line 577: , causing all of its data
             :   // to be permanently lost.
> Would it be feasible to turn the replaced tablet into a new single-tablet t
Done, but it involved making an exception in the MetaCache lookup code because having a tablet switch tables violated some assumptions.


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc@4249
PS2, Line 4249:   replaced_tablet->mutable_metadata()->mutable_dirty()->set_state(SysTabletsEntryPB::DELETED,
> if we make it DELETED it will also get removed off of the tablet servers. w
Done


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@712
PS2, Line 712: there an e
> nit: there is an
Done


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@792
PS2, Line 792:     option (kudu.rpc.authz_method) = "AuthorizeService";
> we don't want AuthorizeSuperUser here?
Done


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc@517
PS2, Line 517:   LOG(INFO) << "ReplaceTablet: received request to replace tablet " << req->tablet_id();
> worth logging the requestor info too
Done


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2371
PS2, Line 2371:   const int kNumTservers = 3;
              :   const int kNumTablets = 3;
> Maybe test with multiple masters too? This would still work, right?
It's a bit of work to change ToolTest to support that and there's nothing really different about multimaster, so I hope you don't mind if I pass on it. (Also my manual testing was with multimaster).


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2392
PS2, Line 2392:   // Replace the tablet. This isn't the use case for the tool, but it should work.
> Should it? Maybe we should safeguard against users shooting themselves in t
The master can think a tablet is healthy when it's busted, as we've seen, and it's not the authority on the health of a tablet, so I think we ought to trust the users to know what they are doing or be scared of the "unsafe" at the beginning of the action name.


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2490
PS2, Line 2490: workload.StopAndJoin();
> Another thought: maybe use the cluster verifier after this to make sure thi
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sun, 11 Mar 2018 19:26:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt
File src/kudu/integration-tests/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101
PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest)
> Sorry, I missed that the disabled test was the only test in the file.
Done


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376
PS7, Line 4376:   LOG(INFO) << "ReplaceTablet: tablet " << old_tablet->id()
              :             << " deleted and replaced by tablet " << new_tablet->id();
              :   resp->set_replaceme
> 1. l_table doesn't actually need to be in WRITE mode; READ is sufficient. T
I took a look at alter table and mirrored that, since it seemed right to me and it handles a similar situation where a range partition is dropped and added in one alter.



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Apr 2018 06:44:57 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 1:

I think tidy bot is wrong about the using decls.


-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 22 Feb 2018 19:32:31 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 4:

> Build Failed
 > 
 > http://jenkins.kudu.apache.org/job/kudu-gerrit/12378/ : FAILURE

Another instance of https://issues.apache.org/jira/browse/KUDU-1736


-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 12 Mar 2018 19:05:05 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2490
PS2, Line 2490: workload.StopAndJoin();
Another thought: maybe use the cluster verifier after this to make sure things look spick and span. I'd imagine ksck after this point would be immediately healthy, right?



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 23 Feb 2018 17:43:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 5:

(3 comments)

I have reservations about how this patch introduces a new 'quarantined' table.  In no other circumstances do we pollute the table namespace with auto-created names, and I don't think this is a good reason to start. This will become very painful down the line when begin to integrate with thirdparty metadata systems like Hive and Sentry which have no such concept as a quarantined table.

I think it would be better to simply swap in the new tablet to the existing table, and provide the option to have the master not delete the tablet, if the data is still relevant.  The data could then still be accessed and operated on via the CLI tools which take the tablet ID directly.  Even then there are still authorization issues that will need to be figured out eventually.

http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@26
PS5, Line 26: eventually this will stop
> Maybe be more specific; this behavior ends when the cached entries' TTL exp
There may be some interesting fallout from this. In particular, I believe this is the first time in which the meta-cache TTL is significant for _positive_ cache locations .  Right now I think the TTL only has observable effects for non-covered ranges.

There's nothing actionable, but it's maybe something to think about and document.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912
PS5, Line 912:         // Partition should not have changed.
> Could you clarify this a bit: the partition's _boundaries_ should have not 
What is a partition except a pair of bounds, in other words, what does your suggestion clarify?


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715
PS5, Line 715: 
> I consider myself to be a protobuf luddite, but I don't understand why 'opt
Keep it optional, please.  We should never introduce new required fields.  This is a great example, since you could later go back and add a variant of tablet replacement which just deletes the old version, and doesn't move it to a quarantined state.  If we lived in a perfect world with perfect foresight we could use required, but we don't and we shouldn't.



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 13 Mar 2018 23:45:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#2).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet replace_tablet` that can be used
to replace a tablet with another having the same partition
information. Any data in the replaced tablet is lost. This tool
is useful when a tablet is has permanently lost all replicas
because it can repair the tablet's table. Before, the table would
have to be recreated to recover from a permanently lost tablet.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_tablet.cc
10 files changed, 318 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 10:

(1 comment)

LGTM except a small nit!

http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc@549
PS10, Line 549:   cout << "Replaced tablet " << tablet_id
If it's not too fiddly, it might be nice to print the 'Replaced tablet ${tablet_id} with tablet " portion to stderr and the new tablet ID to stdout.  This will make it much easier to use this in a script in the future where the new tablet ID is needed.

    cerr << "Replaced tablet " << tablet_id << " with tablet ";
    cout << resp.replacement_tablet_id() << endl;



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 17:18:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 11: Verified+1

IWYU failures are bogus (they break the build if they are applied).


-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 21:01:29 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#9).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

Due to KUDU-2376, the new test in replace_tablet-itest fails. Since
this tool is meant to fix broken tables, I think it's still valuable
to add this tool in the meantime.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/master-stress-test.cc
A src/kudu/integration-tests/replace_tablet-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
14 files changed, 496 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/9
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 5:

(13 comments)

After discussion, I decided to take out the "quarantine" part of this patch:
1. It's a significant question how it would work with authz.
2. The quarantine table is by its nature not writable, and it's a question of if it should be.
3. There's overall a lot of things to consider about how such a setup would work.

So now this patch will just seek to add a way to replace a tablet with a new one, deleting the data from the old one.

http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@11
PS5, Line 11: is
> Nit: omit this
Done


http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@19
PS5, Line 19: <table id>
> Given that tablet_id is globally unique and table_name already identifies t
n/a since the "quarantine" part is being removed for now.


http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@24
PS5, Line 24: its
> Nit: it's
n/a


http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@26
PS5, Line 26: eventually this will stop
> There may be some interesting fallout from this. In particular, I believe t
n/a because the "quarantine" piece is being removed.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912
PS5, Line 912:         // Partition should not have changed.
> I've developed a bad habit of using "partition" and "tablet" interchangeabl
n/a since the "swapping" of the old tablet to a quarantine table is being removed.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@919
PS5, Line 919:         // In this case, 'tablets_by_key' will be empty but a RemoteTablet
             :         // will exist in the cache.
> I don't understand this case. AFAICT tablet replacement is "atomic", which 
n/a since the "swapping" of the old tablet to a quarantine table is being removed.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h@581
PS5, Line 581:   Status ReplaceTablet(const std::string& tablet_id, master::ReplaceTabletResponsePB* resp);
> It'd be nice to have testing of this new method outside of the CLI tests. B
I added coverage in master-stress-test. I also added an integration test to test writing to a table as once of its tablets is replaced. This fails due to an existing bug in the client.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4302
PS5, Line 4302:   // To be safe, we'll take the global catalog manager lock for the rest of this method.
> While I appreciate the motivation to block out everything else, I don't thi
Done


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4303
PS5, Line 4303: lock
> Style nit: Use l (or l_foo) for lock guards. It especially helps in this ca
Done


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@707
PS5, Line 707: recovered
> Nit: "manually recovered", to make it clear that any actual recovery is out
n/a since the "swapping" of the old tablet to a quarantine table is being removed.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715
PS5, Line 715: 
> Just to close this out, Dan (and Todd) reminded me that for scalars, option
Thanks Adar.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_common.cc@593
PS5, Line 593: 
> Nit: these were all grouped together, so either omit this empty line, or ad
Done


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_tablet.cc@537
PS5, Line 537:   proxy.SyncRpc<ReplaceTabletRequestPB, ReplaceTabletResponsePB>(
> This returns a Status that should be checked.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 29 Mar 2018 08:35:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912
PS5, Line 912:         // Partition should not have changed.
> What is a partition except a pair of bounds, in other words, what does your
I've developed a bad habit of using "partition" and "tablet" interchangeably. When a tablet is replaced, its partition (or partition boundaries) stay the same, but the tablet ID changes. That's what I'm getting at here; that the ID has changed.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715
PS5, Line 715: 
> Keep it optional, please.  We should never introduce new required fields.  
Just to close this out, Dan (and Todd) reminded me that for scalars, optional provides a reasonable default value, one that would have been a valid value to begin with. For example, the default for "bytes" in C++ is the empty string, which would have been valid to send had the record been required too. Meaning, if the empty string is invalid input, it would need to be sanitized regardless of whether the field was optional or required.

The default value for embedded messages is null and thus is an exception to this rule (i.e. you can't dereference 'foo' without first validating it via 'has_foo()'), but those are less prevalent than scalars, so the burden isn't as onerous.



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 14 Mar 2018 00:06:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 5:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@11
PS5, Line 11: is
Nit: omit this


http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@19
PS5, Line 19: <table id>
Given that tablet_id is globally unique and table_name already identifies the table for humans, what value does the inclusion of table_id add?


http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@24
PS5, Line 24: its
Nit: it's


http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@26
PS5, Line 26: eventually this will stop
Maybe be more specific; this behavior ends when the cached entries' TTL expires?


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912
PS5, Line 912:         // Partition should not have changed.
Could you clarify this a bit: the partition's _boundaries_ should have not changed,


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@919
PS5, Line 919:         // In this case, 'tablets_by_key' will be empty but a RemoteTablet
             :         // will exist in the cache.
I don't understand this case. AFAICT tablet replacement is "atomic", which means the client sees one of two states for a replaced tablet:
1. Tablet with ID foo from key x to y.
2. Tablet with ID bar from key x to y.

Since the boundaries of the tablet haven't changed, doesn't this mean tablet_lower_bound is a valid way to look up the cached entry regardless of which state the client is exposed to?

Looking at the code some more, it seems that following tablet replacement, 'remote' is going to be null so we'll skip all of this, erase the tablet's partition from tablets_by_key (L937), set up a new RemoteTablet for the replacement, and insert the tablet's partition again (L950). At no point is the new tablet exposed in tablets_by_id_ without any corresponding partitions in tablets_by_key. I must be missing something here...


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h@581
PS5, Line 581:   Status ReplaceTablet(const std::string& tablet_id, master::ReplaceTabletResponsePB* resp);
It'd be nice to have testing of this new method outside of the CLI tests. Both in isolation as well as in a "stress" environment (i.e. ongoing concurrent data operations to the table and/or replaced tablet). It'd also be nice to test concurrent metadata operations: what happens if I delete a table and replace one of its tablets at the same time? What happens if I alter it concurrently? Etc.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4302
PS5, Line 4302:   // To be safe, we'll take the global catalog manager lock for the rest of this method.
While I appreciate the motivation to block out everything else, I don't think this is a good idea. First, the global catalog manager lock is a spinlock, which means anyone waiting on this lock will be spinning, and they'll be waiting for a long time because in ReplaceTablet the lock is held during the catalog write, which involves network and disk IO. Second, the catalog manager lock is never held while acquiring other catalog manager locks; breaking that invariant raises the possibility of deadlocks elsewhere. The only exceptions I'm aware of to this rule are during catalog manager initialization, at which point RPCs are largely rejected anyway.

Can we treat this like any other DDL method and acquire the right locks at the right time? The background task does tablet replacement in CatalogManager::ProcessPendingAssignments; can that be used as a template for how to do it safely here?


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4303
PS5, Line 4303: lock
Style nit: Use l (or l_foo) for lock guards. It especially helps in this case, since lock and lock_ are so similar.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@707
PS5, Line 707: recovered
Nit: "manually recovered", to make it clear that any actual recovery is out of scope of Kudu's normal operations?


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715
PS5, Line 715: 
I consider myself to be a protobuf luddite, but I don't understand why 'optional' is appropriate for the below fields. There's the philosophical argument that all fields should be optional and 'required' was a mistake, but AFAICT this just pushes the burden of message verification to the receiving end. In this particular case, doesn't it mean that the client receiving the response has to verify that these optional fields exist in the response, and surface an error if they don't? That just seems like unnecessary (and untestable) boilerplate code.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_common.cc@593
PS5, Line 593: 
Nit: these were all grouped together, so either omit this empty line, or add an empty line just past L584.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_tablet.cc@537
PS5, Line 537:   proxy.SyncRpc<ReplaceTabletRequestPB, ReplaceTabletResponsePB>(
This returns a Status that should be checked.


http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_tablet.cc@631
PS5, Line 631: . 
Maybe add a \n here? See what it looks like on the command line first though.



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 13 Mar 2018 18:42:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc@549
PS10, Line 549:   cout << "Replaced tablet " << tablet_id
> If it's not too fiddly, it might be nice to print the 'Replaced tablet ${ta
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 18:59:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt
File src/kudu/integration-tests/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101
PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest)
Could you follow the approach Todd took in commit 1c1d3ba to decide what value of PROCESSORS to assign to this new itest?


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc
File src/kudu/integration-tests/master-stress-test.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@297
PS7, Line 297:         // The tablet is gone because it was already replaced or deleted.
Should we still wait in this case?


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@468
PS7, Line 468:   LOG(INFO) << "Tables created: " << num_tables_created_.Load();
What's the distribution of operations before and after your change? Curious how tablet replacement has affected things.


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc
File src/kudu/integration-tests/replace_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@63
PS7, Line 63:     CHECK(tablet_id);
Not needed?


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@76
PS7, Line 76:     CHECK(proxy);
Not needed?


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4324
PS7, Line 4324:     new_tablet = scoped_refptr<TabletInfo>(new TabletInfo(table, GenerateId()));
How about:

  new_tablet.reset(new TabletInfo(...));


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4346
PS7, Line 4346:   // Ensure we abort the mutations if persisting the changes fails.
Don't the l_table and l_tablets destructors abort automatically?


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4358
PS7, Line 4358:   actions.tablets_to_delete.push_back(old_tablet);
Hmm, this should be in tablets_to_update, no? We never actually delete entries from the catalog table.


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376
PS7, Line 4376:     table->AddRemoveTablets({new_tablet}, {old_tablet});
              :     l_tablets.Commit();
              :     l_table.Commit();
Are you sure these three should happen with the lock_ held? They acquire locks of their own; Commit() may also sleep waiting for locks, which we shouldn't do with a spinlock held.

IIRC other CatalogManager operations only hold lock_ to update global maps, then do the other steps without it.


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4385
PS7, Line 4385:     background_tasks_->Wake();
This is for the newly created tablet, right? Can you update the comment just above?


http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h@155
PS7, Line 155:    LeaderMasterProxy() = default;
Nit: indentation.



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 30 Mar 2018 04:37:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#5).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet is has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

The original tablet is moved to a new, specially-named table,
named like

replaced_<table name>_<table id>_<tablet id>

so that bad replicas are "quarantined" so they can be investigated,
and if a replica of the original tablet can be recovered, its data
can be scanned and recovered relatively easily. Note that, if the
tablet recovers quickly enough after being replaced, its possible
for clients with cache entries from before the replace to read
the rows from the replaced tablet, but eventually this will stop.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/client/meta_cache.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_tablet.cc
11 files changed, 366 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/5
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/integration-tests/replace_tablet-itest.cc
File src/kudu/integration-tests/replace_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/integration-tests/replace_tablet-itest.cc@90
PS9, Line 90: PROCESSOR
Nit: PROCESSORS


http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4364
PS9, Line 4364:     new_tablet->metadata().ReadLock();
Nit: use a new TabletMetadataLock for this.


http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4371
PS9, Line 4371:   l_table.Commit();
Actually, you don't need a TableMetadataLock at all; I misunderstood the comment at the top of TableInfo::AddRemoveTablets: you need READ-level locks (or stronger) on the _tablets_, not the table. And unlike AlterTable, we're not making any metadata changes to the TableInfo (e.g. to its schema), so you can forgo its lock altogether.



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 03 Apr 2018 17:40:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#3).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet is has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

The original tablet is moved to a new, specially-named table,
named like

replaced_<table name>_<table id>_<tablet id>

so that bad replicas are "quarantined" so they can be investigated,
and if a replica of the original tablet can be recovered, its data
can be scanned and recovered relatively easily. Note that, if the
tablet recovers quickly enough after being replaced, its possible
for clients with cache entries from before the replace to read
the rows from the replaced tablet, but eventually this will stop.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/client/meta_cache.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_tablet.cc
11 files changed, 367 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

Due to KUDU-2376, the new test in replace_tablet-itest fails. Since
this tool is meant to fix broken tables, I think it's still valuable
to add this tool in the meantime.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Reviewed-on: http://gerrit.cloudera.org:8080/9393
Reviewed-by: Dan Burkert <da...@apache.org>
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/master-stress-test.cc
A src/kudu/integration-tests/replace_tablet-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
14 files changed, 494 insertions(+), 9 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Adar Dembo: Looks good to me, approved
  Will Berkeley: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 12
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#11).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

Due to KUDU-2376, the new test in replace_tablet-itest fails. Since
this tool is meant to fix broken tables, I think it's still valuable
to add this tool in the meantime.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/master-stress-test.cc
A src/kudu/integration-tests/replace_tablet-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
14 files changed, 494 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/11
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG@23
PS4, Line 23: Note that, if the
            : tablet recovers quickly enough after being replaced,
This procedure would amount to scanning the new table and dumping the rows into the old one, right?


http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2405
PS4, Line 2405: replaced_tablet_id
nit: could we use "old"/"new" instead of "replaced" here? I'm conflicted because I think it's a clearer than "replaced"/"replacement"/"to-replace", even if it's not as accurate.


http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445
PS4, Line 2445: replica
nit: tablet


http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445
PS4, Line 2445: plus one new one.
This is referring to the new tablet count, not the number of replicas, right?



-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 12 Mar 2018 19:11:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#7).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

Due to KUDU-2376, the new test in replace_tablet-itest fails. Since
this tool is meant to fix broken tables, I think it's still valuable
to add this tool in the meantime.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/master-stress-test.cc
A src/kudu/integration-tests/replace_tablet-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
14 files changed, 506 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/7
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed a vote on this change.

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2290: Tool to re-create a tablet

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9393

to look at the new patch set (#10).

Change subject: KUDU-2290: Tool to re-create a tablet
......................................................................

KUDU-2290: Tool to re-create a tablet

This adds a tool `kudu tablet unsafe_replace_tablet` that can be
used to replace a tablet with another having the same partition
information. This tool is useful when a tablet has permanently
lost all replicas because it can repair the tablet's table.
Before, the table would have to be recreated to recover from a
permanently lost tablet.

Due to KUDU-2376, the new test in replace_tablet-itest fails. Since
this tool is meant to fix broken tables, I think it's still valuable
to add this tool in the meantime.

Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/master-stress-test.cc
A src/kudu/integration-tests/replace_tablet-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_tablet.cc
14 files changed, 493 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/10
-- 
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>