You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2017/03/18 01:25:43 UTC

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Hello David Ribeiro Alves,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6422

to review the following change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it (using RAII).
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Adds a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based tests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
A src/kudu/integration-tests/tablet_delete-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 189 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/6422/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 4:

(6 comments)

sorry had some comments on an old rev. I checked and most still seem relevant though

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/integration-tests/external_mini_cluster.h
File src/kudu/integration-tests/external_mini_cluster.h:

PS3, Line 230: std::shared_ptr<rpc::Messenger> messenger() const override;
             :   std::shared_ptr<master::MasterServiceProxy> master_proxy() const override;
             :   std::shared_ptr<master::MasterServiceProxy> master_proxy(int idx) const override;
does the superclass have the docs?


http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/integration-tests/mini_cluster_base.h
File src/kudu/integration-tests/mini_cluster_base.h:

PS3, Line 77: // Return a messenger for use by clients.
It does :), great


http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/integration-tests/tablet_delete-itest.cc
File src/kudu/integration-tests/tablet_delete-itest.cc:

PS3, Line 38: class TabletDeleteITest : public MiniClusterITestBase
nit: it's weird that we have a delete_table-test (DeleteTableTest) and a tablet_delete-itest (TabletDeleteITest). Could you make the naming consistent?


PS3, Line 43:   NO_FATALS(StartCluster(1)); // 1 TS.
            :   TestWorkload workload(cluster_.get());
            :   workload.set_num_replicas(1);
            :   workload.Setup();
            : 
            :   std::unordered_map<std::string, itest::TServerDetails*> ts_map;
            :   ValueDeleter del(&ts_map);
            :   ASSERT_OK(itest::CreateTabletServerMap(cluster_->master_proxy().get(),
            :                                          cluster_->messenger(),
            :                                          &ts_map));
            :   auto* mts = cluster_->mini_tablet_server(0);
            :   auto* ts = ts_map[mts->uuid()];
            : 
            :   scoped_refptr<TabletPeer> tablet_peer;
            :   AssertEventually([&] {
            :     vector<scoped_refptr<TabletPeer>> tablet_peers;
            :     mts->server()->tablet_manager()->GetTabletPeers(&tablet_peers);
            :     ASSERT_EQ(1, tablet_peers.size());
            :     tablet_peer = tablet_peers[0];
            :   });
            :   NO_FATALS();
            : 
            :   workload.Start();
            :   while (workload.rows_inserted() < 100) {
            :     SleepFor(MonoDelta::FromMilliseconds(10));
            :   }
            :   workload.StopAndJoin();
This code seems eerly familiar from a previous patch of yours. Could you extract a helper? I mean to start an n node cluster, make sure it has n tablet peers. Maybe even add a bool to write something for a while, since this is pretty common setup.


http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

Line 472:   Status unpin_status = meta_->UnPinFlush();
> Not if we don't flush the superblock.
Unpinning, even for a failed bootstrap should be ok. If it failed it's not like we're using a part of the data or trying to fix it so from a correctness perspective it should be ok to unpin.


http://gerrit.cloudera.org:8080/#/c/6422/4/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

PS4, Line 465: tablet_ && log_
could we just have bootstrap_status.ok()


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

Line 472:   Status unpin_status = meta_->UnPinFlush();
> Won't any orphaned blocks be cleaned up when the operator, after realizing 
Not if we don't flush the superblock.


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

Line 471:   // This will cause any pending TabletMetadata flush to be executed, if any.
> Nit: redundant "any" between "any pending" and "if any".
Done


Line 472:   Status unpin_status = meta_->UnPinFlush();
> Is it possible for this to flush even if !bootstrap_status.ok()? That seems
Yes, it will always flush is there were tablet updates. This is an interesting question and I wonder what David thinks about this.

Because we always roll forward on tablet bootstrap, I don't think it's necessarily bad to do that. In fact, it's probably good because it avoid orphaning blocks?


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

Line 472:   Status unpin_status = meta_->UnPinFlush();
> Yes, it will always flush is there were tablet updates. This is an interest
Won't any orphaned blocks be cleaned up when the operator, after realizing that this replica can't bootstrap, deletes it?


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6422

to look at the new patch set (#5).

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Adds a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based tests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
A src/kudu/integration-tests/tablet_delete-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 224 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/6422/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6422

to look at the new patch set (#6).

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Add a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based itests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 224 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/6422/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

Line 471:   // This will cause any pending TabletMetadata flush to be executed, if any.
Nit: redundant "any" between "any pending" and "if any".


Line 472:   Status unpin_status = meta_->UnPinFlush();
Is it possible for this to flush even if !bootstrap_status.ok()? That seems bad; is there some other mechanism in place to prevent that from happening?


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6422

to look at the new patch set (#4).

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Adds a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based tests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
A src/kudu/integration-tests/tablet_delete-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 222 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/6422/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/integration-tests/tablet_delete-itest.cc
File src/kudu/integration-tests/tablet_delete-itest.cc:

PS3, Line 38: class TabletDeleteITest : public MiniClusterITestBase
> Think we should rename this one, but not against adding an "i" to the other
k


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/tablet_delete-itest.cc
File src/kudu/integration-tests/tablet_delete-itest.cc:

Line 60:     ASSERT_EQ(1, tablet_peers.size());
Hmm, this isn't guaranteed by the time StartCluster() returned? I thought the default behavior for those test methods was to wait until the master has discovered all of the tservers.


Line 88:   AssertEventually([&] {
Why must this be wrapped in AssertEventually? Doesn't WaitForAllBootstrapsToFinish() (or even Restart()) guarantee that the sole peer exists when they're done?


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

Line 513:   meta_->PinFlush();
Nit: add a comment explaining why we pin here (and not, say, earlier).


Line 515:     WARN_NOT_OK(meta_->UnPinFlush(), Substitute("$0Failed to flush after unpinning",
Warning makes sense if some other function below fails, but suppose there's no failure there but there is a failure here; wouldn't we want to return that failure up the call stack?


Line 530: Status TabletBootstrap::FinishBootstrap(const string& message,
Could make this return void now.


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/integration-tests/tablet_delete-itest.cc
File src/kudu/integration-tests/tablet_delete-itest.cc:

PS3, Line 38: class TabletDeleteITest : public MiniClusterITestBase
> delete_table-test is unique because it's not named -itest, while all the ot
Think we should rename this one, but not against adding an "i" to the other in a follow up.


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Add a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based itests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Reviewed-on: http://gerrit.cloudera.org:8080/6422
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 224 insertions(+), 48 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 6:

Thanks!

-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6422/2//COMMIT_MSG
Commit Message:

PS2, Line 7: .
> nit: consider dropping the period
Hmm. Several people on the team have been writing commit messages on Kudu this way for years, including me. We borrowed this practice from Hadoop commit messages (the JIRA number and trailing period is actually enforced there). I am not sure why I should change now; I think it's really readable. But I'm happy to discuss normalizing commit messages on the dev@ list if you'd like to start that discussion. I'm +1 for consistency across all the commits to the project.


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/CMakeLists.txt
File src/kudu/integration-tests/CMakeLists.txt:

PS2, Line 87: ADD_KUDU_TEST(tablet_delete-itest)
> nit: consider keeping the sorted order; it seems this one should come after
Done


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/mini_cluster.cc
File src/kudu/integration-tests/mini_cluster.cc:

PS2, Line 344: CHECK_LT(idx, mini_masters_.size());
> nit: this seems redundant because mini_master(int idx) already has correspo
Done


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/mini_cluster_base.h
File src/kudu/integration-tests/mini_cluster_base.h:

PS2, Line 31: nanmespace
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/tablet_delete-itest.cc
File src/kudu/integration-tests/tablet_delete-itest.cc:

Line 60:     ASSERT_EQ(1, tablet_peers.size());
> Hmm, this isn't guaranteed by the time StartCluster() returned? I thought t
A TabletPeer wraps a Tablet in a single tserver and isn't related to # tservers


Line 88:   AssertEventually([&] {
> Why must this be wrapped in AssertEventually? Doesn't WaitForAllBootstrapsT
Restart() no, but you are right that WaitForAllBootstrapsToFinish() is enough and I shouldn't have to AssertEventually() here.


PS2, Line 103: ASSERT_EQ(0, tablet_peers.size());
> nit: would ASSERT_TRUE(tablet_peers.empty()) be more idiomatic?
Maybe, but if it failed then it wouldn't tell me how many peers it found. So I prefer it this way.


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

Line 513:   meta_->PinFlush();
> Nit: add a comment explaining why we pin here (and not, say, earlier).
I just moved this to be the first thing we do during bootstrap.


Line 515:     WARN_NOT_OK(meta_->UnPinFlush(), Substitute("$0Failed to flush after unpinning",
> Warning makes sense if some other function below fails, but suppose there's
That's true. It's now a bit uglier, but I fixed this.


Line 530: Status TabletBootstrap::FinishBootstrap(const string& message,
> Could make this return void now.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has uploaded a new patch set (#2).

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it (using RAII).
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Adds a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based tests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
A src/kudu/integration-tests/tablet_delete-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 188 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/6422/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6422

to look at the new patch set (#3).

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Adds a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based tests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
A src/kudu/integration-tests/tablet_delete-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 221 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/6422/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 2:

(5 comments)

A few nits -- will take a closer/thoughful look later this weekend.

http://gerrit.cloudera.org:8080/#/c/6422/2//COMMIT_MSG
Commit Message:

PS2, Line 7: .
nit: consider dropping the period


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/CMakeLists.txt
File src/kudu/integration-tests/CMakeLists.txt:

PS2, Line 87: ADD_KUDU_TEST(tablet_delete-itest)
nit: consider keeping the sorted order; it seems this one should come after tablet_copy_client_session-itest but before tablet_history_gc-itest


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/mini_cluster.cc
File src/kudu/integration-tests/mini_cluster.cc:

PS2, Line 344: CHECK_LT(idx, mini_masters_.size());
nit: this seems redundant because mini_master(int idx) already has corresponding CHECK()


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/mini_cluster_base.h
File src/kudu/integration-tests/mini_cluster_base.h:

PS2, Line 31: nanmespace
nit: typo


http://gerrit.cloudera.org:8080/#/c/6422/2/src/kudu/integration-tests/tablet_delete-itest.cc
File src/kudu/integration-tests/tablet_delete-itest.cc:

PS2, Line 103: ASSERT_EQ(0, tablet_peers.size());
nit: would ASSERT_TRUE(tablet_peers.empty()) be more idiomatic?


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6422/3/src/kudu/integration-tests/tablet_delete-itest.cc
File src/kudu/integration-tests/tablet_delete-itest.cc:

PS3, Line 38: class TabletDeleteITest : public MiniClusterITestBase
> nit: it's weird that we have a delete_table-test (DeleteTableTest) and a ta
delete_table-test is unique because it's not named -itest, while all the other integration tests are. How about I rename that test to table_delete-itest in a follow up commit? Or you think I should name this delete_tablet-itest?


PS3, Line 43:   NO_FATALS(StartCluster(1)); // 1 TS.
            :   TestWorkload workload(cluster_.get());
            :   workload.set_num_replicas(1);
            :   workload.Setup();
            : 
            :   std::unordered_map<std::string, itest::TServerDetails*> ts_map;
            :   ValueDeleter del(&ts_map);
            :   ASSERT_OK(itest::CreateTabletServerMap(cluster_->master_proxy().get(),
            :                                          cluster_->messenger(),
            :                                          &ts_map));
            :   auto* mts = cluster_->mini_tablet_server(0);
            :   auto* ts = ts_map[mts->uuid()];
            : 
            :   scoped_refptr<TabletPeer> tablet_peer;
            :   AssertEventually([&] {
            :     vector<scoped_refptr<TabletPeer>> tablet_peers;
            :     mts->server()->tablet_manager()->GetTabletPeers(&tablet_peers);
            :     ASSERT_EQ(1, tablet_peers.size());
            :     tablet_peer = tablet_peers[0];
            :   });
            :   NO_FATALS();
            : 
            :   workload.Start();
            :   while (workload.rows_inserted() < 100) {
            :     SleepFor(MonoDelta::FromMilliseconds(10));
            :   }
            :   workload.StopAndJoin();
> This code seems eerly familiar from a previous patch of yours. Could you ex
This is similar to some ExternalMiniCluster tests, but it shouldn't be too familiar for MiniCluster tests because I just added, in this patch, the capability to easily call the cluster_itest_util.h methods from MiniCluster tests. TBH I don't think we're quite at the point where we can extract a lot of useful stuff yet although it's been on my mind and instead of doing a lot of up-front yak-shaving I have been slowly working around the edges like in this patch.

I think in this case it's too early to say this code fragment is very reusable so I'd prefer work on higher-priority things in the short term and punt on this for now.

In the medium term I would like to find a way to either pull some functionality into the minicluster base class or a helper that hooks into APIs exposed by the base class to create the tablet server map implicitly. I'm not sure I agree with the workload thing you mentioned, since it's nice to have tests that only do what they need to do.


http://gerrit.cloudera.org:8080/#/c/6422/4/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

PS4, Line 465: tablet_ && log_
> could we just have bootstrap_status.ok()
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6422
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes