You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org> on 2017/04/06 21:54:01 UTC

[kudu-CR](branch-1.2.x) KUDU-1607. Unpin tablet flush after failed bootstrap

Jean-Daniel Cryans has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6572

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Add a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based itests.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Reviewed-on: http://gerrit.cloudera.org:8080/6422
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
(cherry picked from commit 0450f77f69c74cc6dec08ad36bb89ac12c17608f)
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 224 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/6572/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR](branch-1.2.x) KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6572

to look at the new patch set (#3).

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................

KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Add a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based itests.

Note: This backport has been modified not to rely on KUDU-1741.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Reviewed-on: http://gerrit.cloudera.org:8080/6422
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
(cherry picked from commit 0450f77f69c74cc6dec08ad36bb89ac12c17608f)
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 224 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/6572/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR](branch-1.2.x) KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 1:

Argh I think this is dependent on https://github.com/apache/kudu/commit/4114ba9bf1cdfbd03eef2a13d6e51b6094527997 so I'll let Mike be the judge of if this is good or not.

-- 
To view, visit http://gerrit.cloudera.org:8080/6572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No

[kudu-CR](branch-1.2.x) KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6572/3//COMMIT_MSG
Commit Message:

Line 35: Note: This backport has been modified not to rely on KUDU-1741.
Cool, thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/6572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR](branch-1.2.x) KUDU-1607. Unpin tablet flush after failed bootstrap

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has submitted this change and it was merged.

Change subject: KUDU-1607. Unpin tablet flush after failed bootstrap
......................................................................


KUDU-1607. Unpin tablet flush after failed bootstrap

We have heard reports that, in certain scenarios, a failed tablet is
unable to be deleted. After some investigation, I determined that this
is because we neglect to unpin the Tablet when tablet bootstrap fails.

When a table is being deleted, we delete each tablet superblock by
calling TabletMetadata::DeleteSuperBlock(). This method double-checks
that no orphaned blocks remain referenced to ensure we don't leak disk
space. That requires TabletMetadata::DeleteTabletData() to be called
first, which invokes Flush() twice to ensure that no orphaned blocks are
referenced on disk upon returning. However, if the tablet is pinned,
Flush() silently becomes a no-op (except for a log message that is
printed) and so DeleteSuperBlock() also fails, resulting in the tablet
not being fully deleted and the following warning message being written
to the log file:

W0317 13:05:19.324373 13242 ts_tablet_manager.cc:634] Invalid argument: Unable to delete on-disk data from tablet d1b857ddaa0743c79c9bcbd9b2fda174: The metadata for tablet d1b857ddaa0743c79c9bcbd9b2fda174 still references orphaned blocks. Call DeleteTabletData() first

This patch makes the following changes:

1. Always unpin the tablet after pinning it.
2. Add a new itest that fails without that change, reproducing the
   above warning message.
3. Add a little more test infra to MiniCluster (and the MiniClusterBase
   interface) to make it easier to use the helper functions in
   cluster_itest_util.h in MiniCluster-based itests.

Note: This backport has been modified not to rely on KUDU-1741.

Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Reviewed-on: http://gerrit.cloudera.org:8080/6422
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
(cherry picked from commit 0450f77f69c74cc6dec08ad36bb89ac12c17608f)
Reviewed-on: http://gerrit.cloudera.org:8080/6572
Reviewed-by: Jean-Daniel Cryans <jd...@apache.org>
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/integration-tests/mini_cluster.h
M src/kudu/integration-tests/mini_cluster_base.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/mini_tablet_server.cc
M src/kudu/tserver/mini_tablet_server.h
10 files changed, 224 insertions(+), 48 deletions(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id274c6ee1da75bc6f92ab91c0a01edaa009b8962
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>