You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2017/08/31 02:41:16 UTC

[kudu-CR](branch-1.5.x) KUDU-2118. Fully shut down TabletReplica on delete

Hello Dan Burkert, Adar Dembo, Kudu Jenkins,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7911

to review the following change.

Change subject: KUDU-2118. Fully shut down TabletReplica on delete
......................................................................

KUDU-2118. Fully shut down TabletReplica on delete

After the various lifecycle changes implemented in
5bca7d8ba185d62952fb3e3163cbe88d20453da0 we were left with a case where
we fail to fully shut down TABLET_DATA_DELETED replicas before
dereferencing them. Because LeaderElection via ElectionDecisionCallback
will hold a reference to RaftConsensus while a vote is taking place, the
leader election callback running on the reactor thread may destroy the
RaftConsensus object. If that occurs without RaftConsensus already being
in a fully-shutdown state, the RaftConsensus destructor runs Shutdown()
before destoying the object, which takes a lock and triggers an
assertion via AssertWaitAllowed(). This crashes the server.

This patch fixes the issue by calling Shutdown() on TABLET_DATA_DELETED
replicas before dereferencing them in TSTabletManager.

A concurrency test is included that spawns threads to start elections
while the tablets are being deleted. This test pretty reliably fails
without the included fix, and is solidly stable with the fix.

Change-Id: I4a0ea46f220a10d4de680fc73d45d14426e0395e
Reviewed-on: http://gerrit.cloudera.org:8080/7883
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/tserver/ts_tablet_manager.cc
2 files changed, 59 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/7911/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4a0ea46f220a10d4de680fc73d45d14426e0395e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR](branch-1.5.x) KUDU-2118. Fully shut down TabletReplica on delete

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged.

Change subject: KUDU-2118. Fully shut down TabletReplica on delete
......................................................................


KUDU-2118. Fully shut down TabletReplica on delete

After the various lifecycle changes implemented in
5bca7d8ba185d62952fb3e3163cbe88d20453da0 we were left with a case where
we fail to fully shut down TABLET_DATA_DELETED replicas before
dereferencing them. Because LeaderElection via ElectionDecisionCallback
will hold a reference to RaftConsensus while a vote is taking place, the
leader election callback running on the reactor thread may destroy the
RaftConsensus object. If that occurs without RaftConsensus already being
in a fully-shutdown state, the RaftConsensus destructor runs Shutdown()
before destoying the object, which takes a lock and triggers an
assertion via AssertWaitAllowed(). This crashes the server.

This patch fixes the issue by calling Shutdown() on TABLET_DATA_DELETED
replicas before dereferencing them in TSTabletManager.

A concurrency test is included that spawns threads to start elections
while the tablets are being deleted. This test pretty reliably fails
without the included fix, and is solidly stable with the fix.

Change-Id: I4a0ea46f220a10d4de680fc73d45d14426e0395e
Reviewed-on: http://gerrit.cloudera.org:8080/7883
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/7911
Reviewed-by: Dan Burkert <da...@apache.org>
---
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/tserver/ts_tablet_manager.cc
2 files changed, 59 insertions(+), 5 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4a0ea46f220a10d4de680fc73d45d14426e0395e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR](branch-1.5.x) KUDU-2118. Fully shut down TabletReplica on delete

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-2118. Fully shut down TabletReplica on delete
......................................................................


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4a0ea46f220a10d4de680fc73d45d14426e0395e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No