You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/06/20 23:01:39 UTC

[kudu-CR] KUDU-1863: improve overall safety of graceful server shutdown

Hello David Ribeiro Alves, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7183

to look at the new patch set (#2).

Change subject: KUDU-1863: improve overall safety of graceful server shutdown
......................................................................

KUDU-1863: improve overall safety of graceful server shutdown

What exactly is a "safe" graceful shutdown? I define it like this:
1. Stop accepting new RPCs.
2. Wait for any existing incoming RPCs to finish execution.
3. Shut down server-specific subsystems.
4. Shut down generic subsystems.

By ordering steps 3 and 4 after steps 1 and 2, we can guarantee that there
are no reactor threads running RPC callbacks within the subsystems that we
may be trying to shut down. Unfortunately, adhering to this order is very
difficult today. Step 2 in particular is thorny; we could implement it via
synchronous Messenger::Shutdown, but even that tears down just enough state
that active subsystems can run into trouble.

So for now, we'll settle for a modified form of the above:
1. Stop accepting new RPCs.
2. Shut down server-specific subsystems.
3. Shut down generic subsystems, including waiting for any existing incoming
   RPCs to finish execution.

This patch implements this modified form of a safe shutdown:
- During shutdown, servers disable the acceptance of new RPCs first.
- Servers now shut down master and tserver-specific subsystems before
  generic subsystems.
- Shutting down generic subsystems now explicitly waits for outstanding RPC
  callbacks to finish, via a new synchronous Messenger::Shutdown.

The bulk of the patch is to Messenger::Shutdown and friends. While there, I
also reduced lock acquisition to just critical sections (fixing KUDU-1863 in
the process), extended the life of tls_context_ to Messenger destruction,
and changed Shutdown to be callable with registered services.

Change-Id: Ibbda29dd471e4dc1a8cb4f472cc456ccdb1e48cc
---
M src/kudu/kserver/kserver.cc
M src/kudu/master/master.cc
M src/kudu/rpc/messenger.cc
M src/kudu/rpc/messenger.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/reactor.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/tablet_server.cc
10 files changed, 166 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/7183/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibbda29dd471e4dc1a8cb4f472cc456ccdb1e48cc
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>