Posted to commits@kudu.apache.org by jd...@apache.org on 2016/02/06 19:42:03 UTC

[4/4] incubator-kudu git commit: KUDU-1324. Fix SEGV in catalog manager handling under-replicated tablet

KUDU-1324. Fix SEGV in catalog manager handling under-replicated tablet

Commit 31278211f1934890e6835c9db164a7dea87d826a introduced some
new logging when the catalog manager starts the 'AsyncAddServer'
task, which sends the AddServer RPC for an under-replicated tablet.
However, this logging can SEGV if the tablet does not currently
have an elected leader.

This crash can be triggered by restarting the master while a tablet
is under-replicated. After it comes back up, the master may receive
a report for the under-replicated tablet. When it tries to run the
AsyncAddServer task, there is no known leader yet (e.g. because the
leader has not yet sent its tablet report), so the task fails
immediately and deletes itself. The subsequent call to
task->description() then reads freed memory and crashes.
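A minimal C++ sketch of this failure mode, for illustration only. The
class AddServerTask, its live_instances counter, and the helper
StartAddServer are hypothetical stand-ins, not Kudu's real catalog
manager types. The sketch assumes, as described above, that Run()
deletes the task when no leader is known, and it shows the caller-side
rule that avoids the use-after-free: build the log line only from data
the caller owns (here, its copy of the tablet ID).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the AddServer task (not Kudu's real class).
// It counts live instances so a caller can observe that Run() destroyed it.
class AddServerTask {
 public:
  AddServerTask(std::string tablet_id, bool leader_known)
      : tablet_id_(std::move(tablet_id)), leader_known_(leader_known) {
    ++live_instances;
  }
  ~AddServerTask() { --live_instances; }

  // With no known leader, the task gives up immediately and deletes itself.
  // After a 'false' return, any pointer the caller holds to it dangles.
  bool Run() {
    if (!leader_known_) {
      delete this;  // no member may be touched after this line
      return false;
    }
    return true;  // task survives; its owner is responsible for it now
  }

  static int live_instances;

 private:
  std::string tablet_id_;
  bool leader_known_;
};

int AddServerTask::live_instances = 0;

// Hypothetical helper: starts the task and returns the "started" log line.
// The message is built from the caller's own tablet_id, never from 'task',
// so it is safe whether or not Run() deleted the task.
// On return, *owned holds the task if it survived Run(), else nullptr.
std::string StartAddServer(const std::string& tablet_id, bool leader_known,
                           AddServerTask** owned) {
  assert(owned != nullptr);
  AddServerTask* task = new AddServerTask(tablet_id, leader_known);
  // 'task' is only read again if Run() reported that it is still alive.
  *owned = task->Run() ? task : nullptr;
  return "Started AddServer task for tablet " + tablet_id;
}
```

Calling anything on 'task' after a failed Run() in this sketch would
be undefined behavior, which is the same class of bug as the original
task->description() call.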

An earlier version of this fix tried to address the issue by keeping
a scoped_refptr to the task. However, this isn't sufficient, because
task->description() also crashes when there is no known target tablet
server.
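The sketch below illustrates why holding a reference alone does not
help. It uses std::shared_ptr as an analogue of Kudu's scoped_refptr,
and the types TSDescriptorStub and RefCountedAddServerTask are
hypothetical. The assumption, taken from the paragraph above, is that
description() dereferences the target tablet server picked in Run(),
which stays null when no leader is known.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a tablet server descriptor.
struct TSDescriptorStub {
  std::string uuid;
};

// Hypothetical task; std::shared_ptr plays the role of scoped_refptr.
class RefCountedAddServerTask {
 public:
  explicit RefCountedAddServerTask(std::string tablet_id)
      : tablet_id_(std::move(tablet_id)) {}

  // Picks the leader as the target. With no leader, the target stays null.
  bool Run(const TSDescriptorStub* leader) {
    if (leader == nullptr) return false;
    target_ts_ = leader;
    return true;
  }

  const TSDescriptorStub* target_ts() const { return target_ts_; }

  // Dereferences the target unconditionally, so calling it while
  // target_ts_ is null crashes even though the task itself is alive.
  std::string description() const {
    return "AddServer for tablet " + tablet_id_ + " on TS " + target_ts_->uuid;
  }

 private:
  std::string tablet_id_;
  const TSDescriptorStub* target_ts_ = nullptr;
};
```

Holding a std::shared_ptr keeps the object valid after a failed Run(),
but target_ts() is still nullptr at that point, so description()
remains unsafe to call.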

To fix this regression for the 0.7.0 release, this patch takes the
simplest approach: it changes the log message to include less detail
(only the tablet ID). A regression test will be included in a later
patch.

Change-Id: I62037fbaa910a1da476a0ac2075afdcdbc460dc8
Reviewed-on: http://gerrit.cloudera.org:8080/2060
Reviewed-by: Jean-Daniel Cryans
Tested-by: Kudu Jenkins
(cherry picked from commit 802d0e4c53f12b4392544ee10dfb530a25812d4f)
Reviewed-on: http://gerrit.cloudera.org:8080/2086
Tested-by: Jean-Daniel Cryans


Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/eef13645
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/eef13645
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/eef13645

Branch: refs/heads/branch-0.7.0
Commit: eef13645f8cf7dbace6c7b1b32bf83756a7204bd
Parents: 08f8a5d
Author: Todd Lipcon <to...@apache.org>
Authored: Thu Feb 4 18:19:12 2016 -0800
Committer: Jean-Daniel Cryans <jd...@gerrit.cloudera.org>
Committed: Sat Feb 6 18:40:35 2016 +0000

----------------------------------------------------------------------
 src/kudu/master/catalog_manager.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/eef13645/src/kudu/master/catalog_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.cc b/src/kudu/master/catalog_manager.cc
index 000a6a5..b3c0468 100644
--- a/src/kudu/master/catalog_manager.cc
+++ b/src/kudu/master/catalog_manager.cc
@@ -2476,9 +2476,9 @@ void CatalogManager::SendAddServerRequest(const scoped_refptr<TabletInfo>& table
   tablet->table()->AddTask(task);
   WARN_NOT_OK(task->Run(), "Failed to send new AddServer request");
 
-  // Need to print this after Run() because that's where it picks the TS which description()
-  // needs.
-  LOG(INFO) << "Started AddServer task: " << task->description();
+  // We can't access 'task' here because it may delete itself inside Run() in the
+  // case that the tablet has no known leader.
+  LOG(INFO) << "Started AddServer task for tablet " << tablet->tablet_id();
 }
 
 void CatalogManager::ExtractTabletsToProcess(