Posted to commits@hbase.apache.org by ns...@apache.org on 2011/10/11 04:05:34 UTC

svn commit: r1181402 - /hbase/branches/0.89/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java

Author: nspiegelberg
Date: Tue Oct 11 02:05:34 2011
New Revision: 1181402

URL: http://svn.apache.org/viewvc?rev=1181402&view=rev
Log:
deadlock in RS shutdown sequence when ZK session expires

Summary:
System.exit(1) does not abort instantly: it first invokes the shutdown hooks,
and the result is a cycle of threads waiting on each other.

This fix uses Runtime.getRuntime().halt() to exit abruptly and avoid
running the shutdown hooks when a region server's ZK session expires.
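
For illustration only, here is a minimal standalone sketch of the difference
described above. It is not the committed code. The class name HaltVsExit, the
runsHooks helper and the exit|halt argument are all hypothetical, and the hook
that waits on the main thread is an assumed stand-in for whichever hook takes
part in the real cycle. The wait uses a 2 second timeout so the demo finishes
instead of hanging.

```java
class HaltVsExit {
    // Hypothetical helper: true when the chosen mode goes through System.exit(),
    // which runs shutdown hooks; any other mode uses halt(), which skips them.
    static boolean runsHooks(String mode) {
        return "exit".equals(mode);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java HaltVsExit exit|halt");
            return;
        }
        final Thread mainThread = Thread.currentThread();

        // Assumed stand-in for a hook that waits for the worker thread to finish.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // Bounded wait so the demo ends; an unbounded join() here would
                // never return while the main thread is blocked in System.exit().
                mainThread.join(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("hook ran; main thread still alive: "
                    + mainThread.isAlive());
        }));

        if (runsHooks(args[0])) {
            // Blocks in this thread until every shutdown hook has returned.
            System.exit(1);
        } else {
            // Terminates the JVM immediately; shutdown hooks are never started.
            Runtime.getRuntime().halt(1);
        }
    }
}
```

Run with "exit", the hook should report the main thread as still alive once
its wait times out, because that thread is itself blocked inside System.exit()
waiting for the hook. Run with "halt", the process should end at once with
status 1 and the hook should print nothing.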

Test Plan:
On localhost, I ran with pseudo-distributed mode turned on. This ensures that
RS gets spawned as its own process, and also that shutdown hooks are installed.

Inserted a fake System.exit() at a suitable place and confirmed that, when hit,
the deadlock did reproduce. Then changed the code to use
Runtime.getRuntime().halt() and confirmed that the hooks were not invoked and
the process did not get stuck shutting down.

Also running the unit tests now.

DiffCamp Revision: 170412
Reviewed By: kranganathan
CC: kranganathan, hbase@lists
Revert Plan:
OK

Modified:
    hbase/branches/0.89/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java

Modified: hbase/branches/0.89/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.89/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java?rev=1181402&r1=1181401&r2=1181402&view=diff
==============================================================================
--- hbase/branches/0.89/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java (original)
+++ hbase/branches/0.89/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java Tue Oct 11 02:05:34 2011
@@ -1382,6 +1382,12 @@ public class ZooKeeperWrapper implements
 
   private void abort() {
     LOG.fatal("<" + instanceName + "> Aborting process because of fatal ZK error");
-    System.exit(1);
+
+    // Previously, this was System.exit(1). exit() invokes shutdown hooks.
+    // If abort happens in the region servers main worker thread, this can
+    // cause a deadlock in the shutdown sequence.
+    //
+    // When a RS ZK session expires, exit asap. Do not run any shutdown hooks.
+    Runtime.getRuntime().halt(1);
   }
 }