Posted to commits@giraph.apache.org by ed...@apache.org on 2016/06/21 17:14:44 UTC

git commit: updated refs/heads/trunk to 2185f59

Repository: giraph
Updated Branches:
  refs/heads/trunk d827c97fc -> 2185f5946


GIRAPH-1076 Race condition in FileTxnSnapLog

Summary:
The constructor of org.apache.zookeeper.server.persistence.FileTxnSnapLog contains a potential race condition:

    if (!this.dataDir.exists()) {
        if (!this.dataDir.mkdirs()) {
            throw new IOException("Unable to create data directory " + this.dataDir);
        }
    }

If two threads construct a FileTxnSnapLog simultaneously, both can see that the data directory does not exist yet. Whichever thread calls mkdirs() second gets false back (the directory now exists, so nothing new was created) and throws an IOException.
We saw this happen in Giraph, where FileTxnSnapLog is created both by the PurgeTask scheduled by DatadirCleanupManager and by InProcessZooKeeperRunner#runFromConfig.
Until (and unless) the ZooKeeper code is fixed, we need to make sure ZooKeeper starts first and only then start the PurgeTask.
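A minimal, self-contained sketch of the race described above, for illustration only. The class and method names (`DataDirRace`, `ensureDirRacy`, `ensureDirRaceSafe`, `freshTarget`, `countFailures`) are hypothetical and not part of Giraph or ZooKeeper. `ensureDirRacy` mirrors the quoted FileTxnSnapLog check-then-act logic; `ensureDirRaceSafe` shows one common race-tolerant idiom (treat a false return from mkdirs() as an error only if the directory is still missing). This is not the approach taken by this commit, which instead orders startup so that ZooKeeper creates the directory before the purge task runs.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo of the check-then-act race in directory creation. */
class DataDirRace {

  /** Mirrors the quoted FileTxnSnapLog logic: check, then act. */
  static void ensureDirRacy(File dir) throws IOException {
    if (!dir.exists()) {
      // Another thread can create dir between exists() and mkdirs();
      // mkdirs() then returns false because it created nothing new.
      if (!dir.mkdirs()) {
        throw new IOException("Unable to create data directory " + dir);
      }
    }
  }

  /** Race-tolerant variant: fail only if the directory is really missing. */
  static void ensureDirRaceSafe(File dir) throws IOException {
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Unable to create data directory " + dir);
    }
  }

  /** Returns a path that does not exist yet, inside a fresh temp directory. */
  static File freshTarget() {
    try {
      return new File(Files.createTempDirectory("zk-race").toFile(), "version-2");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Runs one variant from several threads at once; returns how many threw. */
  static int countFailures(File dir, int threads, boolean raceSafe) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch go = new CountDownLatch(1);
    List<Future<Boolean>> results = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      results.add(pool.submit(() -> {
        go.await(); // release all threads together to widen the race window
        try {
          if (raceSafe) {
            ensureDirRaceSafe(dir);
          } else {
            ensureDirRacy(dir);
          }
          return true;
        } catch (IOException e) {
          return false;
        }
      }));
    }
    go.countDown();
    int failures = 0;
    try {
      for (Future<Boolean> f : results) {
        if (!f.get()) {
          failures++;
        }
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return failures;
  }

  public static void main(String[] args) {
    // Racy failures depend on thread scheduling and may be 0 on any given run.
    System.out.println("racy failures: " + countFailures(freshTarget(), 8, false));
    System.out.println("race-safe failures: " + countFailures(freshTarget(), 8, true));
  }
}
```

Whether the racy variant actually fails on a given run depends on scheduling, which is exactly why this kind of bug shows up only intermittently in production; the race-safe variant should never report a failure for a directory that ends up existing.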

Test Plan: run a few jobs and mvn clean verify

Reviewers: majakabiljo, dionysis.logothetis, heslami, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59883


Project: http://git-wip-us.apache.org/repos/asf/giraph/repo
Commit: http://git-wip-us.apache.org/repos/asf/giraph/commit/2185f594
Tree: http://git-wip-us.apache.org/repos/asf/giraph/tree/2185f594
Diff: http://git-wip-us.apache.org/repos/asf/giraph/diff/2185f594

Branch: refs/heads/trunk
Commit: 2185f5946edfddcca8a5bcb76160212bfe2ef797
Parents: d827c97
Author: Sergey Edunov <ed...@fb.com>
Authored: Tue Jun 21 10:14:34 2016 -0700
Committer: Sergey Edunov <ed...@fb.com>
Committed: Tue Jun 21 10:14:34 2016 -0700

----------------------------------------------------------------------
 .../org/apache/giraph/zk/InProcessZooKeeperRunner.java  | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/giraph/blob/2185f594/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
----------------------------------------------------------------------
diff --git a/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java b/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
index 9502c24..4f15f3a 100644
--- a/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
+++ b/giraph-core/src/main/java/org/apache/giraph/zk/InProcessZooKeeperRunner.java
@@ -88,16 +88,22 @@ public class InProcessZooKeeperRunner
      * @throws IOException if can't start zookeeper
      */
     public int start(ZookeeperConfig config) throws IOException {
+      serverRunner = new ZooKeeperServerRunner();
+      //Make sure zookeeper starts first and purge manager last
+      //This is important because zookeeper creates a folder
+      //structure on the local disk. Purge manager also tries
+      //to create it but from a different thread and can run into
+      //race condition. See FileTxnSnapLog source code for details.
+      int port = serverRunner.start(config);
       // Start and schedule the the purge task
       DatadirCleanupManager purgeMgr = new DatadirCleanupManager(
           config
-          .getDataDir(), config.getDataLogDir(),
+              .getDataDir(), config.getDataLogDir(),
           GiraphConstants.ZOOKEEPER_SNAP_RETAIN_COUNT,
           GiraphConstants.ZOOKEEPER_PURGE_INTERVAL);
       purgeMgr.start();
 
-      serverRunner = new ZooKeeperServerRunner();
-      return serverRunner.start(config);
+      return port;
     }
 
     /**