You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by lvfangmin <gi...@git.apache.org> on 2018/01/09 20:24:53 UTC

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

GitHub user lvfangmin opened a pull request:

    https://github.com/apache/zookeeper/pull/447

    [ZOOKEEPER-2926] Fix potential data consistency issue due to the session management bug

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-2926

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #447
    
----
commit 04ec15841c5d164f1212eea9cdfd02e10aa4478e
Author: Fangmin Lyu <al...@...>
Date:   2018-01-09T06:58:32Z

    fix potential data consistency issue due to global session added too early

----


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r161855408
  
    --- Diff: src/java/test/org/apache/zookeeper/test/SessionTrackerCheckTest.java ---
    @@ -185,8 +182,8 @@ public void testLeaderSessionTracker() throws Exception {
                     TICK_TIME, expirer.sid, false, testZKSListener());
     
             // Global session
    -        sessionId = 0xdeadbeef;
    -        tracker.addSession(sessionId, CONNECTION_TIMEOUT);
    +        sessionId = 0xdeadbef0;
    --- End diff --
    
    I think this is not changed by intention, might happen when merge patch, I'll get this fixed.


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    @lvfangmin Have you amended your commit?
    I see a lot of changes since I last reviewed this PR.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205339992
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
    --- End diff --
    
    This comment should be updated now with the function name change.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r160976160
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/LearnerSessionTracker.java ---
    @@ -102,32 +101,42 @@ public boolean isGlobalSession(long sessionId) {
         }
     
         public boolean addGlobalSession(long sessionId, int sessionTimeout) {
    +        // no global session tracker, do nothing
    +        return false;
    --- End diff --
    
    Is it possible to throw UnsupportedOperationException instead?


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205629046
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/LeaderSessionTracker.java ---
    @@ -85,31 +85,43 @@ public boolean isGlobalSession(long sessionId) {
             return globalSessionTracker.isTrackingSession(sessionId);
         }
     
    -    public boolean addGlobalSession(long sessionId, int sessionTimeout) {
    -        boolean added =
    -            globalSessionTracker.addSession(sessionId, sessionTimeout);
    -        if (localSessionsEnabled && added) {
    +    public boolean trackSession(long sessionId, int sessionTimeout) {
    +        boolean tracked =
    +            globalSessionTracker.trackSession(sessionId, sessionTimeout);
    +        if (localSessionsEnabled && tracked) {
                 // Only do extra logging so we know what kind of session this is
                 // if we're supporting both kinds of sessions
    -            LOG.info("Adding global session 0x" + Long.toHexString(sessionId));
    +            LOG.info("Tracking global session 0x" + Long.toHexString(sessionId));
             }
    -        return added;
    +        return tracked;
         }
     
    -    public boolean addSession(long sessionId, int sessionTimeout) {
    -        boolean added;
    -        if (localSessionsEnabled && !isGlobalSession(sessionId)) {
    -            added = localSessionTracker.addSession(sessionId, sessionTimeout);
    --- End diff --
    
    With this being removed, how would a `LeaderSessionTracker` now add (or track, with the new term) a new local session (when local session is enabled on Leader server)? The `LeaderSessionTracker.trackSession` does not track local session.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r206009813
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
          */
    -    boolean addGlobalSession(long id, int to);
    +    boolean trackSession(long id, int to);
     
         /**
    -     * Add a session to those being tracked. The session is added as a local
    -     * session if they are enabled, otherwise as global.
    +     * Add the session to the under layer storage.
    --- End diff --
    
    Sounds good to me with updated comment. 


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    @anmolnar I changed the code based on your comments and did amend, is this the right process?


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r160971106
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java ---
    @@ -280,6 +278,12 @@ public synchronized boolean addSession(long id, int sessionTimeout) {
             return added;
         }
     
    +    public synchronized boolean commitSession(long id, int sessionTimeout) {
    --- End diff --
    
    `sessionsWithTimeout` is a Map within ZKdb, so adding sessions to it is equivalent to persisting them.
    
    I think you could also remove the `addGlobalSession()` method completely, because it doesn't do anything special than forwarding the call to `addSession()`.


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    I'll give it a review; I missed the original JIRA. 


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    @anmolnar I also find it's hard to track when using overwriting the commit, I'll use stack instead.
    
    For the tests, I need to inject some failure in the Quorum being set up, that's why I need the customized way I'm doing in my code, I think I prefer to create a separate test class.


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    @anmolnar do you have time to revisit the code here?


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205332227
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/UpgradeableSessionTracker.java ---
    @@ -19,6 +19,8 @@
     
     import java.util.concurrent.ConcurrentHashMap;
     import java.util.concurrent.ConcurrentMap;
    +import java.util.Set;
    +import java.util.HashSet;
    --- End diff --
    
    Are both imports needed? I don't see any new data structure introduced. 


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r206010420
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/LeaderSessionTracker.java ---
    @@ -85,31 +85,43 @@ public boolean isGlobalSession(long sessionId) {
             return globalSessionTracker.isTrackingSession(sessionId);
         }
     
    -    public boolean addGlobalSession(long sessionId, int sessionTimeout) {
    -        boolean added =
    -            globalSessionTracker.addSession(sessionId, sessionTimeout);
    -        if (localSessionsEnabled && added) {
    +    public boolean trackSession(long sessionId, int sessionTimeout) {
    +        boolean tracked =
    +            globalSessionTracker.trackSession(sessionId, sessionTimeout);
    +        if (localSessionsEnabled && tracked) {
                 // Only do extra logging so we know what kind of session this is
                 // if we're supporting both kinds of sessions
    -            LOG.info("Adding global session 0x" + Long.toHexString(sessionId));
    +            LOG.info("Tracking global session 0x" + Long.toHexString(sessionId));
             }
    -        return added;
    +        return tracked;
         }
     
    -    public boolean addSession(long sessionId, int sessionTimeout) {
    -        boolean added;
    -        if (localSessionsEnabled && !isGlobalSession(sessionId)) {
    -            added = localSessionTracker.addSession(sessionId, sessionTimeout);
    --- End diff --
    
    I see now. Basically each implementation of session tracker will maintain either a local session tracker and a global session tracker or both depend on the session tracker type and the tracking of different types of sessions are delegated to the local/global session trackers owned by the actual session tracker implementations. 


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205922808
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java ---
    @@ -280,6 +275,11 @@ public synchronized boolean addSession(long id, int sessionTimeout) {
             return added;
         }
     
    +    public synchronized boolean commitSession(long id, int sessionTimeout) {
    +        sessionsWithTimeout.put(id, sessionTimeout);
    +        return true;
    --- End diff --
    
    The LeaderSessionTracker.commitSession will return whether it has successfully added the new session, but the return value is not being used anywhere in the code currently.
    
    I'll update this to reflect that as well.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205340787
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
          */
    -    boolean addGlobalSession(long id, int to);
    +    boolean trackSession(long id, int to);
     
         /**
    -     * Add a session to those being tracked. The session is added as a local
    -     * session if they are enabled, otherwise as global.
    +     * Add the session to the under layer storage.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
    --- End diff --
    
    Similarly it'll be great if the comment on the return value here is updated.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/447


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205922968
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
          */
    -    boolean addGlobalSession(long id, int to);
    +    boolean trackSession(long id, int to);
     
         /**
    -     * Add a session to those being tracked. The session is added as a local
    -     * session if they are enabled, otherwise as global.
    +     * Add the session to the under layer storage.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
    --- End diff --
    
    This comment about the return value is still correct


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r160970304
  
    --- Diff: src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java ---
    @@ -579,13 +579,8 @@ protected void pRequest2Txn(int type, long zxid, Request request,
                     int to = request.request.getInt();
                     request.setTxn(new CreateSessionTxn(to));
                     request.request.rewind();
    -                if (request.isLocalSession()) {
    -                    // This will add to local session tracker if it is enabled
    -                    zks.sessionTracker.addSession(request.sessionId, to);
    -                } else {
    -                    // Explicitly add to global session if the flag is not set
    -                    zks.sessionTracker.addGlobalSession(request.sessionId, to);
    -                }
    +                // only add the global session tracker but not to ZKDb
    +                zks.sessionTracker.addGlobalSession(request.sessionId, to);
    --- End diff --
    
    You could remove this check safely, because effectively there was no difference between `addSession()` and `addGlobalSession()` calls.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205331482
  
    --- Diff: src/java/test/org/apache/zookeeper/test/QuorumBase.java ---
    @@ -53,32 +53,32 @@
         protected int port3;
         protected int port4;
         protected int port5;
    -    
    +
         protected int portLE1;
         protected int portLE2;
         protected int portLE3;
         protected int portLE4;
         protected int portLE5;
    -    
    +
         protected int portClient1;
         protected int portClient2;
         protected int portClient3;
         protected int portClient4;
         protected int portClient5;
     
    -    protected boolean localSessionsEnabled = false;
    -    protected boolean localSessionsUpgradingEnabled = false;
    +    public boolean localSessionsEnabled = false;
    +    public boolean localSessionsUpgradingEnabled = false;
    --- End diff --
    
    Not sure why this change is required. 


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    @hanm is the new change looks good to you?


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    Thanks @hanm, I've updated the diff based on your comments and rebased onto latest master code.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205923291
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/LearnerSessionTracker.java ---
    @@ -101,33 +100,44 @@ public boolean isGlobalSession(long sessionId) {
             return globalSessionsWithTimeouts.containsKey(sessionId);
         }
     
    -    public boolean addGlobalSession(long sessionId, int sessionTimeout) {
    +    public boolean trackSession(long sessionId, int sessionTimeout) {
    +        // Learner doesn't track global session, do nothing here
    +        return false;
    +    }
    +
    +    /**
    +     * Synchronized on this to avoid race condition of adding a local session
    +     * after committed global session, which may cause the same session being
    +     * tracked on this server and leader.
    +     */
    +    public synchronized boolean commitSession(
    +            long sessionId, int sessionTimeout) {
             boolean added =
                 globalSessionsWithTimeouts.put(sessionId, sessionTimeout) == null;
    -        if (localSessionsEnabled && added) {
    +
    +        if (added) {
                 // Only do extra logging so we know what kind of session this is
                 // if we're supporting both kinds of sessions
    -            LOG.info("Adding global session 0x" + Long.toHexString(sessionId));
    +            LOG.info("Committing global session 0x" + Long.toHexString(sessionId));
             }
    -        touchTable.get().put(sessionId, sessionTimeout);
    -        return added;
    -    }
     
    -    public boolean addSession(long sessionId, int sessionTimeout) {
    --- End diff --
    
    Explained in the previous comment, createSession will add and track it.


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    @lvfangmin Got it. Recently we talked about why multiple commits are better for review over squashing. Now I see why.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r161410077
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/SessionUpgradeTest.java ---
    @@ -0,0 +1,516 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.zookeeper.server.quorum;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.nio.ByteBuffer;
    +import java.util.ArrayList;
    +import java.util.concurrent.ConcurrentHashMap;
    +import javax.security.sasl.SaslException;
    +
    +import org.apache.jute.BinaryOutputArchive;
    +import org.apache.zookeeper.CreateMode;
    +import org.apache.zookeeper.KeeperException;
    +import org.apache.zookeeper.PortAssignment;
    +import org.apache.zookeeper.ZKTestCase;
    +import org.apache.zookeeper.ZooDefs;
    +import org.apache.zookeeper.ZooKeeper;
    +import org.apache.zookeeper.ZooKeeper.States;
    +import org.apache.zookeeper.data.Id;
    +import org.apache.zookeeper.proto.CreateRequest;
    +import org.apache.zookeeper.test.ClientBase;
    +import org.apache.zookeeper.test.ClientBase.CountdownWatcher;
    +import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
    +import org.apache.zookeeper.server.ByteBufferInputStream;
    +import org.apache.zookeeper.server.Request;
    +import org.junit.After;
    +import org.junit.Assert;
    +import org.junit.Before;
    +import org.junit.Test;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import org.apache.zookeeper.test.QuorumBase;
    +import org.apache.zookeeper.test.DisconnectableZooKeeper;
    +
    +/**
    + * Tests that session upgrade works from local to global sessions.
    + * Expected behavior is that if global-only sessions are unset,
    + * and no upgrade interval is specified, then sessions will be
    + * created locally to the host.  They will be upgraded to global
    + * sessions iff an operation is done on that session which requires
    + * persistence, i.e. creating an ephemeral node.
    + */
    +public class SessionUpgradeTest extends QuorumPeerTestBase {
    +    protected static final Logger LOG = LoggerFactory.getLogger(SessionUpgradeTest.class);
    +    public static final int CONNECTION_TIMEOUT = ClientBase.CONNECTION_TIMEOUT;
    +
    +    private final QuorumBase qb = new QuorumBase();
    +
    +    @Before
    +    public void setUp() throws Exception {
    +        LOG.info("STARTING quorum " + getClass().getName());
    +        qb.localSessionsEnabled = true;
    +        qb.localSessionsUpgradingEnabled = true;
    +        qb.setUp();
    +        ClientBase.waitForServerUp(qb.hostPort, 10000);
    +    }
    +
    +    @After
    +    public void tearDown() throws Exception {
    +        LOG.info("STOPPING quorum " + getClass().getName());
    +        qb.tearDown();
    +    }
    +
    +    @Test
    +    public void testLocalSessionsWithoutEphemeralOnFollower() throws Exception {
    +        testLocalSessionsWithoutEphemeral(false);
    +    }
    +
    +    @Test
    +    public void testLocalSessionsWithoutEphemeralOnLeader() throws Exception {
    +        testLocalSessionsWithoutEphemeral(true);
    +    }
    +
    +    private void testLocalSessionsWithoutEphemeral(boolean testLeader)
    +            throws Exception {
    +        String nodePrefix = "/testLocalSessions-"
    +            + (testLeader ? "leaderTest-" : "followerTest-");
    +        int leaderIdx = qb.getLeaderIndex();
    +        Assert.assertFalse("No leader in quorum?", leaderIdx == -1);
    +        int followerIdx = (leaderIdx + 1) % 5;
    +        int otherFollowerIdx = (leaderIdx + 2) % 5;
    +        int testPeerIdx = testLeader ? leaderIdx : followerIdx;
    +        String hostPorts[] = qb.hostPort.split(",");
    +        CountdownWatcher watcher = new CountdownWatcher();
    +        DisconnectableZooKeeper zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        // Try creating some data.
    +        for (int i = 0; i < 5; i++) {
    +            zk.create(nodePrefix + i, new byte[0],
    +                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +        }
    +
    +        long localSessionId = zk.getSessionId();
    +        byte[] localSessionPwd = zk.getSessionPasswd().clone();
    +
    +        // Try connecting with the same session id on a different
    +        // server.  This should fail since it is a local sesion.
    +        try {
    +            watcher.reset();
    +            DisconnectableZooKeeper zknew = new DisconnectableZooKeeper(
    +                    hostPorts[otherFollowerIdx], CONNECTION_TIMEOUT, watcher,
    +                    localSessionId, localSessionPwd);
    +
    +            zknew.create(nodePrefix + "5", new byte[0],
    +                         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +            Assert.fail("Connection on the same session ID should fail.");
    +        } catch (KeeperException.SessionExpiredException e) {
    +        } catch (KeeperException.ConnectionLossException e) {
    +        }
    +
    +        // If we're testing a follower, also check the session id on the
    +        // leader. This should also fail
    +        if (!testLeader) {
    +            try {
    +                watcher.reset();
    +                DisconnectableZooKeeper zknew = new DisconnectableZooKeeper(
    +                        hostPorts[leaderIdx], CONNECTION_TIMEOUT,
    +                        watcher, localSessionId, localSessionPwd);
    +
    +                zknew.create(nodePrefix + "5", new byte[0],
    +                             ZooDefs.Ids.OPEN_ACL_UNSAFE,
    +                             CreateMode.PERSISTENT);
    +                Assert.fail("Connection on the same session ID should fail.");
    +            } catch (KeeperException.SessionExpiredException e) {
    +            } catch (KeeperException.ConnectionLossException e) {
    +            }
    +        }
    +
    +        // However, we should be able to disconnect and reconnect to the same
    +        // server with the same session id (as long as we do it quickly
    +        // before expiration).
    +        zk.disconnect();
    +
    +        watcher.reset();
    +        zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher,
    +                localSessionId, localSessionPwd);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        zk.create(nodePrefix + "6", new byte[0],
    +                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +
    +        // If we explicitly close the session, then the session id should no
    +        // longer be valid.
    +        zk.close();
    +        try {
    +            watcher.reset();
    +            zk = new DisconnectableZooKeeper(
    +                    hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher,
    +                    localSessionId, localSessionPwd);
    +
    +            zk.create(nodePrefix + "7", new byte[0],
    +                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +            Assert.fail("Reconnecting to a closed session ID should fail.");
    +        } catch (KeeperException.SessionExpiredException e) {
    +        }
    +    }
    +
    +    @Test
    +    public void testUpgradeWithEphemeralOnFollower() throws Exception {
    +        testUpgradeWithEphemeral(false);
    +    }
    +
    +    @Test
    +    public void testUpgradeWithEphemeralOnLeader() throws Exception {
    +        testUpgradeWithEphemeral(true);
    +    }
    +
    +    private void testUpgradeWithEphemeral(boolean testLeader)
    +            throws Exception {
    +        String nodePrefix = "/testUpgrade-"
    +            + (testLeader ? "leaderTest-" : "followerTest-");
    +        int leaderIdx = qb.getLeaderIndex();
    +        Assert.assertFalse("No leader in quorum?", leaderIdx == -1);
    +        int followerIdx = (leaderIdx + 1) % 5;
    +        int otherFollowerIdx = (leaderIdx + 2) % 5;
    +        int testPeerIdx = testLeader ? leaderIdx : followerIdx;
    +        String hostPorts[] = qb.hostPort.split(",");
    +
    +        CountdownWatcher watcher = new CountdownWatcher();
    +        DisconnectableZooKeeper zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        // Create some ephemeral nodes.  This should force the session to
    +        // be propagated to the other servers in the ensemble.
    +        for (int i = 0; i < 5; i++) {
    +            zk.create(nodePrefix + i, new byte[0],
    +                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    +        }
    +
    +        // We should be able to reconnect with the same session id on a
    +        // different server, since it has been propagated.
    +        long localSessionId = zk.getSessionId();
    +        byte[] localSessionPwd = zk.getSessionPasswd().clone();
    +
    +        zk.disconnect();
    +        watcher.reset();
    +        zk = new DisconnectableZooKeeper(
    +                hostPorts[otherFollowerIdx], CONNECTION_TIMEOUT, watcher,
    +                localSessionId, localSessionPwd);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        // The created ephemeral nodes are still around.
    +        for (int i = 0; i < 5; i++) {
    +            Assert.assertNotNull(zk.exists(nodePrefix + i, null));
    +        }
    +
    +        // When we explicitly close the session, we should not be able to
    +        // reconnect with the same session id
    +        zk.close();
    +
    +        try {
    +            watcher.reset();
    +            zk = new DisconnectableZooKeeper(
    +                    hostPorts[otherFollowerIdx], CONNECTION_TIMEOUT, watcher,
    +                    localSessionId, localSessionPwd);
    +            zk.exists(nodePrefix + "0", null);
    +            Assert.fail("Reconnecting to a closed session ID should fail.");
    +        } catch (KeeperException.SessionExpiredException e) {
    +        }
    +
    +        watcher.reset();
    +        // And the ephemeral nodes will be gone since the session died.
    +        zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +        for (int i = 0; i < 5; i++) {
    +            Assert.assertNull(zk.exists(nodePrefix + i, null));
    +        }
    +    }
    +
    +    @Test
    +    public void testLocalSessionUpgradeSnapshot() throws IOException, InterruptedException {
    +        // setup the env with RetainDB and local session upgrading
    +        ClientBase.setupTestEnv();
    +
    +        final int SERVER_COUNT = 3;
    +        final int clientPorts[] = new int[SERVER_COUNT];
    +        StringBuilder sb = new StringBuilder();
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            clientPorts[i] = PortAssignment.unique();
    +            sb.append("server.").append(i).append("=127.0.0.1:")
    +              .append(PortAssignment.unique()).append(":")
    +              .append(PortAssignment.unique()).append("\n");
    +        }
    +        sb.append("localSessionsEnabled=true\n");
    +        sb.append("localSessionsUpgradingEnabled=true\n");
    +        String cfg = sb.toString();
    +
    +        // create a 3 server ensemble
    +        MainThread mt[] = new MainThread[SERVER_COUNT];
    +        final TestQPMainDropSessionUpgrading qpMain[] =
    +                new TestQPMainDropSessionUpgrading[SERVER_COUNT];
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            final TestQPMainDropSessionUpgrading qp = new TestQPMainDropSessionUpgrading();
    +            qpMain[i] = qp;
    +            mt[i] = new MainThread(i, clientPorts[i], cfg, false) {
    +                @Override
    +                public TestQPMain getTestQPMain() {
    +                    return qp;
    +                }
    +            };
    +            mt[i].start();
    +        }
    +
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            Assert.assertTrue("waiting for server " + i + " being up",
    +                    ClientBase.waitForServerUp("127.0.0.1:" + clientPorts[i],
    +                            CONNECTION_TIMEOUT));
    +        }
    --- End diff --
    
    Test initialization logic is better placed in setUp() method especially when it's common in multiple tests.
    (That would be one benefit of moving new tests to new file)


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205922455
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java ---
    @@ -280,6 +275,11 @@ public synchronized boolean addSession(long id, int sessionTimeout) {
             return added;
         }
     
    +    public synchronized boolean commitSession(long id, int sessionTimeout) {
    +        sessionsWithTimeout.put(id, sessionTimeout);
    +        return true;
    --- End diff --
    
    The LeaderSessionTracker.commitSession will return whether it has successfully added the new session.


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    Logic looks good to me. Summarize the change in one sentence: moving global session commit from pre processor to final processor so a global session will not be applied to zkDB until upgrade finished (global session creation committed to quorum). 


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r161300774
  
    --- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
    @@ -1187,13 +1187,8 @@ private ProcessTxnResult processTxn(Request request, TxnHeader hdr,
             if (opCode == OpCode.createSession) {
                 if (hdr != null && txn instanceof CreateSessionTxn) {
                     CreateSessionTxn cst = (CreateSessionTxn) txn;
    -                sessionTracker.addGlobalSession(sessionId, cst.getTimeOut());
    -            } else if (request != null && request.isLocalSession()) {
    -                request.request.rewind();
    -                int timeout = request.request.getInt();
    -                request.request.rewind();
    -                sessionTracker.addSession(request.sessionId, timeout);
    -            } else {
    +                sessionTracker.commitSession(sessionId, cst.getTimeOut());
    +            } else if (request == null || !request.isLocalSession()) {
    --- End diff --
    
    Local session creation is happened in ZooKeeperServer when the when processing connection request, PrepRequestProcessor only deals with the global sessions.
    
    Local session creation will also send a createSession op into the processor pipeline, so if we remove the else check, a lot of warning log will show for local session createSession op, that's not what we wanted.


---

[GitHub] zookeeper issue #447: [ZOOKEEPER-2926] Fix potential data consistency issue ...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/447
  
    Thanks @anmolnar for review.
    
    @hanm please take a look when you have time.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r161405611
  
    --- Diff: src/java/test/org/apache/zookeeper/test/SessionTrackerCheckTest.java ---
    @@ -185,8 +182,8 @@ public void testLeaderSessionTracker() throws Exception {
                     TICK_TIME, expirer.sid, false, testZKSListener());
     
             // Global session
    -        sessionId = 0xdeadbeef;
    -        tracker.addSession(sessionId, CONNECTION_TIMEOUT);
    +        sessionId = 0xdeadbef0;
    --- End diff --
    
    Why have you changed this? :)


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r161410084
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/SessionUpgradeTest.java ---
    @@ -0,0 +1,516 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.zookeeper.server.quorum;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.nio.ByteBuffer;
    +import java.util.ArrayList;
    +import java.util.concurrent.ConcurrentHashMap;
    +import javax.security.sasl.SaslException;
    +
    +import org.apache.jute.BinaryOutputArchive;
    +import org.apache.zookeeper.CreateMode;
    +import org.apache.zookeeper.KeeperException;
    +import org.apache.zookeeper.PortAssignment;
    +import org.apache.zookeeper.ZKTestCase;
    +import org.apache.zookeeper.ZooDefs;
    +import org.apache.zookeeper.ZooKeeper;
    +import org.apache.zookeeper.ZooKeeper.States;
    +import org.apache.zookeeper.data.Id;
    +import org.apache.zookeeper.proto.CreateRequest;
    +import org.apache.zookeeper.test.ClientBase;
    +import org.apache.zookeeper.test.ClientBase.CountdownWatcher;
    +import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
    +import org.apache.zookeeper.server.ByteBufferInputStream;
    +import org.apache.zookeeper.server.Request;
    +import org.junit.After;
    +import org.junit.Assert;
    +import org.junit.Before;
    +import org.junit.Test;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import org.apache.zookeeper.test.QuorumBase;
    +import org.apache.zookeeper.test.DisconnectableZooKeeper;
    +
    +/**
    + * Tests that session upgrade works from local to global sessions.
    + * Expected behavior is that if global-only sessions are unset,
    + * and no upgrade interval is specified, then sessions will be
    + * created locally to the host.  They will be upgraded to global
    + * sessions iff an operation is done on that session which requires
    + * persistence, i.e. creating an ephemeral node.
    + */
    +public class SessionUpgradeTest extends QuorumPeerTestBase {
    +    protected static final Logger LOG = LoggerFactory.getLogger(SessionUpgradeTest.class);
    +    public static final int CONNECTION_TIMEOUT = ClientBase.CONNECTION_TIMEOUT;
    +
    +    private final QuorumBase qb = new QuorumBase();
    +
    +    @Before
    +    public void setUp() throws Exception {
    +        LOG.info("STARTING quorum " + getClass().getName());
    +        qb.localSessionsEnabled = true;
    +        qb.localSessionsUpgradingEnabled = true;
    +        qb.setUp();
    +        ClientBase.waitForServerUp(qb.hostPort, 10000);
    +    }
    +
    +    @After
    +    public void tearDown() throws Exception {
    +        LOG.info("STOPPING quorum " + getClass().getName());
    +        qb.tearDown();
    +    }
    +
    +    @Test
    +    public void testLocalSessionsWithoutEphemeralOnFollower() throws Exception {
    +        testLocalSessionsWithoutEphemeral(false);
    +    }
    +
    +    @Test
    +    public void testLocalSessionsWithoutEphemeralOnLeader() throws Exception {
    +        testLocalSessionsWithoutEphemeral(true);
    +    }
    +
    +    private void testLocalSessionsWithoutEphemeral(boolean testLeader)
    +            throws Exception {
    +        String nodePrefix = "/testLocalSessions-"
    +            + (testLeader ? "leaderTest-" : "followerTest-");
    +        int leaderIdx = qb.getLeaderIndex();
    +        Assert.assertFalse("No leader in quorum?", leaderIdx == -1);
    +        int followerIdx = (leaderIdx + 1) % 5;
    +        int otherFollowerIdx = (leaderIdx + 2) % 5;
    +        int testPeerIdx = testLeader ? leaderIdx : followerIdx;
    +        String hostPorts[] = qb.hostPort.split(",");
    +        CountdownWatcher watcher = new CountdownWatcher();
    +        DisconnectableZooKeeper zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        // Try creating some data.
    +        for (int i = 0; i < 5; i++) {
    +            zk.create(nodePrefix + i, new byte[0],
    +                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +        }
    +
    +        long localSessionId = zk.getSessionId();
    +        byte[] localSessionPwd = zk.getSessionPasswd().clone();
    +
    +        // Try connecting with the same session id on a different
    +        // server.  This should fail since it is a local sesion.
    +        try {
    +            watcher.reset();
    +            DisconnectableZooKeeper zknew = new DisconnectableZooKeeper(
    +                    hostPorts[otherFollowerIdx], CONNECTION_TIMEOUT, watcher,
    +                    localSessionId, localSessionPwd);
    +
    +            zknew.create(nodePrefix + "5", new byte[0],
    +                         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +            Assert.fail("Connection on the same session ID should fail.");
    +        } catch (KeeperException.SessionExpiredException e) {
    +        } catch (KeeperException.ConnectionLossException e) {
    +        }
    +
    +        // If we're testing a follower, also check the session id on the
    +        // leader. This should also fail
    +        if (!testLeader) {
    +            try {
    +                watcher.reset();
    +                DisconnectableZooKeeper zknew = new DisconnectableZooKeeper(
    +                        hostPorts[leaderIdx], CONNECTION_TIMEOUT,
    +                        watcher, localSessionId, localSessionPwd);
    +
    +                zknew.create(nodePrefix + "5", new byte[0],
    +                             ZooDefs.Ids.OPEN_ACL_UNSAFE,
    +                             CreateMode.PERSISTENT);
    +                Assert.fail("Connection on the same session ID should fail.");
    +            } catch (KeeperException.SessionExpiredException e) {
    +            } catch (KeeperException.ConnectionLossException e) {
    +            }
    +        }
    +
    +        // However, we should be able to disconnect and reconnect to the same
    +        // server with the same session id (as long as we do it quickly
    +        // before expiration).
    +        zk.disconnect();
    +
    +        watcher.reset();
    +        zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher,
    +                localSessionId, localSessionPwd);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        zk.create(nodePrefix + "6", new byte[0],
    +                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +
    +        // If we explicitly close the session, then the session id should no
    +        // longer be valid.
    +        zk.close();
    +        try {
    +            watcher.reset();
    +            zk = new DisconnectableZooKeeper(
    +                    hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher,
    +                    localSessionId, localSessionPwd);
    +
    +            zk.create(nodePrefix + "7", new byte[0],
    +                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    +            Assert.fail("Reconnecting to a closed session ID should fail.");
    +        } catch (KeeperException.SessionExpiredException e) {
    +        }
    +    }
    +
    +    @Test
    +    public void testUpgradeWithEphemeralOnFollower() throws Exception {
    +        testUpgradeWithEphemeral(false);
    +    }
    +
    +    @Test
    +    public void testUpgradeWithEphemeralOnLeader() throws Exception {
    +        testUpgradeWithEphemeral(true);
    +    }
    +
    +    private void testUpgradeWithEphemeral(boolean testLeader)
    +            throws Exception {
    +        String nodePrefix = "/testUpgrade-"
    +            + (testLeader ? "leaderTest-" : "followerTest-");
    +        int leaderIdx = qb.getLeaderIndex();
    +        Assert.assertFalse("No leader in quorum?", leaderIdx == -1);
    +        int followerIdx = (leaderIdx + 1) % 5;
    +        int otherFollowerIdx = (leaderIdx + 2) % 5;
    +        int testPeerIdx = testLeader ? leaderIdx : followerIdx;
    +        String hostPorts[] = qb.hostPort.split(",");
    +
    +        CountdownWatcher watcher = new CountdownWatcher();
    +        DisconnectableZooKeeper zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        // Create some ephemeral nodes.  This should force the session to
    +        // be propagated to the other servers in the ensemble.
    +        for (int i = 0; i < 5; i++) {
    +            zk.create(nodePrefix + i, new byte[0],
    +                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    +        }
    +
    +        // We should be able to reconnect with the same session id on a
    +        // different server, since it has been propagated.
    +        long localSessionId = zk.getSessionId();
    +        byte[] localSessionPwd = zk.getSessionPasswd().clone();
    +
    +        zk.disconnect();
    +        watcher.reset();
    +        zk = new DisconnectableZooKeeper(
    +                hostPorts[otherFollowerIdx], CONNECTION_TIMEOUT, watcher,
    +                localSessionId, localSessionPwd);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +
    +        // The created ephemeral nodes are still around.
    +        for (int i = 0; i < 5; i++) {
    +            Assert.assertNotNull(zk.exists(nodePrefix + i, null));
    +        }
    +
    +        // When we explicitly close the session, we should not be able to
    +        // reconnect with the same session id
    +        zk.close();
    +
    +        try {
    +            watcher.reset();
    +            zk = new DisconnectableZooKeeper(
    +                    hostPorts[otherFollowerIdx], CONNECTION_TIMEOUT, watcher,
    +                    localSessionId, localSessionPwd);
    +            zk.exists(nodePrefix + "0", null);
    +            Assert.fail("Reconnecting to a closed session ID should fail.");
    +        } catch (KeeperException.SessionExpiredException e) {
    +        }
    +
    +        watcher.reset();
    +        // And the ephemeral nodes will be gone since the session died.
    +        zk = new DisconnectableZooKeeper(
    +                hostPorts[testPeerIdx], CONNECTION_TIMEOUT, watcher);
    +        watcher.waitForConnected(CONNECTION_TIMEOUT);
    +        for (int i = 0; i < 5; i++) {
    +            Assert.assertNull(zk.exists(nodePrefix + i, null));
    +        }
    +    }
    +
    +    @Test
    +    public void testLocalSessionUpgradeSnapshot() throws IOException, InterruptedException {
    +        // setup the env with RetainDB and local session upgrading
    +        ClientBase.setupTestEnv();
    +
    +        final int SERVER_COUNT = 3;
    +        final int clientPorts[] = new int[SERVER_COUNT];
    +        StringBuilder sb = new StringBuilder();
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            clientPorts[i] = PortAssignment.unique();
    +            sb.append("server.").append(i).append("=127.0.0.1:")
    +              .append(PortAssignment.unique()).append(":")
    +              .append(PortAssignment.unique()).append("\n");
    +        }
    +        sb.append("localSessionsEnabled=true\n");
    +        sb.append("localSessionsUpgradingEnabled=true\n");
    +        String cfg = sb.toString();
    +
    +        // create a 3 server ensemble
    +        MainThread mt[] = new MainThread[SERVER_COUNT];
    +        final TestQPMainDropSessionUpgrading qpMain[] =
    +                new TestQPMainDropSessionUpgrading[SERVER_COUNT];
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            final TestQPMainDropSessionUpgrading qp = new TestQPMainDropSessionUpgrading();
    +            qpMain[i] = qp;
    +            mt[i] = new MainThread(i, clientPorts[i], cfg, false) {
    +                @Override
    +                public TestQPMain getTestQPMain() {
    +                    return qp;
    +                }
    +            };
    +            mt[i].start();
    +        }
    +
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            Assert.assertTrue("waiting for server " + i + " being up",
    +                    ClientBase.waitForServerUp("127.0.0.1:" + clientPorts[i],
    +                            CONNECTION_TIMEOUT));
    +        }
    +
    +        // select the candidate of follower
    +        int leader = -1;
    +        int followerA = -1;
    +        for (int i = SERVER_COUNT - 1; i >= 0; i--) {
    +            if (mt[i].main.quorumPeer.leader != null) {
    +                leader = i;
    +            } else if (followerA == -1) {
    +                followerA = i;
    +            }
    +        }
    +
    +        LOG.info("follower A is {}", followerA);
    +        qpMain[followerA].setDropCreateSession(true);
    +
    +        // create a client, and create an ephemeral node to trigger the
    +        // upgrading process
    +        final String node = "/node-1";
    +        ZooKeeper zk = new ZooKeeper("127.0.0.1:" + clientPorts[followerA],
    +                    ClientBase.CONNECTION_TIMEOUT, this);
    +
    +        waitForOne(zk, States.CONNECTED);
    +
    +        // clone the session id and passwd for later usage
    +        long sessionId = zk.getSessionId();
    +
    +        // should fail because of the injection
    +        try {
    +            zk.create(node, new byte[2], ZooDefs.Ids.OPEN_ACL_UNSAFE,
    +                    CreateMode.EPHEMERAL);
    +            Assert.fail("expect to failed to upgrade session due to the " +
    +                    "TestQPMainDropSessionUpgrading is being used");
    +        } catch (KeeperException e) {
    +            LOG.info("KeeperException when create ephemeral node, {}", e);
    +        }
    +
    +        // force to take snapshot
    +        qpMain[followerA].quorumPeer.follower.zk.takeSnapshot(true);
    +
    +        // wait snapshot finish
    +        Thread.sleep(500);
    +
    +        // shutdown all servers
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            mt[i].shutdown();
    +        }
    +
    +        ArrayList<States> waitStates =new ArrayList<States>();
    +        waitStates.add(States.CONNECTING);
    +        waitStates.add(States.CLOSED);
    +        waitForOne(zk, waitStates);
    +
    +        // start the servers again, start follower A last as we want to
    +        // keep it running as follower
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            mt[i].start();
    +        }
    +
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            Assert.assertTrue("waiting for server " + i + " being up",
    +                    ClientBase.waitForServerUp("127.0.0.1:" + clientPorts[i],
    +                            CONNECTION_TIMEOUT));
    +        }
    +
    +        // check global session not exist on follower A
    +        for (int i = 0; i < SERVER_COUNT; i++) {
    +            ConcurrentHashMap<Long, Integer> sessions =
    +                    mt[i].main.quorumPeer.getZkDb().getSessionWithTimeOuts();
    +            Assert.assertFalse("server " + i + " should not have global " +
    +                    "session " + sessionId, sessions.containsKey(sessionId));
    +        }
    +
    +        // clean al the setups and close the zk
    --- End diff --
    
    Similiarly cleanup logic should be moved to tearDown().


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r161296695
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java ---
    @@ -280,6 +278,12 @@ public synchronized boolean addSession(long id, int sessionTimeout) {
             return added;
         }
     
    +    public synchronized boolean commitSession(long id, int sessionTimeout) {
    --- End diff --
    
    Yes, that's kind of commit changes to ZkDB, like the usual txns commit to DataTree in ZkDB.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r160972325
  
    --- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
    @@ -1187,13 +1187,8 @@ private ProcessTxnResult processTxn(Request request, TxnHeader hdr,
             if (opCode == OpCode.createSession) {
                 if (hdr != null && txn instanceof CreateSessionTxn) {
                     CreateSessionTxn cst = (CreateSessionTxn) txn;
    -                sessionTracker.addGlobalSession(sessionId, cst.getTimeOut());
    -            } else if (request != null && request.isLocalSession()) {
    -                request.request.rewind();
    -                int timeout = request.request.getInt();
    -                request.request.rewind();
    -                sessionTracker.addSession(request.sessionId, timeout);
    -            } else {
    +                sessionTracker.commitSession(sessionId, cst.getTimeOut());
    +            } else if (request == null || !request.isLocalSession()) {
    --- End diff --
    
    Do you need this check here?
    Basically you removed the part of requesting localSession creation, because it has already happened in PrepRequestProcessor and it's safe to leave the default 'else' handler as it was.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205922234
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/LeaderSessionTracker.java ---
    @@ -85,31 +85,43 @@ public boolean isGlobalSession(long sessionId) {
             return globalSessionTracker.isTrackingSession(sessionId);
         }
     
    -    public boolean addGlobalSession(long sessionId, int sessionTimeout) {
    -        boolean added =
    -            globalSessionTracker.addSession(sessionId, sessionTimeout);
    -        if (localSessionsEnabled && added) {
    +    public boolean trackSession(long sessionId, int sessionTimeout) {
    +        boolean tracked =
    +            globalSessionTracker.trackSession(sessionId, sessionTimeout);
    +        if (localSessionsEnabled && tracked) {
                 // Only do extra logging so we know what kind of session this is
                 // if we're supporting both kinds of sessions
    -            LOG.info("Adding global session 0x" + Long.toHexString(sessionId));
    +            LOG.info("Tracking global session 0x" + Long.toHexString(sessionId));
             }
    -        return added;
    +        return tracked;
         }
     
    -    public boolean addSession(long sessionId, int sessionTimeout) {
    -        boolean added;
    -        if (localSessionsEnabled && !isGlobalSession(sessionId)) {
    -            added = localSessionTracker.addSession(sessionId, sessionTimeout);
    --- End diff --
    
    When local session feature is enabled, the createSession method in ZooKeeperServer will create, track and 'commit' (update the local session in memory map) the local session immediately.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205921085
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/UpgradeableSessionTracker.java ---
    @@ -19,6 +19,8 @@
     
     import java.util.concurrent.ConcurrentHashMap;
     import java.util.concurrent.ConcurrentMap;
    +import java.util.Set;
    +import java.util.HashSet;
    --- End diff --
    
    Yes, dangling import after implementation change, will remove. 


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205331145
  
    --- Diff: src/java/test/org/apache/zookeeper/test/ClientBase.java ---
    @@ -72,7 +72,7 @@
         static final File BASETEST =
             new File(System.getProperty("build.test.dir", "build"));
     
    -    protected String hostPort = "127.0.0.1:" + PortAssignment.unique();
    +    public String hostPort = "127.0.0.1:" + PortAssignment.unique();
    --- End diff --
    
    I am not sure why this change is required. 


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205636550
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
          */
    -    boolean addGlobalSession(long id, int to);
    +    boolean trackSession(long id, int to);
    --- End diff --
    
    Is `trackSession` now only tracking global sessions? 


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205922937
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/LeaderSessionTracker.java ---
    @@ -85,31 +85,43 @@ public boolean isGlobalSession(long sessionId) {
             return globalSessionTracker.isTrackingSession(sessionId);
         }
     
    -    public boolean addGlobalSession(long sessionId, int sessionTimeout) {
    -        boolean added =
    -            globalSessionTracker.addSession(sessionId, sessionTimeout);
    -        if (localSessionsEnabled && added) {
    +    public boolean trackSession(long sessionId, int sessionTimeout) {
    +        boolean tracked =
    +            globalSessionTracker.trackSession(sessionId, sessionTimeout);
    +        if (localSessionsEnabled && tracked) {
                 // Only do extra logging so we know what kind of session this is
                 // if we're supporting both kinds of sessions
    -            LOG.info("Adding global session 0x" + Long.toHexString(sessionId));
    +            LOG.info("Tracking global session 0x" + Long.toHexString(sessionId));
             }
    -        return added;
    +        return tracked;
         }
     
    -    public boolean addSession(long sessionId, int sessionTimeout) {
    -        boolean added;
    -        if (localSessionsEnabled && !isGlobalSession(sessionId)) {
    -            added = localSessionTracker.addSession(sessionId, sessionTimeout);
    --- End diff --
    
    When local session feature is enabled, the createSession method in ZooKeeperServer will create, track and 'commit' (update the local session in memory map) the local session immediately.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r161296493
  
    --- Diff: src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java ---
    @@ -579,13 +579,8 @@ protected void pRequest2Txn(int type, long zxid, Request request,
                     int to = request.request.getInt();
                     request.setTxn(new CreateSessionTxn(to));
                     request.request.rewind();
    -                if (request.isLocalSession()) {
    -                    // This will add to local session tracker if it is enabled
    -                    zks.sessionTracker.addSession(request.sessionId, to);
    -                } else {
    -                    // Explicitly add to global session if the flag is not set
    -                    zks.sessionTracker.addGlobalSession(request.sessionId, to);
    -                }
    +                // only add the global session tracker but not to ZKDb
    +                zks.sessionTracker.addGlobalSession(request.sessionId, to);
    --- End diff --
    
    On LeaderSessionTracker, we need to differentiate add local session or global session, but since we only call createSession when add local session, I think I can simplify the interface more by removing addGlobalSession and rename addSession to trackSession.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205921489
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
          */
    -    boolean addGlobalSession(long id, int to);
    +    boolean trackSession(long id, int to);
     
         /**
    -     * Add a session to those being tracked. The session is added as a local
    -     * session if they are enabled, otherwise as global.
    +     * Add the session to the under layer storage.
    --- End diff --
    
    In LocalSessionTracker, commitSession is used to update the in memory local session map, which is not in zkDB. How about change it to:
    
    "Add the session to the local session map or global one in zkDB."


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205920948
  
    --- Diff: src/java/test/org/apache/zookeeper/test/QuorumBase.java ---
    @@ -53,32 +53,32 @@
         protected int port3;
         protected int port4;
         protected int port5;
    -    
    +
         protected int portLE1;
         protected int portLE2;
         protected int portLE3;
         protected int portLE4;
         protected int portLE5;
    -    
    +
         protected int portClient1;
         protected int portClient2;
         protected int portClient3;
         protected int portClient4;
         protected int portClient5;
     
    -    protected boolean localSessionsEnabled = false;
    -    protected boolean localSessionsUpgradingEnabled = false;
    +    public boolean localSessionsEnabled = false;
    +    public boolean localSessionsUpgradingEnabled = false;
    --- End diff --
    
    Same reason, will check and remove it if it's not required.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205920893
  
    --- Diff: src/java/test/org/apache/zookeeper/test/ClientBase.java ---
    @@ -72,7 +72,7 @@
         static final File BASETEST =
             new File(System.getProperty("build.test.dir", "build"));
     
    -    protected String hostPort = "127.0.0.1:" + PortAssignment.unique();
    +    public String hostPort = "127.0.0.1:" + PortAssignment.unique();
    --- End diff --
    
    Was trying to change these settings to public so I can reference it in the SessionUpgradeQuorumTest.java, I guess I find a different way to to that and forgot to revert this change, will verify and remove this change if it's not necessary.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205636796
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/LearnerSessionTracker.java ---
    @@ -101,33 +100,44 @@ public boolean isGlobalSession(long sessionId) {
             return globalSessionsWithTimeouts.containsKey(sessionId);
         }
     
    -    public boolean addGlobalSession(long sessionId, int sessionTimeout) {
    +    public boolean trackSession(long sessionId, int sessionTimeout) {
    +        // Learner doesn't track global session, do nothing here
    +        return false;
    +    }
    +
    +    /**
    +     * Synchronized on this to avoid race condition of adding a local session
    +     * after committed global session, which may cause the same session being
    +     * tracked on this server and leader.
    +     */
    +    public synchronized boolean commitSession(
    +            long sessionId, int sessionTimeout) {
             boolean added =
                 globalSessionsWithTimeouts.put(sessionId, sessionTimeout) == null;
    -        if (localSessionsEnabled && added) {
    +
    +        if (added) {
                 // Only do extra logging so we know what kind of session this is
                 // if we're supporting both kinds of sessions
    -            LOG.info("Adding global session 0x" + Long.toHexString(sessionId));
    +            LOG.info("Committing global session 0x" + Long.toHexString(sessionId));
             }
    -        touchTable.get().put(sessionId, sessionTimeout);
    -        return added;
    -    }
     
    -    public boolean addSession(long sessionId, int sessionTimeout) {
    --- End diff --
    
    Here a similar question as asked in `LeaderSessionTracker`: with this being removed how would `LearnerSessionTracker` track a local session? The `trackSession` here as commented, just return false and I don't see any way of adding a local session to `LearnerSessionTracker`.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205340493
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
          */
    -    boolean addGlobalSession(long id, int to);
    +    boolean trackSession(long id, int to);
     
         /**
    -     * Add a session to those being tracked. The session is added as a local
    -     * session if they are enabled, otherwise as global.
    +     * Add the session to the under layer storage.
    --- End diff --
    
    I think the comment here is a little bit confusing. `commitSession`itself does not directly add session to storage, it's the snapshot that does this. I think the comment should mention that this function will add the session to the zkDB, this also maps to the `trackSession` comment (which does not modify zkDB.). It would be great if a reader can instantly get what `commit` means here by just reading the comment.


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205630231
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java ---
    @@ -280,6 +275,11 @@ public synchronized boolean addSession(long id, int sessionTimeout) {
             return added;
         }
     
    +    public synchronized boolean commitSession(long id, int sessionTimeout) {
    +        sessionsWithTimeout.put(id, sessionTimeout);
    +        return true;
    --- End diff --
    
    Should we make this `void commitSession` given this always returns true?


---

[GitHub] zookeeper pull request #447: [ZOOKEEPER-2926] Fix potential data consistency...

Posted by lvfangmin <gi...@git.apache.org>.
Github user lvfangmin commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/447#discussion_r205921331
  
    --- Diff: src/java/main/org/apache/zookeeper/server/SessionTracker.java ---
    @@ -47,21 +47,20 @@
         long createSession(int sessionTimeout);
     
         /**
    -     * Add a global session to those being tracked.
    +     * Track the session expire, not add to ZkDb.
          * @param id sessionId
          * @param to sessionTimeout
          * @return whether the session was newly added (if false, already existed)
          */
    -    boolean addGlobalSession(long id, int to);
    +    boolean trackSession(long id, int to);
    --- End diff --
    
    Local session tracker will start track session when create the session.


---