You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by mkedwards <gi...@git.apache.org> on 2018/11/20 13:35:09 UTC

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

GitHub user mkedwards opened a pull request:

    https://github.com/apache/zookeeper/pull/707

    [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

    [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mkedwards/zookeeper branch-3.5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #707
    
----
commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards <me...@...>
Date:   2018-11-20T13:33:09Z

    [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference

----


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards closed the pull request at:

    https://github.com/apache/zookeeper/pull/707


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235463520
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -755,39 +769,55 @@ public void recreateSocketAddresses(long id) {
             }
         }
     
    -    public InetSocketAddress getQuorumAddress(){
    -        synchronized (QV_LOCK) {
    -            return myQuorumAddr;
    +    InetSocketAddress getQuorumAddress(){
    --- End diff --
    
    Good suggestion.  Done.


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235467449
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() {
         }
         
         public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
    -        synchronized (QV_LOCK) {
    -            if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    -                LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    -                        ". Current version: " + quorumVerifier.getVersion());
    +        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
    +        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
    +        // deadlock against other callers of connectOne().  If qcmRef gets set in another
    +        // thread while we're inside the synchronized block, that does no harm; if we didn't
    +        // take a lock on qcm (because it was null when we sampled it), we won't call
    +        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
    +        // of updates that provably happen in another thread before entering this method.)
    +        QuorumCnxManager qcm = qcmRef.get();
    +        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
    --- End diff --
    
    It does.  I'm fairly sure that this guarantees that the `QuorumCnxManager` lock can only be taken inside `QV_LOCK` if it's already held outside it.  Perhaps we can get confirmation from @castuardo and his team's Distributed System Model Checking tool?


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    Thanks maoling; I'm familiar with the procedure, just hadn't gotten around to juggling branches.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    > `private` should be changed back to `public `
    > Jenkins complains about [this](https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2685/artifact/patchprocess/trunkJavacWarnings.txt)
    
    Okay.  It might be better for the tests to be in the same package as the implementation, so these methods can be package-private; but that's out of scope for this PR.  I'll change them back to `public`.


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235416627
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -755,39 +769,55 @@ public void recreateSocketAddresses(long id) {
             }
         }
     
    -    public InetSocketAddress getQuorumAddress(){
    -        synchronized (QV_LOCK) {
    -            return myQuorumAddr;
    +    InetSocketAddress getQuorumAddress(){
    --- End diff --
    
    You might want to put the double-check-locking mechanism into a private getter of `myAddrs` to consolidate the logic in these 2 getters.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by maoling <gi...@git.apache.org>.
Github user maoling commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    `private` should be changed back to `public `
    Jenkins complains about [this](https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2685/artifact/patchprocess/trunkJavacWarnings.txt) 


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235807260
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -121,6 +126,18 @@
          */
         private ZKDatabase zkDb;
     
    +    public static class AddressTuple {
    --- End diff --
    
    Good idea.  Done.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    The `LeaderBeanTest` failure was real-ish; the mock `QuorumVerifier` wasn't set up properly for address resolution, so it would NPE (without the new condvar-style wait for addresses to be set) or hang (with it).  I just pushed a fix to the test.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    The deadlock, as I understand it, comes from an attempt to take `QV_LOCK` in order to access `myElectionAddress` in thread A, while holding the lock on the `QuorumCnxManager` object; meanwhile, thread B, which holds the `QV_LOCK` while `connectOne()` is running, tries to take the lock on the `QuorumCnxManager` object, producing a deadly embrace.  From the Jira:
    
    ```
        [junit]  java.lang.Thread.State: BLOCKED
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
        [junit] 
    
    
        [junit]  java.lang.Thread.State: BLOCKED
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
        [junit]         at  org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
        [junit]         at  org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
        [junit]         at  org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
    ```
    
    This patch removes the synchronization on `QV_LOCK` from the address accessor paths, where its current effect seems primarily to ensure cross-thread visibility; the memory barriers involved in setting/fetching through an `AtomicReference` are equally effective at this.  Wrapping a single `AtomicReference<AddressTuple>` around all three addresses ensures that you don't get interleaved sets when two threads each try to update the address fields -- although in the code as I read it, callers are holding the `QV_LOCK` anyway.  I can't yet say that I've analyzed all other paths that take `QV_LOCK`, but I'm fairly sure that if https://github.com/apache/zookeeper/pull/247 would work, this would work at least as well, while at least partially addressing concerns about getting "outdated" state in read paths.  (If this is a serious concern, then presumably we need a separate reader/writer lock that is taken during creation of the QuorumPeer and not released until it reaches a "fully configu
 red" state for the first time.)


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    @mkedwards Great patch! One of the outstanding blockers for the 3.5 release. Thanks for the explanation, I'll review it as soon as possible.


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by eolivelli <gi...@git.apache.org>.
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235713462
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -108,7 +109,11 @@
         LocalPeerBean jmxLocalPeerBean;
         private Map<Long, RemotePeerBean> jmxRemotePeerBean;
         LeaderElectionBean jmxLeaderElectionBean;
    -    private QuorumCnxManager qcm;
    +
    +    // The QuorumCnxManager is held through an AtomicReference to ensure cross-thread visibility
    +    // of updates; see the implementation comment at setLastSeenQuorumVerifier().
    +    private AtomicReference<QuorumCnxManager> qcmRef = new AtomicReference<>();
    --- End diff --
    
    If we are not using Compare and set, why a volatile is not enough?


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    Note that this patch is targeted at the 3.5 release branch, which I'm trying to help whip into shape for testing in our environment (where 3.4.13 is already in production use).  What else can I do to help https://builds.apache.org/job/ZooKeeper_branch35_jdk8/ get to green?


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    After groveling through https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/consoleText, I think this may be a contributing factor to flaky tests:
    ```
         [exec]     [junit] 2018-11-22 00:18:30,336 [myid:] - INFO  [QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind port: localhost/127.0.0.1:19459
         [exec]     [junit] 2018-11-22 00:18:30,337 [myid:] - INFO  [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493] - binding to port localhost/127.0.0.1:19466
         [exec]     [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497] - Error while reconfiguring
         [exec]     [junit] org.jboss.netty.channel.ChannelException: Failed to bind to: localhost/127.0.0.1:19466
         [exec]     [junit] 	at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
         [exec]     [junit] 	at org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494)
         [exec]     [junit] 	at org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947)
         [exec]     [junit] 	at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154)
         [exec]     [junit] 	at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
         [exec]     [junit] 	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263)
         [exec]     [junit] Caused by: java.net.BindException: Address already in use
         [exec]     [junit] 	at sun.nio.ch.Net.bind0(Native Method)
         [exec]     [junit] 	at sun.nio.ch.Net.bind(Net.java:433)
         [exec]     [junit] 	at sun.nio.ch.Net.bind(Net.java:425)
         [exec]     [junit] 	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
         [exec]     [junit] 	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
         [exec]     [junit] 	at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
         [exec]     [junit] 	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
         [exec]     [junit] 	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
         [exec]     [junit] 	at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
         [exec]     [junit] 	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
         [exec]     [junit] 	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
         [exec]     [junit] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         [exec]     [junit] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         [exec]     [junit] 	at java.lang.Thread.run(Thread.java:748)
    ```


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    Please close this PR.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    That makes perfect sense.  Thanks for keeping an eye on this.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    @afine , does this approach address your concerns with #247?


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    @eolivelli , have I addressed all your concerns?  Does this look to you like a complete fix to ZOOKEEPER-2778?


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by eolivelli <gi...@git.apache.org>.
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235712847
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -121,6 +126,18 @@
          */
         private ZKDatabase zkDb;
     
    +    public static class AddressTuple {
    --- End diff --
    
    nit: final?


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    This PR now has the extraneous changes removed and is green in CI.  Please re-review at your convenience.


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235460740
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() {
         }
         
         public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
    -        synchronized (QV_LOCK) {
    -            if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    -                LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    -                        ". Current version: " + quorumVerifier.getVersion());
    +        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
    +        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
    +        // deadlock against other callers of connectOne().  If qcmRef gets set in another
    +        // thread while we're inside the synchronized block, that does no harm; if we didn't
    +        // take a lock on qcm (because it was null when we sampled it), we won't call
    +        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
    +        // of updates that provably happen in another thread before entering this method.)
    +        QuorumCnxManager qcm = qcmRef.get();
    +        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
    --- End diff --
    
    Yes; I'd like to get confirmation from @castuardo and his team that this fully solves the lock nesting problem.  But if you look at the flow of the code, I think you'll see that the only way _this_ code can take the lock on a {{QuorumCnxManager}} is via {{qcm.connectOne()}}.  Since Java locks are reentrant, we're safe ensuring that the nesting is always qcm -> QV_LOCK -> qcm -> QV_LOCK if qcm is non-null and QV_LOCK -> QV_LOCK if qcm is null.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    Committed to branch-3.5. Thanks @mkedwards !


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    I have pushed an incremental change which should address the lock inversion scenario more completely, along with condition-variable-style waiting for the quorum/election address to become non-null.  (This proved a great deal simpler than trying to use a ReentrantReadWriteLock to solve that problem.)  Please re-review.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    Review comments addressed.  It's a little bit weird that `ant test` doesn't recompile the tests when the implementation has changed; an `ant clean`/`ant test` cycle caught another test failure-to-compile due to more restrictive access on data members.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    There appear to be test failures in LeaderBeanTest:
    ```
         [exec]     [junit] Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 0.619 sec, Thread: 8, Class: org.apache.zookeeper.server.quorum.LeaderBeanTest
         [exec]     [junit] Test org.apache.zookeeper.server.quorum.LeaderBeanTest FAILED
    ```
    Can anyone help me read the tea leaves?


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    Let's see what @eolivelli saying, but I'd like to commit the other PR to master first and this one only if the other one fails to apply to 3.5.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    Whoops, just pushed some unrelated stuff to this branch.  The code state that fixes just ZOOKEEPER-2778 is https://github.com/apache/zookeeper/pull/707/commits/bbeeebf87391ef642059c4b3b65592c361a2ab4e


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235808196
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -108,7 +109,11 @@
         LocalPeerBean jmxLocalPeerBean;
         private Map<Long, RemotePeerBean> jmxRemotePeerBean;
         LeaderElectionBean jmxLeaderElectionBean;
    -    private QuorumCnxManager qcm;
    +
    +    // The QuorumCnxManager is held through an AtomicReference to ensure cross-thread visibility
    +    // of updates; see the implementation comment at setLastSeenQuorumVerifier().
    +    private AtomicReference<QuorumCnxManager> qcmRef = new AtomicReference<>();
    --- End diff --
    
    I am hoping to reduce the need for synchronized blocks in follow-up changes.  For now, I added a simple use of getAndSet() to detect multiple calls to createElectionAlgorithm() and ensure that the QCM that's being dropped on the floor gets halted first.


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by maoling <gi...@git.apache.org>.
Github user maoling commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235357356
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() {
         }
         
         public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
    -        synchronized (QV_LOCK) {
    -            if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    -                LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    -                        ". Current version: " + quorumVerifier.getVersion());
    +        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
    +        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
    +        // deadlock against other callers of connectOne().  If qcmRef gets set in another
    +        // thread while we're inside the synchronized block, that does no harm; if we didn't
    +        // take a lock on qcm (because it was null when we sampled it), we won't call
    +        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
    +        // of updates that provably happen in another thread before entering this method.)
    +        QuorumCnxManager qcm = qcmRef.get();
    +        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
    --- End diff --
    
    nest `synchronized` merits careful consideration?


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235414227
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() {
         }
         
         public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
    -        synchronized (QV_LOCK) {
    -            if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    -                LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    -                        ". Current version: " + quorumVerifier.getVersion());
    +        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
    +        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
    +        // deadlock against other callers of connectOne().  If qcmRef gets set in another
    +        // thread while we're inside the synchronized block, that does no harm; if we didn't
    +        // take a lock on qcm (because it was null when we sampled it), we won't call
    +        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
    +        // of updates that provably happen in another thread before entering this method.)
    +        QuorumCnxManager qcm = qcmRef.get();
    +        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
    +        synchronized (outerLockObject) {
    +            synchronized (QV_LOCK) {
    +                if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    +                    LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    +                            ". Current version: " + quorumVerifier.getVersion());
    +                }
    +                // assuming that a version uniquely identifies a configuration, so if
    +                // version is the same, nothing to do here.
    +                if (lastSeenQuorumVerifier != null &&
    +                        lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    +                    return;
    +                }
    +                lastSeenQuorumVerifier = qv;
     
    -            }
    -            // assuming that a version uniquely identifies a configuration, so if
    -            // version is the same, nothing to do here.
    -            if (lastSeenQuorumVerifier != null &&
    -                    lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    -                return;
    -            }
    -            lastSeenQuorumVerifier = qv;
    -            connectNewPeers();
    --- End diff --
    
    Why have you refactored this? I think extracting the logic into private method makes the code more readable.


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235470421
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() {
         }
         
         public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
    -        synchronized (QV_LOCK) {
    -            if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    -                LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    -                        ". Current version: " + quorumVerifier.getVersion());
    +        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
    +        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
    +        // deadlock against other callers of connectOne().  If qcmRef gets set in another
    +        // thread while we're inside the synchronized block, that does no harm; if we didn't
    +        // take a lock on qcm (because it was null when we sampled it), we won't call
    +        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
    +        // of updates that provably happen in another thread before entering this method.)
    +        QuorumCnxManager qcm = qcmRef.get();
    +        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
    +        synchronized (outerLockObject) {
    +            synchronized (QV_LOCK) {
    +                if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    +                    LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    +                            ". Current version: " + quorumVerifier.getVersion());
    +                }
    +                // assuming that a version uniquely identifies a configuration, so if
    +                // version is the same, nothing to do here.
    +                if (lastSeenQuorumVerifier != null &&
    +                        lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    +                    return;
    +                }
    +                lastSeenQuorumVerifier = qv;
     
    -            }
    -            // assuming that a version uniquely identifies a configuration, so if
    -            // version is the same, nothing to do here.
    -            if (lastSeenQuorumVerifier != null &&
    -                    lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    -                return;
    -            }
    -            lastSeenQuorumVerifier = qv;
    -            connectNewPeers();
    --- End diff --
    
    I had folded it back in so I could see the whole flow with the nested calls to `qcm.connectOne()`.  I'll factor it back out with `qcm` as a method parameter (it's important that the `AtomicReference` `qcmRef` not be sampled again inside the `synchronized` block, since we could race against another thread changing the election algorithm).


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235463077
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() {
         }
         
         public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
    -        synchronized (QV_LOCK) {
    -            if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    -                LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    -                        ". Current version: " + quorumVerifier.getVersion());
    +        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
    +        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
    +        // deadlock against other callers of connectOne().  If qcmRef gets set in another
    +        // thread while we're inside the synchronized block, that does no harm; if we didn't
    +        // take a lock on qcm (because it was null when we sampled it), we won't call
    +        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
    +        // of updates that provably happen in another thread before entering this method.)
    +        QuorumCnxManager qcm = qcmRef.get();
    +        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
    +        synchronized (outerLockObject) {
    +            synchronized (QV_LOCK) {
    +                if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    +                    LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    +                            ". Current version: " + quorumVerifier.getVersion());
    +                }
    +                // assuming that a version uniquely identifies a configuration, so if
    +                // version is the same, nothing to do here.
    +                if (lastSeenQuorumVerifier != null &&
    +                        lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    +                    return;
    +                }
    +                lastSeenQuorumVerifier = qv;
     
    -            }
    -            // assuming that a version uniquely identifies a configuration, so if
    -            // version is the same, nothing to do here.
    -            if (lastSeenQuorumVerifier != null &&
    -                    lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    -                return;
    -            }
    -            lastSeenQuorumVerifier = qv;
    -            connectNewPeers();
    --- End diff --
    
    Got it. Please factor it back out with a parameter. Thanks!


---

[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

Posted by mkedwards <gi...@git.apache.org>.
Github user mkedwards commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/707#discussion_r235458260
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java ---
    @@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() {
         }
         
         public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
    -        synchronized (QV_LOCK) {
    -            if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    -                LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    -                        ". Current version: " + quorumVerifier.getVersion());
    +        // If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
    +        // and then take QV_LOCK.  Take the locks in the same order to ensure that we don't
    +        // deadlock against other callers of connectOne().  If qcmRef gets set in another
    +        // thread while we're inside the synchronized block, that does no harm; if we didn't
    +        // take a lock on qcm (because it was null when we sampled it), we won't call
    +        // connectOne() on it.  (Use of an AtomicReference is enough to guarantee visibility
    +        // of updates that provably happen in another thread before entering this method.)
    +        QuorumCnxManager qcm = qcmRef.get();
    +        Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
    +        synchronized (outerLockObject) {
    +            synchronized (QV_LOCK) {
    +                if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
    +                    LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
    +                            ". Current version: " + quorumVerifier.getVersion());
    +                }
    +                // assuming that a version uniquely identifies a configuration, so if
    +                // version is the same, nothing to do here.
    +                if (lastSeenQuorumVerifier != null &&
    +                        lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    +                    return;
    +                }
    +                lastSeenQuorumVerifier = qv;
     
    -            }
    -            // assuming that a version uniquely identifies a configuration, so if
    -            // version is the same, nothing to do here.
    -            if (lastSeenQuorumVerifier != null &&
    -                    lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
    -                return;
    -            }
    -            lastSeenQuorumVerifier = qv;
    -            connectNewPeers();
    --- End diff --
    
    The {{qcm}} that it references is now a local variable rather than an instance data member.  If you'd like, I could factor it back out, with {{qcm}} as an argument to the function.


---

[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

Posted by maoling <gi...@git.apache.org>.
Github user maoling commented on the issue:

    https://github.com/apache/zookeeper/pull/707
  
    @mkedwards 
    make sure the code in your origin branch-3.5 is what you want for `ZOOKEEPER-2778 `
    then `git push origin branch-3.5 -f` will be ok


---