You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by anmolnar <gi...@git.apache.org> on 2018/11/12 20:24:05 UTC

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 Some test cases are failing beca...

GitHub user anmolnar opened a pull request:

    https://github.com/apache/zookeeper/pull/700

    ZOOKEEPER-1441 Some test cases are failing because Port bind issue.

    Fixes the Java 11 build issue. Details are in Jira.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/anmolnar/zookeeper ZOOKEEPER-1441

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #700
    
----
commit 93dde673fa8c73ce524199384b6cb9c73b5ad324
Author: Andor Molnar <an...@...>
Date:   2018-11-12T20:19:24Z

    ZOOKEEPER-1441. Don't open selector until ServerCnxnFactory is started.

----


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r234871213
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    If we are binding a port that's already in use here then I think we'll have a problem because the expectation here is the acceptor thread will always be available unless explicitly being shut down by caller (for this reason we caught but ignored all checked and unchecked exception.). The problem is the control flow does not reach higher level from within this acceptor thread - thus if we have an instantiated but stopped `NIOServerCnxnFactory` due to acceptor thread exceptions, the entire zk process could end up in a weird state. Previous code does not have this issue because it tries to bind port early and complain if it can't such that caller would catch the issue earlier before the acceptor threads started. This should be easily testable if we spin up a ZK server with an unavailable port with this fix and see what happens.


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by eolivelli <gi...@git.apache.org>.
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r234624742
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    Sorry for delay. I did not have time to find a good solution, I don't want to block this change


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    retest this please


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by eolivelli <gi...@git.apache.org>.
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r232832059
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    Should we abort the server?


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by eolivelli <gi...@git.apache.org>.
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r233723629
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -181,7 +181,7 @@ protected void fastCloseSock(SocketChannel sc) {
          */
         private class AcceptThread extends AbstractSelectThread {
             private final ServerSocketChannel acceptSocket;
    -        private final SelectionKey acceptKey;
    +        private SelectionKey acceptKey;
    --- End diff --
    
    @anmolnar sorry for late reply.
    I think it will be tricky. I will spend some time and try to find a suggestion


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    >> This will keep the original behaviour of eagerly registering the selector in the constructor and addresses your concerns.
    
    LGTM, thanks for doing this!
    
    >> ReconfigTest.testPortChangeToBlockedPort failed, becuase it expects the original port to be released even if the new port cannot be bound. 
    
    The proposed change looks reasonable. Though I am wondering why the current logic would fail this test: if new port fails to bind, the old port will be closed as part of `tryClose(oldSS);` in the `catch` block. Mind to elaborate a little bit regarding which cases the current code fails to close the old socket?
    
    



---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    @anmolnar So if I understand this correctly, this test `ReconfigTest.testPortChangeToBlockedPort` will consistently failing under Java 11, because `this.ss = ServerSocketChannel.open();` will prevent the socket close in first `tryClose(oldSS)`, which is why you moved the code block of `this.ss = ServerSocketChannel.open();` after `tryClose(oldSS)`, right? 



---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    retest this please


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by eolivelli <gi...@git.apache.org>.
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r232832500
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -181,7 +181,7 @@ protected void fastCloseSock(SocketChannel sc) {
          */
         private class AcceptThread extends AbstractSelectThread {
             private final ServerSocketChannel acceptSocket;
    -        private final SelectionKey acceptKey;
    +        private SelectionKey acceptKey;
    --- End diff --
    
    Now this variable is nullable.
    Shouldn't we add some null check at every access?


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r232840623
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    Makes sense. Any suggestion how to make it properly?


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    @anmolnar  gotcha. +1, will commit this when i have a chance.


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r235218313
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    Yeah, please ignore the port binding part of my statement previously. I was actually referring to port selection. Before this fix, we have selector registration in constructor of `AcceptorThread`:
    `this.acceptKey =
                    acceptSocket.register(selector, SelectionKey.OP_ACCEPT);`
    If we get an exception here, we will bail out and zk server will not start.
    
    If we selector registration inside AcceptorThread and got an exception, then the acceptor thread will shutdown, but the caller will not be aware of this so the zk server could be in a weird state. That's my only concern of this patch. Does this sound a reasonable concern?


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    @hanm The issue with Java 11 builds is the refactoring that was made in NIO from Java 11: 
    https://www.oracle.com/technetwork/java/javase/11-relnote-issues-5012449.html#JDK-8198562
    The key part is:
    > Closing a connected SocketChannel that is registered with a Selector will now consistently delay closing the underlying connection until the closed channel is flushed from all selectors that it is registered with. 
    
    In other words, we have to close all registered Selectors to properly close the socket. Therefore `tryClose(oldSS)` on its own is not enough.


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    @hanm Not exactly.
    The test case is about reconfig the server to a port which is unavailable (blocked). Steps:
    1) Bind the server to "oldClientPort"
    2) Bind "newClientPort" with custom socket
    3) Reconfig the server to bind "newClientPort" - fails
    4) Check if "oldClientPort" has become available
    ...
    Step 4) fails, because of the aforementioned reason: reconfig fails to bind the port, throws exception, but exception handler doesn't close the selector, only the socket.
    I do things in swapped order: close the thread and socket first, then try to bind the new port.


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r235013286
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    3.4 doesn't have this issue, because the selector is explicitly closed in the shutdown method. In 3.5 and master only the owning thread (accept) is able to close it, but it won't, if it doesn't even started.
    
    I'll do the test that you suggested, however I'm not sure how port binding is related to selector registration. 


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    @eolivelli @hanm Now, an entirely different approach. Solving 2 issues at the same time:
    1) Close selector in shutdown() when Accept and Select thread haven't been started yet. This will keep the original behaviour of eagerly register the selector in the constructor and addresses your concerns.
    2) Turned out that we have a similar issue in reconfig: There was an issue with `ReconfigTest.testPortChangeToBlockedPort`: it expects the original port to be released even if the new port cannot be bound. Sounds like a reasonable expectation and the exception handler in the `reconfigure()` method confirms this. Hence I fixed the code, not the test.


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/700


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    Being the bad guy again: committed my own patch to master and 3.5 branches. Thanks @hanm and @eolivelli !


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r234871411
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    I am also curious why 3.4 does not have this test issue. Technically 3.4 works similar with master and 3.5 w.r.t. the selector registration - both register selector before the server factory is 'started' (though in 3.5 and trunk, we made `NIOServerCnxn` multi-threaded`).


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by eolivelli <gi...@git.apache.org>.
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r233723763
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    @anmolnar sorry for late reply.
    I think it will be tricky. I will spend some time and try to find a suggestion


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r235016741
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -209,6 +209,8 @@ public void run() {
                             LOG.warn("Ignoring unexpected exception", e);
                         }
                     }
    +            } catch (IOException e) {
    +                LOG.error("Exception when running accept thread. Unable to register selector?", e);
    --- End diff --
    
    ```
    2018-11-20 15:12:15,758 [myid:] - INFO  [main:NIOServerCnxnFactory@685] - binding to port 0.0.0.0/0.0.0.0:2181
    2018-11-20 15:12:15,760 [myid:] - ERROR [main:ZooKeeperServerMain@87] - Unexpected exception, exiting abnormally
    java.net.BindException: Address already in use
    	at sun.nio.ch.Net.bind0(Native Method)
    	at sun.nio.ch.Net.bind(Net.java:433)
    	at sun.nio.ch.Net.bind(Net.java:425)
    	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
    	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:686)
    	at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:157)
    	at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:110)
    	at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:68)
    	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:133)
    	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:87)
    ```


---

[GitHub] zookeeper pull request #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are ...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/700#discussion_r232840801
  
    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java ---
    @@ -181,7 +181,7 @@ protected void fastCloseSock(SocketChannel sc) {
          */
         private class AcceptThread extends AbstractSelectThread {
             private final ServerSocketChannel acceptSocket;
    -        private final SelectionKey acceptKey;
    +        private SelectionKey acceptKey;
    --- End diff --
    
    You're right. However the only access in this class is called from the `select()` method which is called only after the registration is successful, so I think we're fine.


---

[GitHub] zookeeper issue #700: ZOOKEEPER-1441 - JAVA 11 - Some test cases are failing...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/700
  
    @hanm @lvfangmin @phunt 
    Please review this, looks like a quite easy win to get Java11 builds green. Thanks!


---