You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by bitgaoshu <gi...@git.apache.org> on 2017/08/14 07:19:39 UTC

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836 fix SocketTimeoutException

GitHub user bitgaoshu opened a pull request:

    https://github.com/apache/zookeeper/pull/336

    ZOOKEEPER-2836 fix SocketTimeoutException

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bitgaoshu/zookeeper ZOOKEEPER-2836

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #336
    
----
commit 3653e6ddc21589355fb06c98aa60665fce4a4e24
Author: bitgaoshu <bi...@gmail.com>
Date:   2017-08-14T07:02:16Z

    ZOOKEEPER-2836 fix SocketTimeoutException

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by skamille <gi...@git.apache.org>.
Github user skamille commented on the issue:

    https://github.com/apache/zookeeper/pull/336
  
    @bitgaoshu if you could make this patch apply to the 3.5 branch I will commit it there as well. Unfortunately that branch has a significant divergence from master and I suspect we want this fix in before we release 3.6


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by bitgaoshu <gi...@git.apache.org>.
Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133886561
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    --- End diff --
    
    Update. I used to consider that the `closeSocket(client); ` should also be executed when exception was thrown.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by bitgaoshu <gi...@git.apache.org>.
Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r134085352
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
                         LOG.info("My election bind port: " + addr.toString());
                         setName(addr.toString());
                         ss.bind(addr);
    +                    ss.setSoTimeout(10 * 1000); // Ten seconds
    --- End diff --
    
    i also think the timeout not need to be set. i just follow [this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354)  simply.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by skamille <gi...@git.apache.org>.
Github user skamille commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133989279
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
                         LOG.info("My election bind port: " + addr.toString());
                         setName(addr.toString());
                         ss.bind(addr);
    +                    ss.setSoTimeout(10 * 1000); // Ten seconds
    --- End diff --
    
    By setting the timeout to 10s the timeout exception will fire potentially every 10s, is that what we want below?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by bitgaoshu <gi...@git.apache.org>.
Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r134085348
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
                         LOG.info("My election bind port: " + addr.toString());
                         setName(addr.toString());
                         ss.bind(addr);
    +                    ss.setSoTimeout(10 * 1000); // Ten seconds
    +                    long acceptStartTime = System.currentTimeMillis();
                         while (!shutdown) {
    -                        client = ss.accept();
    -                        setSockOpts(client);
    -                        LOG.info("Received connection request "
    -                                + client.getRemoteSocketAddress());
    -                        receiveConnection(client);
    -                        numRetries = 0;
    +                        try {
    +                            client = ss.accept();
    +                            setSockOpts(client);
    +                            LOG.info("Received connection request "
    +                                     + client.getRemoteSocketAddress());
    +                            receiveConnection(client);
    +                            numRetries = 0;
    +                        } catch (SocketTimeoutException e) {
    +                            LOG.warn("The socket is listening for the election accepted "
    +                                     + "an unexpected timeout ["
    +                                     + (System.currentTimeMillis() - acceptStartTime) + "]milliseconds"
    +                                     + "after the call to accept(). is this an instance of bug ZOOKEEPER-2836?");
    --- End diff --
    
    I agree.  I will leave the timeout at 0 by default and leave a log statement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/336


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by maoling <gi...@git.apache.org>.
Github user maoling commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133865685
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    --- End diff --
    
    -  can we reproduce this issue?(haha,49days)? This should never happen theoretically.According to [KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) or [tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684),they also  didn't find the root-cause,just do like [this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354) to add some protections against this issue here.
    -  One assumption is SocketServer.accept() use the default infinite value(2 ^ 32 -1=4294967295) without no timeout specified or setSoTimeout(0) 
        > a call to accept() for this ServerSocket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the ServerSocket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.
    
       so this issuse always happended after 49days 17 hours(4294967295/1000/60/60/24=49.7days)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by skamille <gi...@git.apache.org>.
Github user skamille commented on the issue:

    https://github.com/apache/zookeeper/pull/336
  
    +1 merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/336
  
    @bitgaoshu What's your username on JIRA (https://issues.apache.org/jira/projects/ZOOKEEPER)? Please create an account on JIRA if you haven't and let me know the user name. Since you contributed the patch, I should assign the JIRA issue to you so you get the credits on your contribution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by bitgaoshu <gi...@git.apache.org>.
Github user bitgaoshu commented on the issue:

    https://github.com/apache/zookeeper/pull/336
  
    gaoshu, thank you very much. @hanm 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by skamille <gi...@git.apache.org>.
Github user skamille commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133994057
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -638,13 +639,22 @@ public void run() {
                         LOG.info("My election bind port: " + addr.toString());
                         setName(addr.toString());
                         ss.bind(addr);
    +                    ss.setSoTimeout(10 * 1000); // Ten seconds
    +                    long acceptStartTime = System.currentTimeMillis();
                         while (!shutdown) {
    -                        client = ss.accept();
    -                        setSockOpts(client);
    -                        LOG.info("Received connection request "
    -                                + client.getRemoteSocketAddress());
    -                        receiveConnection(client);
    -                        numRetries = 0;
    +                        try {
    +                            client = ss.accept();
    +                            setSockOpts(client);
    +                            LOG.info("Received connection request "
    +                                     + client.getRemoteSocketAddress());
    +                            receiveConnection(client);
    +                            numRetries = 0;
    +                        } catch (SocketTimeoutException e) {
    +                            LOG.warn("The socket is listening for the election accepted "
    +                                     + "an unexpected timeout ["
    +                                     + (System.currentTimeMillis() - acceptStartTime) + "]milliseconds"
    +                                     + "after the call to accept(). is this an instance of bug ZOOKEEPER-2836?");
    --- End diff --
    
    I don't love this error message and it doesn't make sense because we've set the socket timeout above so it will never time out based on that weird possible JVM error. So either we just continue after a timeout, or leave the timeout at 0 and leave a log statement that indicates it timed out unexpectedly but will retry, but the double fix doesn't really make sense to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by maoling <gi...@git.apache.org>.
Github user maoling commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133864927
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    --- End diff --
    
    why we need to move code block **Line650-Line652** to code block **Line665-Line667** ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by bitgaoshu <gi...@git.apache.org>.
Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133885327
  
    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                             numRetries = 0;
                         }
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    --- End diff --
    
    - update
    
    - l checked the native method `java.net.PlainSocketImpl.socketAccept(Native Method)` in [openjdk](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/9d617cfd6717/src/solaris/native/java/net/PlainSocketImpl.c), **line709-721**, in which it changed from 0 to -1. and then timeout of -1 is interpreted as an infinite timeout.  In some cases, [-1 was interpreted as a larger positive integer](https://lwn.net/Articles/483078/). so this issue always happend after 49days. It's my wild conjecture.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #336: ZOOKEEPER-2836: fix SocketTimeoutException

Posted by bitgaoshu <gi...@git.apache.org>.
Github user bitgaoshu commented on the issue:

    https://github.com/apache/zookeeper/pull/336
  
    ok, i had made this patch apply to the 3.5. [PR](https://github.com/apache/zookeeper/pull/344)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---