Posted to dev@zookeeper.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2011/09/08 20:48:09 UTC

[jira] [Created] (ZOOKEEPER-1174) FD leak when network unreachable

FD leak when network unreachable
--------------------------------

                 Key: ZOOKEEPER-1174
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.3.3
            Reporter: Ted Dunning
            Assignee: Ted Dunning
            Priority: Critical
             Fix For: 3.3.4


In the socket connection logic there are several errors that result in bad behavior.  The basic problem is that a socket is registered with a selector unconditionally when there are nuances that should be dealt with.  First, the socket may connect immediately.  Secondly, the connect may throw an exception.  In either of these two cases, I don't think that the socket should be registered.
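
For illustration only, here is a minimal Java sketch of the handling described above. It is not the attached patch. The names ConnectSketch and connectAndRegister are hypothetical, and registering for OP_READ after an immediate connect is just one possible choice. The point it tries to show is that registration happens only after connect() has returned, and that the channel is closed if connect() throws, so no file descriptor is left behind.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

class ConnectSketch {
    /**
     * Hypothetical helper (not ZooKeeper code): starts a non-blocking connect
     * and registers the channel only once connect() has returned normally.
     */
    static SelectionKey connectAndRegister(SocketChannel sock, Selector selector,
                                           SocketAddress addr) throws IOException {
        try {
            sock.configureBlocking(false);
            // true means the connection was established immediately,
            // so there is no OP_CONNECT event left to wait for.
            boolean connected = sock.connect(addr);
            int ops = connected ? SelectionKey.OP_READ : SelectionKey.OP_CONNECT;
            return sock.register(selector, ops);
        } catch (IOException | RuntimeException e) {
            // connect() or register() failed: release the descriptor before rethrowing.
            try {
                sock.close();
            } catch (IOException ignored) {
                // keep the original failure as the one that propagates
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            SocketChannel sock = SocketChannel.open();
            try {
                // An unresolved address makes connect() throw without any network access.
                connectAndRegister(sock, selector,
                        InetSocketAddress.createUnresolved("zk.invalid", 2181));
            } catch (UnresolvedAddressException e) {
                System.out.println("connect failed; channel open = " + sock.isOpen()
                        + ", registered keys = " + selector.keys().size());
            }
        }
    }
}
```

Passing the channel in from the caller is only there so the caller can check afterwards that it was closed; a real client would more likely open it inside the helper.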

I will attach a test case that demonstrates the problem.  I have been unable to create a unit test that exhibits the problem because I would have to mock the low level socket libraries to do so.  It would still be good to do so if somebody can figure out a good way.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100621#comment-13100621 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

Ted, don't we still need to register the sockKey even if sock.connect returns true? 


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100800#comment-13100800 ] 

Hadoop QA commented on ZOOKEEPER-1174:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12493692/ZOOKEEPER-1174.patch
  against trunk revision 1165443.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/516//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118482#comment-13118482 ] 

Hadoop QA commented on ZOOKEEPER-1174:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12497223/ZOOKEEPER-1174-3.3.patch
  against trunk revision 1177432.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/600//console

This message is automatically generated.
                

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108044#comment-13108044 ] 

Patrick Hunt commented on ZOOKEEPER-1174:
-----------------------------------------

This jira has this set currently:

bq. Fix Version/s: 3.3.4, 3.4.0, 3.5.0

So we need a patch, or patches, that can be applied to branch-3.3, branch-3.4, and trunk.



[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115512#comment-13115512 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

Also this should apply to 3.4 which hopefully doesn't have the change from 786 in it, so I will put it in there. Did we also want to put it in 3.3.4? Are those two releases going out at the same time?
                

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108010#comment-13108010 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

OK.

Can you say specifically which branches you mean?


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-1174:
-----------------------------------

    Attachment: ZOOKEEPER-1174.patch

Here is a cheesy test.  The idea is that I injected an explicit throw of the same exception that a downed internet connection causes.

Is this just toooo cheesy to stomach?
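
As a rough sketch of the injection idea (not the test in the attached patch), the low-level connect can be put behind an overridable method so that a test subclass throws instead of touching the network. Everything here is hypothetical: the class names, the doConnect seam, and the choice of SocketException with the message "Network is unreachable" as the stand-in for what a downed connection produces.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.SocketException;
import java.nio.channels.SocketChannel;

class FaultInjectionSketch {
    /** Hypothetical connector with an overridable seam around the real connect call. */
    static class Connector {
        SocketChannel last; // most recently opened channel, kept so a test can inspect it

        // The seam: production code calls the real connect, tests override this.
        boolean doConnect(SocketChannel sock, SocketAddress addr) throws IOException {
            return sock.connect(addr);
        }

        void connect(SocketAddress addr) throws IOException {
            SocketChannel sock = SocketChannel.open();
            last = sock;
            try {
                sock.configureBlocking(false);
                doConnect(sock, addr);
            } catch (IOException e) {
                sock.close(); // without this the descriptor would leak on every retry
                throw e;
            }
        }
    }

    /** Test double: throws the injected exception, no network needed. */
    static class UnreachableConnector extends Connector {
        @Override
        boolean doConnect(SocketChannel sock, SocketAddress addr) throws IOException {
            throw new SocketException("Network is unreachable"); // assumed message
        }
    }

    public static void main(String[] args) {
        Connector c = new UnreachableConnector();
        try {
            c.connect(InetSocketAddress.createUnresolved("zk.invalid", 2181));
        } catch (IOException e) {
            System.out.println("injected failure: " + e.getMessage()
                    + "; channel open = " + c.last.isOpen());
        }
    }
}
```

A check like this only shows that the cleanup path runs when the exception is thrown; it says nothing about whether the real socket layer throws the same exception on every platform.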


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117209#comment-13117209 ] 

Hudson commented on ZOOKEEPER-1174:
-----------------------------------

Integrated in ZooKeeper-trunk #1318 (See [https://builds.apache.org/job/ZooKeeper-trunk/1318/])
    ZOOKEEPER-1174. FD leak when network unreachable (Ted Dunning via camille)

camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177042
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java

                

[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-1174:
----------------------------------------

    Attachment: ZOOKEEPER-1174fix.patch
    

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Patrick Hunt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141456#comment-13141456 ] 

Patrick Hunt commented on ZOOKEEPER-1174:
-----------------------------------------

np. This reminds me, though: we are only testing the C code on Windows (CI env), not Java. It would be good to add this. Matthias, are you interested in helping? https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-WinVS2008/

If so please start the discussion on the ML. thanks!
                

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115173#comment-13115173 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

I was going to put this in, but I don't have a patch that applies cleanly, so it will take some work. I'll look at it tomorrow or Wed. When are you planning on doing the release?
                

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105831#comment-13105831 ] 

Hadoop QA commented on ZOOKEEPER-1174:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12494741/ZOOKEEPER-1174.patch
  against trunk revision 1170886.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/554//console

This message is automatically generated.


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-1174:
-----------------------------------

    Attachment: ZOOKEEPER-1174.patch

This patch should be ready to commit.  Tests are removed pending another JIRA.


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115464#comment-13115464 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

The old patch fails to apply because of a change introduced in the fix for ZOOKEEPER-786.  The change is this:

@@ -185,9 +190,7 @@ public class ClientCnxnSocketNIO extends ClientCnxnSocket {
         sock.socket().setSoLinger(false, -1);
         sock.socket().setTcpNoDelay(true);
         sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
-        if (sock.connect(addr)) {
-            sendThread.primeConnection();
-        }
+        sock.connect(addr);
         initialized = false;
 

Why was the primeConnection call deleted?  I can update the patch to account for this but that seems a bit dangerous since the commit comment on this patch doesn't refer to this change at all.

                

[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-1174:
-----------------------------------

    Attachment: zk-fd-leak.tgz

Here is a program that demonstrates the problem.  It includes a README and sample output with and without the fix.


        

[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-1174:
-----------------------------------

    Attachment: ZOOKEEPER-1174.patch

Here is an updated patch that maintains the sockKey even for immediate connects.  My guess is that this didn't matter in testing so far because it is rare for an async socket to connect instantly.

This addresses Camille's eagle-eyed comments.

I have added a few javadoc fixes and one widening of a catch from Exception to Throwable in the general spirit of making things better when I see them.  They are unrelated to this JIRA, but are very minor so do not warrant their own bug report.


        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100642#comment-13100642 ] 

Hadoop QA commented on ZOOKEEPER-1174:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12493648/ZOOKEEPER-1174.patch
  against trunk revision 1165443.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/515//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/515//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/515//console

This message is automatically generated.


        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Matthias Spycher (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141436#comment-13141436 ] 

Matthias Spycher commented on ZOOKEEPER-1174:
---------------------------------------------

Patrick, thanks for the ref.
                

        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100635#comment-13100635 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

Camille,

Thanks for looking at this.  I am not sure if my assertion is true either, but it does seem correct to me.  (Happily, I expressed some doubt.)

The documentation for sock.connect is exactly what I base my (current) position on.  The idea is that if connect returns true, then you don't need to use select to wait for the connection and can proceed immediately with the primeConnection and light up the connection for prime time.  It is only if connect returns false that deferred actions are required.
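The two outcomes of a non-blocking connect described above can be sketched as follows. This is a minimal illustration rather than the ZooKeeper client code: the class name ConnectOutcomeDemo, the method connectNonBlocking, the 5 second timeout and the loopback listener are all assumptions made for the example. It only relies on the documented behavior that SocketChannel.connect returns true when the connection is established immediately and false when it must later be completed with finishConnect.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ConnectOutcomeDemo {
    /** Hypothetical helper: connects in non-blocking mode and reports whether the channel got connected. */
    public static boolean connectNonBlocking(InetSocketAddress addr) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel sock = SocketChannel.open()) {
            sock.configureBlocking(false);
            if (sock.connect(addr)) {
                // Connected immediately: no need to wait on the selector.
                return true;
            }
            // Connection is still in progress: wait for OP_CONNECT, then finish it.
            SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(5000) == 0) {
                return false; // nothing became ready within the (arbitrary) timeout
            }
            return key.isConnectable() && sock.finishConnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // A throwaway loopback listener so the example does not depend on an external server.
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            server.bind(new InetSocketAddress(loopback, 0));
            int port = server.socket().getLocalPort();
            System.out.println("connected=" + connectNonBlocking(new InetSocketAddress(loopback, port)));
        }
    }
}
```

Which branch is taken depends on the platform and on timing, so the sketch handles both rather than assuming one.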



        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115455#comment-13115455 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

No.  No update beyond that last patch.  Should be ready to roll.

I talked to Camille last week and it sounded like she was on the verge of committing it.
                

        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Patrick Hunt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141347#comment-13141347 ] 

Patrick Hunt commented on ZOOKEEPER-1174:
-----------------------------------------

Matthias - see ZOOKEEPER-1271, it's the same issue regarding exception handling, but on Solaris.
                

        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115510#comment-13115510 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

Good catch Ted, I must ask the same question myself.
                

        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115202#comment-13115202 ] 

Mahadev konar commented on ZOOKEEPER-1174:
------------------------------------------

Wed night my time?

                

        

[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1174:
------------------------------------

    Fix Version/s: 3.5.0
                   3.4.0


        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101642#comment-13101642 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

Mocking the test sounds great.  Using this bug to bring in a mocking technology that can mock
static methods is a little more ambitious than I wanted it to be.

I see that jmockit and powermock both claim the ability to do this.  Powermock requires another
mocking technology underneath.  Jmockit has the problem that it isn't available in an official
maven repo.

My tendency is to suggest that we commit this without the unit test and open another JIRA to address
the testing problem in general.

If I can get sign-off on that, then I will produce a final patch to verify.  The code right now stands
like this:
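{code}
        try {
            // connect() returns true only if the connection completed immediately
            boolean immediateConnect = sock.connect(addr);
            // register even on an immediate connect so that sockKey is valid for primeConnection()
            sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
            if (immediateConnect) {
                sendThread.primeConnection();
            }
        } catch (IOException e) {
            // close the channel so that a failed connect does not leak its file descriptor
            sock.close();
        }
        initialized = false;
{code}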
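For readers who want to exercise the failure path without an unreachable network, here is a self-contained sketch of the same connect-then-register shape. It is not the patch itself: the Connector interface is a hypothetical seam introduced only so that a simulated IOException can be injected, and connectAndRegister is an illustrative name. The point it demonstrates is that when connect throws, the channel is closed and never registered.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class ConnectThenRegister {
    /** Hypothetical seam (not part of ZooKeeper) so a caller can simulate a failing connect. */
    public interface Connector {
        boolean connect(SocketChannel ch) throws IOException;
    }

    /** Returns the selection key on success, or null if connect failed and the channel was closed. */
    public static SelectionKey connectAndRegister(SocketChannel sock, Selector selector,
                                                  Connector connector) throws IOException {
        try {
            boolean immediateConnect = connector.connect(sock);
            // Register only after connect() has returned without throwing.
            SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);
            if (immediateConnect) {
                // A real client would prime the session here.
            }
            return key;
        } catch (IOException e) {
            // Close the channel so its file descriptor is released.
            sock.close();
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            SocketChannel sock = SocketChannel.open();
            sock.configureBlocking(false);
            // Simulate "network unreachable" instead of depending on the real network state.
            SelectionKey key = connectAndRegister(sock, selector, ch -> {
                throw new IOException("Network is unreachable (simulated)");
            });
            System.out.println("key=" + key + " open=" + sock.isOpen()
                    + " registered=" + sock.isRegistered());
        }
    }
}
```

A seam like this is one possible alternative to mocking the low-level socket classes, though whether it is acceptable in the client code is a design question the thread leaves open.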



        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101531#comment-13101531 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

Maybe a little cheesy. Could we do this with mocks? I'm not crazy about having a random "injectSocketError" flag in the code just for testing.

Also, should probably go ahead and log the socket connection error in the same way we do in SendThread.run, so people don't lose logging information.

Also, I think you need to register the sockKey before calling primeConnection, otherwise the call in primeConnection to clientCnxnSocket.enableReadWriteOnly() will fail.
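The ordering concern can be illustrated with a small stand-alone sketch. It assumes that enableReadWriteOnly() changes the interest set on the stored selection key; the class RegisterBeforePrime and its fields are hypothetical stand-ins, not the actual ClientCnxnSocketNIO code. If the key has not been assigned yet, the call fails with a NullPointerException.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class RegisterBeforePrime {
    // Stays null until the channel has been registered with a selector.
    private SelectionKey sockKey;

    public void register(SocketChannel sock, Selector selector) throws IOException {
        sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
    }

    // Assumed behavior of enableReadWriteOnly(): switch the key to read/write interest.
    public void enableReadWriteOnly() {
        sockKey.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
    }

    public int interestOps() {
        return sockKey.interestOps();
    }

    public static void main(String[] args) throws IOException {
        RegisterBeforePrime cnxn = new RegisterBeforePrime();
        try {
            cnxn.enableReadWriteOnly(); // called before register(): sockKey is still null
        } catch (NullPointerException e) {
            System.out.println("enableReadWriteOnly() before register() failed");
        }
        try (Selector selector = Selector.open();
             SocketChannel sock = SocketChannel.open()) {
            sock.configureBlocking(false);
            cnxn.register(sock, selector);
            cnxn.enableReadWriteOnly(); // now succeeds
            System.out.println("read/write interest set: "
                    + (cnxn.interestOps() == (SelectionKey.OP_READ | SelectionKey.OP_WRITE)));
        }
    }
}
```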


        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100819#comment-13100819 ] 

Hadoop QA commented on ZOOKEEPER-1174:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12493693/ZOOKEEPER-1174.patch
  against trunk revision 1165443.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/517//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/517//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/517//console

This message is automatically generated.


        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117096#comment-13117096 ] 

Mahadev konar commented on ZOOKEEPER-1174:
------------------------------------------

Ted/Camille, any update on this? It looks like this was committed to the 3.4 branch, and probably trunk?
                

        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100583#comment-13100583 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

We'd have to drastically change the try/catch logic to connect the socket first and then register it with the selector.
Should we just fix this by calling selector.selectNow() in the cleanup method after cancelling the sockKey? I think that might fix the leak.
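
For reference, a minimal sketch of why selectNow() might help, assuming the leak comes from cancelled keys that are never flushed. Per the java.nio Selector documentation, a cancelled key stays in the selector's key set until the next selection operation. The class and method names here are hypothetical and this is not the ZooKeeper cleanup code.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class CancelThenSelectNow {
    /**
     * Registers a channel, cancels its key and closes it.
     * Returns {keys held right after cancel/close, keys held after selectNow()}.
     */
    static int[] keysBeforeAndAfterFlush() throws IOException {
        try (Selector selector = Selector.open()) {
            SocketChannel sock = SocketChannel.open();
            sock.configureBlocking(false);
            SelectionKey sockKey = sock.register(selector, SelectionKey.OP_CONNECT);

            // Cleanup as discussed above: cancel the key, close the channel.
            sockKey.cancel();
            sock.close();
            // The cancelled key is still in the selector's key set at this point.
            int before = selector.keys().size();

            // A selection operation processes the cancelled-key set and
            // deregisters the channel.
            selector.selectNow();
            int after = selector.keys().size();
            return new int[] {before, after};
        }
    }
}
```

If the selector is never selected on again after a failed connect, the deregistration step in the middle would never run, which is the situation selectNow() is meant to cover.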


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115159#comment-13115159 ] 

Mahadev konar commented on ZOOKEEPER-1174:
------------------------------------------

Ted,
 Any update on this? Please let me know. I plan to cut a release soon and would like to get this in.

thanks
                

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116753#comment-13116753 ] 

Hadoop QA commented on ZOOKEEPER-1174:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12496886/ZOOKEEPER-1174fix.patch
  against trunk revision 1176903.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/594//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/594//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/594//console

This message is automatically generated.
                

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100847#comment-13100847 ] 

Hadoop QA commented on ZOOKEEPER-1174:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12493712/ZOOKEEPER-1174.patch
  against trunk revision 1165443.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/518//console

This message is automatically generated.


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-1174:
-----------------------------------

    Attachment: ZOOKEEPER-1174.patch

Try again with different format to please the checking script.


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116036#comment-13116036 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

This is a pretty serious bug if you ever wind up in the corner where it is exercised.  That proviso limits the average seriousness of the bug, but file descriptor leaks are never good.

                

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100714#comment-13100714 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

Correct.  Is sockKey needed if we don't register with the selector?


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100624#comment-13100624 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

Sorry, I was trigger-happy in reading your comments. I'm not sure this is true:

"Secondly, is it safe to not register sockets that connect immediately? I think, but am not sure, that the answer is yes because we have clearly already called primeConnection()."

The documentation for sock.connect seems to indicate that it can return true even in non-blocking mode:
"If this channel is in non-blocking mode then an invocation of this method initiates a non-blocking connection operation. If the connection is established immediately, as can happen with a local connection, then this method returns true."

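To make the two outcomes of a non-blocking connect concrete, here is a rough sketch (hypothetical class and method names, not the ClientCnxn code). It assumes the caller wants a selection key in both cases, since the later read/write calls go through the key: when connect returns true the channel is registered for reads directly, otherwise it waits for OP_CONNECT and calls finishConnect.

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class NonBlockingConnect {
    /** Connects in non-blocking mode; the returned channel is registered for OP_READ. */
    static SocketChannel connect(Selector selector, SocketAddress addr) throws IOException {
        SocketChannel sock = SocketChannel.open();
        try {
            sock.configureBlocking(false);
            boolean connectedNow = sock.connect(addr);
            if (connectedNow) {
                // Immediate connection (possible for local peers): there is no
                // OP_CONNECT event to wait for, but a key is still needed for I/O.
                sock.register(selector, SelectionKey.OP_READ);
            } else {
                // Connection pending: wait for OP_CONNECT, then complete it.
                SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);
                while (!sock.finishConnect()) {
                    selector.select(1000);
                    selector.selectedKeys().clear();
                }
                key.interestOps(SelectionKey.OP_READ);
            }
            return sock;
        } catch (IOException e) {
            // Never leave a half-open channel behind on failure.
            sock.close();
            throw e;
        }
    }
}
```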



[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103890#comment-13103890 ] 

Patrick Hunt commented on ZOOKEEPER-1174:
-----------------------------------------

If we go with PowerMock, let's use the Mockito variety.


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100752#comment-13100752 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

Well, it seems like all of the IO-related calls use the sockKey: enableWrite, enableRead, cleanup, doIO. I feel like I'm missing some major fundamental point.


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-1174:
-----------------------------------

    Attachment: ZOOKEEPER-1174.patch

Here is a proposed patch.  There are a few considerations here that merit review.  

First, is it safe to register sockets with a selector after the connect call?  I assert yes because select is level-based rather than transition-based: it reports the current readiness of a channel, not just changes that happen after registration.

Secondly, is it safe to not register sockets that connect immediately?  I think, but am not sure, that the answer is yes because we have clearly already called primeConnection().

Thirdly, is it OK to not rethrow the IOException from the connect call?  I am not sure here.  The immediate effect is that a connection is only attempted at the timeout rate rather than at the faster rate specified by some of the delays in the code.  This seems OK at first glance, but other opinions would be nice to have.
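
As a small illustration of the first point (a sketch with hypothetical names, using plain java.nio on the loopback interface, not the patch itself): the peer writes a byte before the client channel is registered, and the selector still reports the channel as readable, because readiness is evaluated at select time.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class LevelBasedSelect {
    /** Returns the number of ready keys when registration happens after the data was sent. */
    static int readyAfterLateRegistration() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                // The peer sends data while the client is not registered anywhere.
                accepted.write(ByteBuffer.wrap(new byte[] {42}));

                // Register only now, after connect and after the write.
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ);

                // The pending byte is still reported; the timeout is only a safety net.
                return selector.select(2000);
            }
        }
    }
}
```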


[jira] [Issue Comment Edited] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Mahadev konar (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115202#comment-13115202 ] 

Mahadev konar edited comment on ZOOKEEPER-1174 at 9/27/11 4:29 AM:
-------------------------------------------------------------------

Wed night my time? :)

                
      was (Author: mahadev):
    Wed night my time?

                  

[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-1174:
----------------------------------------

    Attachment: ZOOKEEPER-1174-3.3.patch
    

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101656#comment-13101656 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

Yeah, I think that sounds like a plan.


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Matthias Spycher (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141332#comment-13141332 ] 

Matthias Spycher commented on ZOOKEEPER-1174:
---------------------------------------------

I've run into a problem with this patch (version 3.3.3) on a system (Windows 7) where InetAddress.getAllByName(host) returns candidate IPv4 and IPv6 addresses.

The reason is that the IOException caught in SendThread.startConnect() is no longer propagated to the calling run() method. In my logs before the patch I would see:

- Opening socket connection to server localhost/0:0:0:0:0:0:0:1:23233
- Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.SocketException: Address family not supported by protocol family: connect
	at sun.nio.ch.Net.connect(Native Method)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1050)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1077)

and now I see:

- Opening socket connection to server localhost/0:0:0:0:0:0:0:1:23233
- Unable to open socket to localhost/0:0:0:0:0:0:0:1:23233
- Client session timed out, have not heard from server in 30002ms for sessionid 0x0, closing socket connection and attempting reconnect

In the former, the exception was caught in the run() method and startConnect() was retried with the IPv4 address, which works fine. In the latter, the client times out waiting for the server instead of retrying.

I would recommend rethrowing the IOException in startConnect() until there's a better way to control the InetAddresses in the client.
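
A rough sketch of what I mean, with hypothetical names and blocking connects for simplicity (the real SendThread is non-blocking and structured differently): startConnect closes the socket so no file descriptor leaks, then rethrows, and the caller moves on to the next candidate address right away instead of waiting for the session timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.util.List;

final class RetryNextAddress {
    /** Opens a socket to addr; on failure closes it and rethrows. */
    static SocketChannel startConnect(InetSocketAddress addr) throws IOException {
        SocketChannel sock = SocketChannel.open();
        try {
            sock.connect(addr); // blocking mode in this sketch
            return sock;
        } catch (IOException e) {
            sock.close(); // release the file descriptor
            throw e;      // propagate so the caller can try another address
        }
    }

    /** Tries each candidate in order and returns the first channel that connects. */
    static SocketChannel connectAny(List<InetSocketAddress> candidates) throws IOException {
        IOException last = null;
        for (InetSocketAddress addr : candidates) {
            try {
                return startConnect(addr);
            } catch (IOException e) {
                last = e; // remember the failure and fall through to the next address
            }
        }
        throw last != null ? last : new IOException("no candidate addresses");
    }
}
```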


                
> FD leak when network unreachable
> --------------------------------
>
>                 Key: ZOOKEEPER-1174
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.3.3
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0, 3.5.0
>
>         Attachments: ZOOKEEPER-1174-3.3.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174fix.patch, zk-fd-leak.tgz
>
>
> In the socket connection logic there are several errors that result in bad behavior.  The basic problem is that a socket is registered with a selector unconditionally when there are nuances that should be dealt with.  First, the socket may connect immediately.  Secondly, the connect may throw an exception.  In either of these two cases, I don't think that the socket should be registered.
> I will attach a test case that demonstrates the problem.  I have been unable to create a unit test that exhibits the problem because I would have to mock the low level socket libraries to do so.  It would still be good to do so if somebody can figure out a good way.


        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115456#comment-13115456 ] 

Ted Dunning commented on ZOOKEEPER-1174:
----------------------------------------

Hmmm... let me try to update the patch.

I don't know when Wednesday night your time actually is (I am traveling to distant lands and am very confused just now).
                
> FD leak when network unreachable
> --------------------------------
>
>                 Key: ZOOKEEPER-1174
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.3.3
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0, 3.5.0
>
>         Attachments: ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, zk-fd-leak.tgz
>
>
> In the socket connection logic there are several errors that result in bad behavior.  The basic problem is that a socket is registered with a selector unconditionally when there are nuances that should be dealt with.  First, the socket may connect immediately.  Secondly, the connect may throw an exception.  In either of these two cases, I don't think that the socket should be registered.
> I will attach a test case that demonstrates the problem.  I have been unable to create a unit test that exhibits the problem because I would have to mock the low level socket libraries to do so.  It would still be good to do so if somebody can figure out a good way.


        

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100677#comment-13100677 ] 

Camille Fournier commented on ZOOKEEPER-1174:
---------------------------------------------

--The documentation for sock.connect is exactly what I base my (current) position on. The idea is that if connect returns true, then you don't need to use select to wait for the connection and can proceed immediately with the primeConnection and light up the connection for prime time. It is only if connect returns false that deferred actions are required.--


But where are you setting sockKey then? You're not setting it at all if connect returns true immediately the first time this is called.
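
One way to picture the concern is the sketch below, which always obtains a selection key and only varies the interest set by what connect() returned. It is an assumption-laden illustration, not the patch under review: connectAndRegister() and the RegisterSketch class are hypothetical, and the choice of OP_READ for the already-connected case is just one plausible option.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class RegisterSketch {
    /** Hypothetical helper: returns a key in both connect() outcomes. */
    static SelectionKey connectAndRegister(Selector selector, InetSocketAddress addr)
            throws IOException {
        SocketChannel sock = SocketChannel.open();
        try {
            sock.configureBlocking(false);
            // In non-blocking mode connect() returns true if the connection
            // was established immediately, false if it is still pending.
            boolean connectedNow = sock.connect(addr);
            int ops = connectedNow ? SelectionKey.OP_READ : SelectionKey.OP_CONNECT;
            return sock.register(selector, ops);
        } catch (IOException | RuntimeException e) {
            // Close on failure so the file descriptor is not leaked.
            sock.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Listen on an ephemeral loopback port so the sketch is self-contained.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
            SelectionKey key = connectAndRegister(selector, addr);
            System.out.println("key valid: " + key.isValid());
            key.channel().close();
        }
    }
}
```

With this shape the key exists whether or not the connection completed immediately, and the channel is closed if connect() or register() throws.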

> FD leak when network unreachable
> --------------------------------
>
>                 Key: ZOOKEEPER-1174
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.3.3
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>            Priority: Critical
>             Fix For: 3.3.4
>
>         Attachments: ZOOKEEPER-1174.patch, zk-fd-leak.tgz
>
>
> In the socket connection logic there are several errors that result in bad behavior.  The basic problem is that a socket is registered with a selector unconditionally when there are nuances that should be dealt with.  First, the socket may connect immediately.  Secondly, the connect may throw an exception.  In either of these two cases, I don't think that the socket should be registered.
> I will attach a test case that demonstrates the problem.  I have been unable to create a unit test that exhibits the problem because I would have to mock the low level socket libraries to do so.  It would still be good to do so if somebody can figure out a good way.
