Posted to dev@zookeeper.apache.org by "Matthias Spycher (Commented) (JIRA)" <ji...@apache.org> on 2011/11/01 18:09:32 UTC

[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141332#comment-13141332 ] 

Matthias Spycher commented on ZOOKEEPER-1174:
---------------------------------------------

I've run into a problem with this patch (version 3.3.3) on a system (Windows 7) where InetAddress.getAllByName(host) returns both IPv4 and IPv6 candidate addresses.
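To see which candidates the resolver returns, and in what order, a small standalone program like the one below can help. It is only a sketch and is not part of the ZooKeeper client. The class name `ResolveCandidates`, its `describe()` helper, and the default host `localhost` are assumptions chosen for illustration.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ResolveCandidates {
    // Hypothetical helper: label each resolved address with its family so the
    // order in which a client would try them is easy to read.
    static List<String> describe(InetAddress[] addrs) {
        List<String> out = new ArrayList<>();
        for (InetAddress a : addrs) {
            String family = (a instanceof Inet6Address) ? "IPv6"
                    : (a instanceof Inet4Address) ? "IPv4" : "other";
            out.add(family + " " + a.getHostAddress());
        }
        return out;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        // On a dual-stack machine the IPv6 loopback may be listed before 127.0.0.1.
        for (String line : describe(InetAddress.getAllByName(host))) {
            System.out.println(line);
        }
    }
}
```

The output depends on the local resolver configuration, so it will differ between machines.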

The cause is that the IOException caught in SendThread.startConnect() is no longer propagated to the calling run() method. Before the patch, my logs showed:

- Opening socket connection to server localhost/0:0:0:0:0:0:0:1:23233
- Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.SocketException: Address family not supported by protocol family: connect
	at sun.nio.ch.Net.connect(Native Method)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1050)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1077)

With the patch applied, I now see:

- Opening socket connection to server localhost/0:0:0:0:0:0:0:1:23233
- Unable to open socket to localhost/0:0:0:0:0:0:0:1:23233
- Client session timed out, have not heard from server in 30002ms for sessionid 0x0, closing socket connection and attempting reconnect

In the former case, the exception was caught in run() and startConnect() was retried with the IPv4 address, which works fine. In the latter, the client waits for the session timeout instead of retrying with the next address.

I would recommend rethrowing the IOException in startConnect() until there is a better way to control which InetAddresses the client uses.
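The control flow being recommended can be sketched as follows. This is not the actual ClientCnxn code: the `Connector` interface, `connectToFirstWorking()`, and the simulated IPv4-only network are hypothetical stand-ins so the logic runs without real sockets. The point is that startConnect() cleans up and rethrows, and the caller immediately moves on to the next candidate address.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetSocketAddress;
import java.net.SocketException;
import java.util.Arrays;
import java.util.List;

public class AddressFallback {
    // Hypothetical stand-in for the real connect step (SocketChannel.connect in
    // the client), so the control flow can run without a network.
    interface Connector {
        void connect(InetSocketAddress addr) throws IOException;
    }

    // Sketch of startConnect() following the recommendation: clean up, then
    // rethrow so the caller learns that this candidate failed.
    static void startConnect(Connector connector, InetSocketAddress addr) throws IOException {
        try {
            connector.connect(addr);
        } catch (IOException e) {
            // The real client would close its SocketChannel here (the FD-leak fix).
            System.out.println("Unable to open socket to " + addr);
            throw e;
        }
    }

    // Sketch of the caller in run(): on failure, try the next candidate
    // instead of waiting for the session timeout.
    static InetSocketAddress connectToFirstWorking(Connector connector,
            List<InetSocketAddress> candidates) throws IOException {
        IOException last = new IOException("no candidate addresses");
        for (InetSocketAddress addr : candidates) {
            try {
                startConnect(connector, addr);
                return addr;
            } catch (IOException e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        // Simulates the reported case: IPv6 is listed first but is not usable.
        Connector ipv4Only = addr -> {
            if (addr.getAddress() instanceof Inet6Address) {
                throw new SocketException("Address family not supported by protocol family: connect");
            }
        };
        List<InetSocketAddress> candidates = Arrays.asList(
                new InetSocketAddress("::1", 23233),
                new InetSocketAddress("127.0.0.1", 23233));
        InetSocketAddress chosen = connectToFirstWorking(ipv4Only, candidates);
        System.out.println("connected via " + chosen.getAddress().getHostAddress());
    }
}
```

Run as-is, the simulated IPv6 attempt fails and the loop falls through to 127.0.0.1 right away, which matches the pre-patch behavior in the first log rather than the timeout in the second.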


                
> FD leak when network unreachable
> --------------------------------
>
>                 Key: ZOOKEEPER-1174
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.3.3
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>            Priority: Critical
>             Fix For: 3.3.4, 3.4.0, 3.5.0
>
>         Attachments: ZOOKEEPER-1174-3.3.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174fix.patch, zk-fd-leak.tgz
>
>
> In the socket connection logic there are several errors that result in bad behavior. The basic problem is that a socket is registered with a selector unconditionally, when there are nuances that should be dealt with. First, the socket may connect immediately. Second, the connect may throw an exception. In either of these cases, I don't think the socket should be registered.
> I will attach a test case that demonstrates the problem. I have been unable to create a unit test that exhibits the problem because I would have to mock the low-level socket libraries to do so. It would still be good to have one if somebody can figure out a good way.
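A minimal sketch of the registration rule described in the issue, assuming plain java.nio channels: the channel is registered for OP_CONNECT only when connect() returns false (connection pending), and it is closed before any exception is rethrown so the file descriptor is not leaked. `ConnectAndRegister` and this `startConnect()` signature are hypothetical and do not reflect the attached patches.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ConnectAndRegister {
    // Opens a non-blocking channel to addr. It is registered for OP_CONNECT only
    // while the connect is pending; on any failure it is closed before rethrowing.
    static SocketChannel startConnect(Selector selector, SocketAddress addr) throws IOException {
        SocketChannel sc = SocketChannel.open();
        try {
            sc.configureBlocking(false);
            if (!sc.connect(addr)) {
                // Connect is in progress: the selector will report OP_CONNECT later.
                sc.register(selector, SelectionKey.OP_CONNECT);
            }
            // If connect() returned true, the channel is already connected and is
            // deliberately not registered for OP_CONNECT.
            return sc;
        } catch (IOException | RuntimeException e) {
            try {
                sc.close(); // release the file descriptor
            } catch (IOException closeError) {
                e.addSuppressed(closeError);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Local listener so the example does not depend on an external server.
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketChannel sc = startConnect(selector, server.getLocalAddress());
            if (sc.isConnected()) {
                System.out.println("connected immediately; never registered for OP_CONNECT");
            } else {
                // Try to complete the connect, blocking on the selector until it is ready.
                while (!sc.finishConnect()) {
                    selector.select();
                }
                System.out.println("connected after OP_CONNECT");
            }
            sc.close();
        }
    }
}
```

What the caller does with a channel that connected immediately (for example, registering it for reads) is left open here; the sketch only guarantees it never waits on an OP_CONNECT event that will not arrive.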

--
This message is automatically generated by JIRA.