Posted to issues@mesos.apache.org by "Bala Natarajan (JIRA)" <ji...@apache.org> on 2014/12/23 23:59:14 UTC

[jira] [Commented] (MESOS-1560) Libprocess SocketManager drains ephemeral ports when trying to connect to a dead remote socket

    [ https://issues.apache.org/jira/browse/MESOS-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257653#comment-14257653 ] 

Bala Natarajan commented on MESOS-1560:
---------------------------------------

It looks like close() is never invoked on the socket, which seems to be why the client side runs out of sockets. I understand the code is trying to handle this when the last reference to the Socket instance is released. I have tried this approach (coupling the socket's lifetime to the lifetime of the object wrapping it) in the past, and it failed, leading to a resource leak like the one reported here. An easier approach would be to close the socket explicitly and mark the object as unusable. Since send() and recv() go through the socket manager, it is probably much easier to inspect the state of the wrapping object there and fail send() and recv(). Is this the direction in which you would like to see this addressed?
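
A minimal sketch of the "close the socket and mark the object as unusable" idea follows. It is not the libprocess Socket or SocketManager code: the class name ClosableSocket and its methods closeNow() and usable() are hypothetical names chosen for illustration, and only the POSIX calls close(), send() and recv() are real. The point is that the descriptor is released at a well-defined moment, while objects that still hold a reference simply see send() and recv() fail.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper for illustration only; not the libprocess Socket class.
class ClosableSocket {
public:
  explicit ClosableSocket(int fd) : fd_(fd) {}

  // Non-copyable so that exactly one object owns the descriptor.
  ClosableSocket(const ClosableSocket&) = delete;
  ClosableSocket& operator=(const ClosableSocket&) = delete;

  // Safety net: release the descriptor if nobody closed it explicitly.
  ~ClosableSocket() { closeNow(); }

  // Close the descriptor immediately and mark the object as unusable,
  // independent of how long the object itself stays alive.
  void closeNow() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

  bool usable() const { return fd_ >= 0; }

  // Fail fast with EBADF once the socket has been closed.
  ssize_t send(const void* data, size_t size) {
    if (!usable()) {
      errno = EBADF;
      return -1;
    }
    return ::send(fd_, data, size, 0);
  }

  ssize_t recv(void* data, size_t size) {
    if (!usable()) {
      errno = EBADF;
      return -1;
    }
    return ::recv(fd_, data, size, 0);
  }

private:
  int fd_;  // -1 means "closed / unusable".
};
```

In this sketch a manager would call closeNow() as soon as it learns the peer is gone, and any later send() or recv() through a stale reference returns -1 instead of keeping the descriptor open.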

> Libprocess SocketManager drains ephemeral ports when trying to connect to a dead remote socket
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-1560
>                 URL: https://issues.apache.org/jira/browse/MESOS-1560
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>            Priority: Minor
>
> Currently SocketManager allocates a new ephemeral port whenever it tries to communicate with a remote socket for which no active socket has been created and kept alive.
> When the remote socket closes (e.g. the scheduler tries to talk to the master and the master terminates), SocketManager closes the local socket. However, since the local port stays in TIME_WAIT state for a while, in extreme cases where the local SocketManager keeps retrying to connect to that dead remote socket *too fast*, it can cause the local machine to run out of available ephemeral ports.
> Can these ports in TIME_WAIT be *reused* by SocketManager to avoid the problem?
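
To put a rough number on "too fast" in the issue description: if every closed connection holds its local port in TIME_WAIT, the sustainable rate of new connections to one remote endpoint is bounded by the size of the ephemeral port range divided by the TIME_WAIT duration. The sketch below only does that arithmetic. The function name maxConnectRate is hypothetical, and the values mentioned in the comment (a 32768-61000 port range and a 60 second TIME_WAIT) are assumed typical Linux settings, not figures taken from the issue.

```cpp
// Upper bound on sustained new connections per second from one local address
// to one remote (address, port), assuming each closed connection keeps its
// local port in TIME_WAIT for timeWaitSeconds.
// Assumed example values: firstPort = 32768, lastPort = 61000,
// timeWaitSeconds = 60. Actual values depend on the kernel configuration.
inline double maxConnectRate(int firstPort, int lastPort, int timeWaitSeconds) {
  const int ports = lastPort - firstPort + 1;  // Range is inclusive.
  return static_cast<double>(ports) / timeWaitSeconds;
}
```

Under those assumed values the bound works out to about 470 connection attempts per second, so a retry loop without any delay could plausibly exceed it.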



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)