Posted to mapreduce-issues@hadoop.apache.org by "Daryn Sharp (JIRA)" <ji...@apache.org> on 2012/06/07 17:40:23 UTC

[jira] [Commented] (MAPREDUCE-4323) NM leaks sockets

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291078#comment-13291078 ] 

Daryn Sharp commented on MAPREDUCE-4323:
----------------------------------------

In particular, {{DFSClient}} maintains a socket cache.  A closed socket is not detected until another connection is needed or the client is closed.  That's a separate issue, but because the NM fails to close a user's filesystems after the app completes, sockets pile up in the CLOSE_WAIT state and eventually exhaust the fds for the process.

Calling {{FileSystem.closeAllForUGI}}, as the JT does, is troublesome in that it may close the fs for other apps running as the same user.  One approach is to partition the fs cache so that each app maintains its own cache of filesystems.  See HADOOP-8490 for possible approaches, which would allow an app's filesystems to be closed just as the JT does.
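
To make the partitioning idea concrete, here is a minimal sketch in plain Java.  It is not Hadoop code and not what HADOOP-8490 proposes: {{FakeFs}}, {{PerAppFsCache}}, and {{closeAllForApp}} are hypothetical names invented for illustration, and the cache key (user plus URI inside a per-app partition) is an assumption.  It only shows that closing one app's partition leaves another app's instance for the same user untouched.

```java
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a cached filesystem client; not a Hadoop class. */
class FakeFs implements Closeable {
    final String user;
    final String uri;
    private boolean closed = false;

    FakeFs(String user, String uri) {
        this.user = user;
        this.uri = uri;
    }

    @Override
    public void close() {
        // A real client would release its cached sockets here.
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

/** Hypothetical cache partitioned by application id, then keyed by user and URI. */
class PerAppFsCache {
    private final Map<String, Map<String, FakeFs>> byApp = new HashMap<>();

    /** Returns the app's cached instance, creating it on first use. */
    synchronized FakeFs get(String appId, String user, String uri) {
        Map<String, FakeFs> partition =
            byApp.computeIfAbsent(appId, k -> new HashMap<>());
        return partition.computeIfAbsent(user + "@" + uri, k -> new FakeFs(user, uri));
    }

    /** Closes only the instances owned by one app; returns how many were closed. */
    synchronized int closeAllForApp(String appId) {
        Map<String, FakeFs> partition = byApp.remove(appId);
        if (partition == null) {
            return 0;
        }
        for (FakeFs fs : partition.values()) {
            fs.close();
        }
        return partition.size();
    }
}

class PerAppFsCacheDemo {
    public static void main(String[] args) {
        PerAppFsCache cache = new PerAppFsCache();
        // Two apps running as the same user against the same (made-up) URI.
        FakeFs first = cache.get("app_1", "alice", "hdfs://nn");
        FakeFs second = cache.get("app_2", "alice", "hdfs://nn");

        cache.closeAllForApp("app_1");
        System.out.println("app_1 fs closed: " + first.isClosed());
        System.out.println("app_2 fs closed: " + second.isClosed());
    }
}
```

The trade-off in this sketch is that two apps for the same user no longer share an instance, so each app holds its own connections while it runs.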

Also note that failing to close filesystems means all future jobs reuse the cached instances, and therefore the configuration, of the first job.  This is very problematic, so it's imperative that each app gets its own cached instances.
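
The stale-configuration effect can be illustrated with a small sketch, again in plain Java rather than Hadoop code.  {{ConfiguredFs}} and {{UserKeyedCache}} are hypothetical, and the assumption is a cache whose key ignores the configuration, so the configuration is consulted only on a cache miss.  The {{dfs.replication}} values are arbitrary examples.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical filesystem handle that captures its configuration at creation. */
class ConfiguredFs {
    final Map<String, String> conf;

    ConfiguredFs(Map<String, String> conf) {
        // Copy so that later changes by the caller do not leak in.
        this.conf = new HashMap<>(conf);
    }
}

/** Hypothetical cache keyed by user and URI only; the configuration is not part of the key. */
class UserKeyedCache {
    private final Map<String, ConfiguredFs> cache = new HashMap<>();

    ConfiguredFs get(String user, String uri, Map<String, String> conf) {
        // conf is used only when no instance is cached for this user and URI.
        return cache.computeIfAbsent(user + "@" + uri, k -> new ConfiguredFs(conf));
    }

    /** Drops every cached instance for the user, so the next get() starts fresh. */
    void closeAllForUser(String user) {
        cache.keySet().removeIf(k -> k.startsWith(user + "@"));
    }
}

class StaleConfDemo {
    public static void main(String[] args) {
        UserKeyedCache cache = new UserKeyedCache();

        Map<String, String> job1 = new HashMap<>();
        job1.put("dfs.replication", "3");
        Map<String, String> job2 = new HashMap<>();
        job2.put("dfs.replication", "1");

        cache.get("alice", "hdfs://nn", job1);
        // Nothing was closed after job 1, so job 2 gets job 1's instance.
        ConfiguredFs seenByJob2 = cache.get("alice", "hdfs://nn", job2);
        System.out.println("job 2 sees dfs.replication="
            + seenByJob2.conf.get("dfs.replication"));
    }
}
```

In this sketch, calling {{closeAllForUser}} between the two jobs would let the second job create an instance from its own configuration.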
                
> NM leaks sockets
> ----------------
>
>                 Key: MAPREDUCE-4323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.0, 0.24.0, 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> The NM is exhausting its fds because it's not closing fs instances when the app is finished.
