Posted to mapreduce-issues@hadoop.apache.org by "Devaraj K (JIRA)" <ji...@apache.org> on 2012/06/14 12:11:42 UTC

[jira] [Commented] (MAPREDUCE-4340) Node Manager leaks socket connections connected to Data Node

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294938#comment-13294938 ] 

Devaraj K commented on MAPREDUCE-4340:
--------------------------------------

I have investigated this issue, and it seems to be caused by the FileSystem cache.

The Node Manager gets the FileSystem object for copying the files from DFS during localization. 
 

{code:title=FSDownload.java|borderStyle=solid}
  private Path copy(Path sCopy, Path dstdir) throws IOException {
    FileSystem sourceFs = sCopy.getFileSystem(conf);
    Path dCopy = new Path(dstdir, sCopy.getName() + ".tmp");
    FileStatus sStat = sourceFs.getFileStatus(sCopy);
    if (sStat.getModificationTime() != resource.getTimestamp()) {
      throw new IOException("Resource " + sCopy +
          " changed on src filesystem (expected " + resource.getTimestamp() +
          ", was " + sStat.getModificationTime());
    }

    sourceFs.copyToLocalFile(sCopy, dCopy);
    return dCopy;
  }
{code} 

This code uses the FileSystem.get(URI uri, Configuration conf) API to get the file system instance, and that API internally uses a cache of file system instances. For the next job, the FileSystem.Cache.Key does not match the key of the previous instance, so a new file system instance is created, and the number of instances keeps increasing with every job. Each file system instance has an associated DFSClient instance that holds the datanode socket in its socketCache, and nothing ever closes it.
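
To illustrate the effect, here is a minimal, self-contained sketch in plain Java. It does not use the Hadoop classes, and every name in it (FsCacheLeakSketch, UserIdentity, CacheKey, FakeFileSystem, FsCache, simulateJobs) is hypothetical. It also assumes that the key differs because each job arrives with a new identity object that is compared by reference; that is only one possible reason for the mismatch and has not been confirmed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FsCacheLeakSketch {

    /** Hypothetical per-job identity. equals/hashCode are NOT overridden,
     *  so two objects with the same name are still different keys. */
    static final class UserIdentity {
        final String name;
        UserIdentity(String name) { this.name = name; }
    }

    /** Hypothetical cache key: scheme + authority + identity (by reference). */
    static final class CacheKey {
        final String scheme;
        final String authority;
        final UserIdentity user;

        CacheKey(String scheme, String authority, UserIdentity user) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CacheKey)) return false;
            CacheKey other = (CacheKey) o;
            return scheme.equals(other.scheme)
                && authority.equals(other.authority)
                && user == other.user; // reference comparison (assumption)
        }

        @Override
        public int hashCode() {
            return 31 * Objects.hash(scheme, authority)
                + System.identityHashCode(user);
        }
    }

    /** Stand-in for a file system instance that would hold a socket. */
    static final class FakeFileSystem { }

    /** Stand-in for the instance cache: one entry per distinct key. */
    static final class FsCache {
        private final Map<CacheKey, FakeFileSystem> map = new HashMap<>();

        FakeFileSystem get(CacheKey key) {
            return map.computeIfAbsent(key, k -> new FakeFileSystem());
        }

        int size() { return map.size(); }
    }

    /** Runs the given number of "jobs" against the cache and returns
     *  how many instances the cache holds afterwards. */
    static int simulateJobs(FsCache cache, int jobs, boolean reuseIdentity) {
        UserIdentity shared = new UserIdentity("user");
        for (int i = 0; i < jobs; i++) {
            UserIdentity id = reuseIdentity ? shared : new UserIdentity("user");
            cache.get(new CacheKey("hdfs", "namenode:8020", id));
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("new identity per job: "
            + simulateJobs(new FsCache(), 5, false) + " cached instances");
        System.out.println("shared identity:      "
            + simulateJobs(new FsCache(), 5, true) + " cached instances");
    }
}
```

In this sketch, a fresh identity per job never hits the cache, so the cache gains one instance per job and never releases it. Reusing the same identity object keeps the cache at a single instance.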
                
> Node Manager leaks socket connections connected to Data Node
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-4340
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4340
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>            Priority: Critical
>
> I am running the simple wordcount example with default configurations. Every job run adds one more datanode socket connection, which then stays in the CLOSE_WAIT state forever.
