You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/10/05 08:21:44 UTC

[jira] Created: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Hadoop triggers a "soft" fd leak.
----------------------------------

Key: HADOOP-4346
URL: https://issues.apache.org/jira/browse/HADOOP-4346
Project: Hadoop Core
Issue Type: Bug
Components: io
Affects Versions: 0.17.0
Reporter: Raghu Angadi

Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked.

If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.

Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector.

Koji helped a lot in tracking this. Some sections from 'jmap' output and other info Koji collected led to this suspicion and will include that in the next comment.

One solution might be to handle connect() also in Hadoop using our selectors.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Status: Patch Available  (was: Open)

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637162#action_12637162 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

The following shows relevant info from jmap for a datanode that had a lot fds open.

- {noformat}
#jmap with out full-GC. Includes stale objects:
# num of fds for the process : 5358

#java internal selectors
 117:          1780          42720  sun.nio.ch.Util$SelectorWrapper
 118:          1762          42288  sun.nio.ch.Util$SelectorWrapper$Closer

#Hadoop selectors
  93:          3026         121040  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$SelectorInfo
 844:             1             40  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$ProviderInfo

#Datanode threads 
  99:          2229         106992  org.apache.hadoop.dfs.DataNode$DataXceiver
{noformat}

- {noformat}
#jmap -histo:live immediately after the previous. This does a full-GC before counting.
#num of fds : 5187

  64:          1759          42216  sun.nio.ch.Util$SelectorWrapper
  65:          1759          42216  sun.nio.ch.Util$SelectorWrapper$Closer

 465:             4            160  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$SelectorInfo
 772:             1             40  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$ProviderInfo

 422:             4            192  org.apache.hadoop.dfs.DataNode$DataXceiver
{noformat} This shows that there is no fd leak in Hadoop's selector cache. DN has 4 threads doing I/O and there are 4 selectors. But there are a lot of java internal selectors open.

- {noformat}
# 'jmap -histo:live' bout 1 minute after the previous full-GC
#num of fds : 57

# There are no SelectorWrapper objects. All of these must have been closed.

 768:             1             40  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$SelectorInfo

 730:             1             48  org.apache.hadoop.dfs.DataNode$DataXceiver
{noformat}

I will try to reproduced this myself and try out a patch for connect(). 


> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated HADOOP-4346:
----------------------------------

    Attachment: HADOOP-4346-0.18.3.patch

Reuploading patch that works with -p0 instead of -p1

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated HADOOP-4346:
----------------------------------

    Attachment: HADOOP-4346-0.18.3.patch

Fixed -p0 / -p1 issue in patch (again...)

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-0.18.3.patch, HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643501#action_12643501 ] 

Hudson commented on HADOOP-4346:
--------------------------------

Integrated in Hadoop-trunk #646 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/646/])
    . Implement blocking connect so that Hadoop is not affected
by selector problem with JDK default implementation. (Raghu Angadi)


> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638123#action_12638123 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

Could someone review this patch for trunk? 

Eventually we should write a Socket a socket factory and return our own Socket so that we don't need NetUtils.connect(), NetUtils.getOuptStream() etc.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637527#action_12637527 ] 

Allen Wittenauer commented on HADOOP-4346:
------------------------------------------

I guess our datanodes aren't that busy then... ;)

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643351#action_12643351 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4346:
------------------------------------------------

- In case SelectionKey.OP_WRITE, "read" should be "write"

- I suggest to simplify the codes for timing as below
{code}
final long endtime = System.currentTimeMillis() + timeout;
while(true) {
  final long timeleft = endtime - System.currentTimeMillis();
  ...
  if (... && timeleft < 0) {
    throw ...
  }
}
{code}

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Attachment: HADOOP-4346.patch


Suggest fix is attached. Datanodes have no instances of Sun's SelectorWrapper objects with this patch. 

This adds {{NetUtils.connect(socket, addr, timeout)}} which invokes {{SocketIOWithTimeout.connect()}} for channels.

connect seems to work fine. If you anyone wants to try this patch out, I can attach a version for 0.17 or 0.18. 

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643410#action_12643410 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

tes-patch output : {noformat}
     [exec] -1 overall.

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
{noformat}

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Attachment: HADOOP-4346.patch

Updated patch fixes two findbugs warnings. 'SocketChannel' is is already a SelectableChannel, there is no need to check with instanceof.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676114#action_12676114 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

> The lines near 1490 do not make any reference to 'proxySock'. 

Right, looks like that part got reverted in 0.18 [later|http://svn.apache.org/viewvc?view=rev&revision=708637].

0.18 patch I attached does not have some minor fixes that were done later.

I suggest following change for your patch : changes for SocketIOWtihTimeout.java and NetUtils.java should come from trunk patch (they should just apply without any conflicts). The rest of it should be fine.

We should run 'ant test-patch' with new one.. I can run that if you want.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676121#action_12676121 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

You resolution is correct. targetSock.connect() should be changed instead of proxySock.

Essentially, DataNode should not invoke plain {{socket.connect()}} at all, they should all be replaced with {{NetUtils.connect(sock...)}}

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637330#action_12637330 ] 

Allen Wittenauer commented on HADOOP-4346:
------------------------------------------

Bryan: out of curiosity, what is your upper limit on FD's?  In theory, you should be able to work around this by increasing the fd limit.  (rlim_fd_max in /etc/sytsem on solaris, nofile in /etc/security/limits.conf on Linux, ...)

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637386#action_12637386 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

I doubt 8K would be enough for a moderately busy datanode with out this patch. Ultimately it depends on what goes on in blackbox of GC and phantom references.. it might be fine for some users or some loads.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Ankur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638515#action_12638515 ] 

ankur edited comment on HADOOP-4346 at 10/10/08 5:54 AM:
---------------------------------------------------------

That would explain the kind of file handle leaks we are observing when writing from Apache HTTp server via a custom logwriter to HDFS. The log-writer opens up a file for writing every X min and keeps writing the apache log entries that it receives through a pipe. Underneath the log writer the DFSClient opens up a new blocking connection (DFSClient.connect()) and transfers the data via data streamer threads. In our case X is sufficiently large to cause the wrapped selector objects to move to "old space" and not get garbage collected because a full GC never runs since the program never exceeds its memory requirements.

I think having this patch will alleviate the problem. Another way for client applications is to force full GC at regular intervals. I will try forcing full GC and see if that works for my case.

      was (Author: ankur):
    That would explain the kind of file handle leaks we are observing when writing from Apache HTTp server via a custom logwriter to HDFS. The log-writer opens up a file for writing every X min and keeps writing the apache log entries that it receives through a pipe. Underneath the log writer the DFSClient opens up a new blocking connection (DFSClient.connect()) and transfers the data via data streamer threads. In our case X is sufficiently large to cause the wrapped selector objects to move to "old space" and not get garbage collected because a full GC never runs since the program never exceeds its memory requirements.

I think having this patch will alleviate the problem. Another way for client applications is to force full GC at regular intervals. Will forcing full GC and see if that works for my case.
  
> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637539#action_12637539 ] 

Koji Noguchi commented on HADOOP-4346:
--------------------------------------

bq. I guess our datanodes aren't that busy then

It also depends on the heapsize which would change the frequency of fullGC.
We use default heapsize of 1000M for the datanode.
Cluster that hit this issue was using 2048M (with 1024 file descriptor limit)


> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676139#action_12676139 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------


> When I grepped, I noticed that we don't fix connect in Balancer.java. We probably should. I will file jira on that.

not  an issue. Balancer does not use NIO sockets, so not affected. sorry for the confusion. 

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-0.18.3.patch, HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676137#action_12676137 ] 

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

+1. looks correct.  

When I grepped, I noticed that we don't fix connect in Balancer.java. We probably should. I will file jira on that.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-0.18.3.patch, HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Ankur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638515#action_12638515 ] 

Ankur commented on HADOOP-4346:
-------------------------------

That would explain the kind of file handle leaks we are observing when writing from Apache HTTp server via a custom logwriter to HDFS. The log-writer opens up a file for writing every X min and keeps writing the apache log entries that it receives through a pipe. Underneath the log writer the DFSClient opens up a new blocking connection (DFSClient.connect()) and transfers the data via data streamer threads. In our case X is sufficiently large to cause the wrapped selector objects to move to "old space" and not get garbage collected because a full GC never runs since the program never exceeds its memory requirements.

I think having this patch will alleviate the problem. Another way for client applications is to force full GC at regular intervals. Will forcing full GC and see if that works for my case.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated HADOOP-4346:
----------------------------------

    Attachment: HADOOP-4346-0.18.3.patch

Updated 0.18.3 patch per Raghu's suggestions

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Attachment: HADOOP-4346.patch

Thanks Nicholas.

Updated patch is attached.

> In case SelectionKey.OP_WRITE, "read" should be "write"
good catch. 

> I suggest to simplify the codes for timing as below 
I modified the loop a bit like the way you suggested, but not exactly same, since it requires two calls to currentTime() instead of one for each loop.. even in the common case of no timeouts. 
  




> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676118#action_12676118 ] 

Aaron Kimball commented on HADOOP-4346:
---------------------------------------

Thanks Raghu,

So what's the resolution on DataNode.java? Just disregard the non-applying hunk? If so, what about the targetSock.connect() call at line 1431?


> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated HADOOP-4346:
----------------------------------

    Attachment: HADOOP-4346-0.18.3.patch

My proposed change to the patch is attached as HADOOP-4346-0.18.3.patch, but I'm not sure if it's correct.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Attachment: HADOOP-4346.patch

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637340#action_12637340 ] 

Bryan Duxbury commented on HADOOP-4346:
---------------------------------------

Originally I had 1024 per machine. I recently increased it to 4096 and I haven't seen the problem as much, but it was intermittent before, so I don't know if it's resolved or not.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated HADOOP-4346:
----------------------------------

    Attachment:     (was: HADOOP-4346-0.18.3.patch)

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-0.18.3.patch, HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Attachment: HADOOP-4346-branch-18.patch

Patch for 0.18 is attached.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637347#action_12637347 ] 

Allen Wittenauer commented on HADOOP-4346:
------------------------------------------

We're running ours with 8192, which is likely why we've never seen this. :)

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Attachment: HADOOP-4346.patch

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643359#action_12643359 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4346:
------------------------------------------------

+1 patch looks good.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637297#action_12637297 ] 

Bryan Duxbury commented on HADOOP-4346:
---------------------------------------

I'd love to try this out on 0.18.1. Sounds like my exact problem.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4346:
---------------------------------

    Attachment: HADOOP-4346.patch

Updated patch closes the channel as javadoc for SocketChannel.connect() requires.

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi reassigned HADOOP-4346:
------------------------------------

    Assignee: Raghu Angadi

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4346) Hadoop triggers a "soft" fd leak.

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676105#action_12676105 ] 

Aaron Kimball commented on HADOOP-4346:
---------------------------------------

The -branch-18 patch does not apply to Hadoop 0.18.3. Specifically, the following hunk fails to apply in DataNode.java:

{code}
*** 1490,1496 ****
          InetSocketAddress proxyAddr = NetUtils.createSocketAddr(
              proxySource.getName());
          proxySock = newSocket();
-         proxySock.connect(proxyAddr, socketTimeout);
          proxySock.setSoTimeout(socketTimeout);

          OutputStream baseStream = NetUtils.getOutputStream(proxySock,
{code}

The lines near 1490 do not make any reference to 'proxySock'. 

The only definitions of "OutputStream baseStream" are in run(), copyBlock(), and readBlock(). Without more context, for this patch hunk, I'm not sure where this might be applied, though the line:

{code}
targetSock.connect(targetAddr, socketTimeout);
{code}

near line 1431 seems like a reasonable candidate, as it's the only example of a fooSock.Connect() call. 

Raghu, am I correct here? If not, can you release a new 18-branch patch?


> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4346-branch-18.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch, HADOOP-4346.patch
>
>
> Starting with Hadoop-0.17, most of the network I/O uses non-blocking NIO channels. Normal blocking reads and writes are handled by Hadoop and use our own cache of selectors. This cache suites well for Hadoop where I/O often occurs on many short lived threads. Number of fds consumed is proportional to number of threads currently blocked. 
> If blocking I/O is done using java.*, Sun's implementation uses internal per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. Looks like this cleaning is kind of like finalizers and tied to GC. This is pretty ill suited if we have many threads that are short lived. Until GC happens, number of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} is still the default implementation that uses per-thread selector. 
> Koji helped a lot in tracking this. Some sections from 'jmap' output and other info  Koji collected led to this suspicion and will include that in the next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.