Posted to common-dev@hadoop.apache.org by "LN (JIRA)" <ji...@apache.org> on 2008/07/17 11:51:31 UTC

[jira] Created: (HADOOP-3779) limit concurrent connections(data serving thread) in one datanode

limit concurrent connections(data serving thread) in one datanode
-----------------------------------------------------------------

                 Key: HADOOP-3779
                 URL: https://issues.apache.org/jira/browse/HADOOP-3779
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.17.1
            Reporter: LN
            Priority: Minor


I'm here after HADOOP-2341 and HADOOP-2346. In my HBase environment, many open MapFiles cause the datanode to hit an OOME (stack memory), because the datanode process ends up with 2000+ data serving threads.

Although HADOOP-2346 implemented timeouts, there are situations where many connections are created before the read timeout (default 6min) is reached. HBase, for example, opens all files on regionserver startup.

Limiting concurrent connections (data serving threads) will make the datanode more stable, and I think it could be done in SocketIOWithTimeout$SelectorPool#select:
1. In SelectorPool#select, record every waiting SelectorInfo instance in a List at the beginning, and remove it after 'Selector#select' is done.
2. Before the real 'select', do a limit check; if the limit is reached, close the first SelectorInfo.
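
The two steps above could be sketched roughly as follows. This is only an illustration of the bookkeeping, not Hadoop code: WaitingConnectionLimiter, beforeSelect and afterSelect are hypothetical names, and java.io.Closeable stands in for whatever a waiting SelectorInfo would actually close.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, not part of Hadoop: caps the number of waiters. */
class WaitingConnectionLimiter {
    private final int limit;
    // Waiters in arrival order; the head is the one that has waited longest.
    private final Deque<Closeable> waiting = new ArrayDeque<>();

    WaitingConnectionLimiter(int limit) {
        this.limit = limit;
    }

    /** Call before the blocking select: register the waiter, evicting the oldest if full. */
    void beforeSelect(Closeable channel) {
        Closeable evicted = null;
        synchronized (this) {
            if (waiting.size() >= limit) {
                evicted = waiting.pollFirst(); // step 2: limit reached, drop the first waiter
            }
            waiting.addLast(channel);          // step 1: record the new waiter
        }
        if (evicted != null) {
            try {
                evicted.close();               // close outside the lock
            } catch (IOException ignored) {
                // best effort: the connection is being dropped anyway
            }
        }
    }

    /** Call after the select returns: the waiter is no longer idle. */
    synchronized void afterSelect(Closeable channel) {
        waiting.remove(channel);
    }

    synchronized int waitingCount() {
        return waiting.size();
    }
}
```

With a limit of 2, registering a third waiter closes the first one and leaves two on the list. Whether closing the oldest waiter is the right eviction policy is an open question in this sketch.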

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (HADOOP-3779) limit concurrent connections(data serving thread) in one datanode

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614888#action_12614888 ] 

rangadi edited comment on HADOOP-3779 at 7/18/08 2:47 PM:
---------------------------------------------------------------

Yes, in fact you may not like the 256 limit at all.

In any case, if you just want to close any client connection that is idle (for, say, 1 sec), that needs to be handled at the DataNode level and not in SelectorPool. SelectorPool is an implementation detail of a utility for doing blocking IO with NIO sockets. From your brief description, your suggested fix does not seem very useful and is at the wrong level (kind of like writing a kernel module to close an idle socket :) ). Maybe a detailed description or, better, a simple prototype implementation will make it clearer.

Note that we need to rewrite the data transfer code paths in DataNode to do real async transfer (network transfers are easy, but the datanode also needs to do disk I/O). I would say sooner or later DataNode needs to do that; it cannot continue to live with one thread per connection.

I am thinking of proposing a design for "async data transfers" if there is enough interest. The basic idea is to share a pool of threads (we need a pool to do disk I/O) to handle all the client transfers, something like 5 or so per disk. This requires a substantial rewrite of the readBlock() and writeBlock() code paths in DataNode.
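
As a rough illustration of the "small pool of threads per disk" idea, here is a minimal sketch using java.util.concurrent. It is not a proposed design for DataNode: PerDiskIoPools and its methods are hypothetical names, and the real work (splitting readBlock() and writeBlock() into disk tasks and async network transfers) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one fixed-size thread pool per disk for blocking disk I/O. */
class PerDiskIoPools {
    private final List<ExecutorService> pools = new ArrayList<>();

    PerDiskIoPools(int diskCount, int threadsPerDisk) {
        for (int i = 0; i < diskCount; i++) {
            // Thread count is bounded by diskCount * threadsPerDisk,
            // no matter how many client connections are open.
            pools.add(Executors.newFixedThreadPool(threadsPerDisk));
        }
    }

    /** Queue a blocking disk operation on the pool that owns the given disk. */
    <T> Future<T> submit(int diskIndex, Callable<T> diskTask) {
        return pools.get(diskIndex).submit(diskTask);
    }

    /** Stop accepting new tasks; already queued tasks still run. */
    void shutdown() {
        for (ExecutorService pool : pools) {
            pool.shutdown();
        }
    }
}
```

A caller would construct it as, for example, new PerDiskIoPools(4, 5) and submit each disk read or write as a Callable. Idle client connections then hold no thread at all, which is the property the one-thread-per-connection model lacks.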


[jira] Issue Comment Edited: (HADOOP-3779) limit concurrent connections(data serving thread) in one datanode

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614582#action_12614582 ] 

rangadi edited comment on HADOOP-3779 at 7/17/08 5:25 PM:
---------------------------------------------------------------

A couple of things:
* HADOOP-2346 has nothing to do with the number of threads. The number of threads is the same as before. If anything, it makes it easier to move to using a thread pool.
* After HADOOP-3633, the DN limits the number of simultaneous threads to 256. Is that what you want? Let us know if that works for you.
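
For readers unfamiliar with this kind of cap, one common way to bound the number of simultaneous serving threads is a counting semaphore. The sketch below is a generic illustration and is not taken from the HADOOP-3633 patch; ConnectionCap and tryAccept are hypothetical names.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: refuse new connections once maxThreads handlers are running. */
class ConnectionCap {
    private final Semaphore permits;

    ConnectionCap(int maxThreads) {
        this.permits = new Semaphore(maxThreads);
    }

    /**
     * Starts a serving thread for the handler if a permit is free.
     * Returns false when the cap is reached; the caller should then
     * refuse or close the incoming connection.
     */
    boolean tryAccept(Runnable handler) {
        if (!permits.tryAcquire()) {
            return false;
        }
        Thread worker = new Thread(() -> {
            try {
                handler.run();
            } finally {
                permits.release(); // free the slot even if the handler throws
            }
        });
        worker.start();
        return true;
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

Note that a cap like this counts idle connections too: a handler that is blocked waiting on an idle client still holds its permit.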


[jira] Commented: (HADOOP-3779) limit concurrent connections(data serving thread) in one datanode

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614888#action_12614888 ] 

Raghu Angadi commented on HADOOP-3779:
--------------------------------------

Yes, in fact you may not like the 256 limit at all.

In any case, if you just want to close any client connection that is idle (for, say, 1 sec), that needs to be handled at the DataNode level and not in SelectorPool. SelectorPool is an implementation detail of a utility for doing blocking IO with NIO sockets. From your brief description, your suggested fix does not seem very useful and is at the wrong level (kind of like writing a kernel module to close an idle socket :) ). Maybe a detailed description or, better, a simple prototype implementation will make it clearer.

Note that we need to rewrite the data transfer code paths in DataNode to do real async transfer (network transfers are easy, but the datanode also needs to do disk I/O). I would say sooner or later DataNode needs to do that; it cannot continue to live with one thread per connection.

I am thinking of proposing a design for "async data transfers" if there is enough interest. The basic idea is to share a pool of threads (we need a pool to do disk I/O) to handle all the client transfers, something like 5 or so per disk. This requires a substantial rewrite of the readBlock() and writeBlock() code paths in DataNode.


[jira] Commented: (HADOOP-3779) limit concurrent connections(data serving thread) in one datanode

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614582#action_12614582 ] 

Raghu Angadi commented on HADOOP-3779:
--------------------------------------

A couple of things:
* HADOOP-2346 has nothing to do with the number of threads. It's the same as before. If anything, it makes it easier to move to using a thread pool.
* After HADOOP-3633, the DN limits the number of simultaneous threads to 256. Is that what you want? Let us know if that works for you.


[jira] Commented: (HADOOP-3779) limit concurrent connections(data serving thread) in one datanode

Posted by "LN (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614784#action_12614784 ] 

LN commented on HADOOP-3779:
----------------------------

Raghu: I don't think HADOOP-3633 covers my situation. I am running HBase on Hadoop, which keeps MapFile.Reader instances open (the stack is described in more detail in HADOOP-2341). Thousands of Readers may be open while the HBase (regionserver) process is running, but only a few of them perform IO in any given second, so the datanode is not overloaded and does not need to refuse further requests, which is what HADOOP-3633 focuses on.

The write timeout in HADOOP-2346 helps with this issue: idle connections are closed after the timeout (default 8min) and are reopened transparently by DFSClient. But that is not enough, which is why I opened this issue.


