Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2008/03/15 01:38:24 UTC

[jira] Created: (HADOOP-3024) DFSClient should implement some kind of socket pooling

DFSClient should implement some kind of socket pooling
------------------------------------------------------

                 Key: HADOOP-3024
                 URL: https://issues.apache.org/jira/browse/HADOOP-3024
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.17.0
            Reporter: Jim Kellerman


Currently, DFSClient maintains one socket per open file. For most map/reduce operations, this is not a problem because there just aren't many open files.

However, HBase has a very different usage model: a single region server could have thousands of open files (more than 10**3 but fewer than 10**4). This can cause both datanodes and region servers to run out of file handles.

What I would like to see is one connection per DFSClient/datanode pair. This would reduce the number of connections from thousands to tens or hundreds of sockets.

The intent is not to process requests fully asynchronously (overlapping block reads and forcing the client to reassemble a whole message from a bunch of fragments), but rather to queue requests from the client to the datanode and process them serially. This differs from the current implementation in that, rather than using an exclusive socket for each file, only one socket is shared between the client and a particular datanode.
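The design above can be sketched roughly as follows. This is a minimal illustration, not Hadoop's actual DFSClient code: the class and method names (DataNodePool, Connection, request) are hypothetical, and the Connection object stands in for a real socket. Pooling is a map keyed by datanode address, and requests on a shared connection are serialized with a per-connection lock, matching the queue-and-process-serially idea described here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-datanode connection pooling (not real Hadoop code).
public class DataNodePool {
    // One entry per datanode address, shared by all open files on that node.
    private final Map<String, Connection> pool = new ConcurrentHashMap<>();

    // Stand-in for a socket to one datanode. Serializing on the connection's
    // monitor models queuing requests and processing them one at a time.
    static class Connection {
        final String datanode;
        Connection(String datanode) { this.datanode = datanode; }

        synchronized String request(String block) {
            // A real implementation would write the request on the socket
            // and read the block data back here.
            return "data-for-" + block + "@" + datanode;
        }
    }

    // Reuse the existing connection for this datanode, or open one if absent.
    Connection get(String datanode) {
        return pool.computeIfAbsent(datanode, Connection::new);
    }

    int size() { return pool.size(); }

    public static void main(String[] args) {
        DataNodePool pool = new DataNodePool();
        // Many "open files" against the same datanode share one connection.
        Connection c1 = pool.get("dn1:50010");
        Connection c2 = pool.get("dn1:50010");
        pool.get("dn2:50010");
        System.out.println(c1 == c2);    // true: same pooled connection
        System.out.println(pool.size()); // 2: one connection per datanode
        System.out.println(c1.request("blk_1"));
    }
}
```

With an exclusive socket per file, opening N files on one datanode costs N sockets; with this scheme it costs one, at the price of serializing reads on that connection.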


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3024) DFSClient should implement some kind of socket pooling

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579012#action_12579012 ] 

Raghu Angadi commented on HADOOP-3024:
--------------------------------------

Parts of HADOOP-2638 might be relevant here.



[jira] Updated: (HADOOP-3024) HDFS needs to support a very large number of open files.

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-3024:
----------------------------------

    Summary: HDFS needs to support a very large number of open files.  (was: DFSClient should implement some kind of socket pooling)
