Posted to common-dev@hadoop.apache.org by Sanjay Radia <sr...@yahoo-inc.com> on 2008/03/12 19:35:49 UTC

Re: Multiplexing sockets in DFSClient/datanodes?

Doug Cutting wrote:
> Jim Kellerman wrote:
>> Yes, multiplexing a socket is more complicated than having one socket
>> per file, but saving system resources seems like a way to scale.
>>
>> Questions? Comments? Opinions? Flames?
>
> Note that Hadoop RPC already multiplexes, sharing a single socket per 
> pair of JVMs.  It would be possible to multiplex datanode, and should 
> not in theory significantly impact performance, but, as you indicate, 
> it would be a significant change.  One approach might be to implement 
> HDFS data access using RPC rather than directly using stream i/o.
>
> RPC also tears down idle connections, which HDFS does not.  I wonder 
> how much doing that alone might help your case?  That would probably 
> be much simpler to implement.  Both client and server must already 
> handle connection failures, so it shouldn't be too great of a change 
> to have one or both sides actively close things down if they're idle 
> for more than a few seconds.  This is related to adding write timeouts 
> to the datanode (HADOOP-2346).

Doug,
   Dhruba and I had discussed using RPC in the past. While RPC is a 
cleaner interface, and our RPC implementation has features such as 
connection sharing and closing idle connections, streaming IO lets us 
pipe large amounts of data without a request/response exchange.
The worry was that IO performance would degrade.
BTW, NFS uses RPC (NFS does not have the write pipeline for replicas).

sanjay
>
> Doug
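The idle-connection teardown Doug suggests above could be sketched roughly as follows. This is purely illustrative Java, not actual Hadoop code; the class and field names (IdleReaper, Conn) are hypothetical, and a real implementation would close actual sockets rather than flip a flag.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track last-use time per connection, and periodically close
// anything idle longer than a threshold -- the behavior Hadoop RPC
// already has and HDFS data connections lack.
public class IdleReaper {
    static class Conn {
        volatile long lastUsedMillis;
        volatile boolean open = true;
        Conn(long now) { lastUsedMillis = now; }
        void close() { open = false; }   // stand-in for socket.close()
    }

    private final Map<String, Conn> conns = new ConcurrentHashMap<>();
    private final long idleLimitMillis;

    IdleReaper(long idleLimitMillis) { this.idleLimitMillis = idleLimitMillis; }

    void register(String addr, Conn c) { conns.put(addr, c); }

    // Called periodically; closes and removes connections idle too long.
    int reap(long now) {
        int closed = 0;
        for (Iterator<Map.Entry<String, Conn>> it =
                 conns.entrySet().iterator(); it.hasNext(); ) {
            Conn c = it.next().getValue();
            if (now - c.lastUsedMillis > idleLimitMillis) {
                c.close();
                it.remove();
                closed++;
            }
        }
        return closed;
    }
}
```

As Doug notes, both sides already tolerate connection failures, so a reaper like this on either end should be a small change.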


Re: Multiplexing sockets in DFSClient/datanodes?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
An open DFS file consumes many resources: fds, sockets, socket 
buffers, threads, etc.

A better question to consider might be "How do we support a very large 
number of open files in HDFS?", which I think opens it up to more than 
one type of solution. And "what compromises (if any) are acceptable to 
achieve this?"

I know it's a serious problem for HBase, and every fix, incremental or 
not, helps. Having a short write timeout on the DataNode in current 
trunk will help greatly on the DataNode side (threads and sockets). Of 
course, we need to make the write timeout configurable, which is trivial.

One connection between every client and datanode might not be as 
scalable on a large cluster. Say the cluster has 3000 datanodes and a 
client has 5000 files open to essentially random datanodes. Then the 
number of connections from the client is still in the thousands (the 
same problem as now).

Raghu.
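Raghu's back-of-envelope scenario can be made concrete. Under the simplifying assumption that each of the 5000 open files lands on a datanode chosen uniformly at random from the 3000 (real block placement is not uniform, so this is only a rough model), the expected number of distinct datanodes a client must hold a connection to is n * (1 - (1 - 1/n)^k):

```java
// Rough model of Raghu's point: even with one shared connection per
// (client, datanode) pair, a client with many open files on a large
// cluster still needs thousands of connections.
public class ConnEstimate {
    // Expected distinct datanodes contacted, assuming each of k open
    // files picks one of n datanodes uniformly at random.
    // P(a given datanode is never picked) = (1 - 1/n)^k.
    static double expectedDistinct(int datanodes, int openFiles) {
        return datanodes
            * (1.0 - Math.pow(1.0 - 1.0 / datanodes, openFiles));
    }

    public static void main(String[] args) {
        // ~2400 distinct datanodes for n=3000, k=5000 -- "still in the
        // thousands", as Raghu says.
        System.out.printf("%.0f%n", expectedDistinct(3000, 5000));
    }
}
```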

dhruba Borthakur wrote:
> Hi Jim,
> 
> Oh, I see. This does not sound too difficult. One can use the connection
> pooling code from the RPC layer. The DFS Client can use the pool to
> cache open connections.  Also, I assumed that this connection pooling is
> enabled only for block reads and not for block writes.
> 
> Would you like to open a JIRA so that we can discuss it in more detail?
> 
> Thanks,
> dhruba
> 
> -----Original Message-----
> From: Jim Kellerman [mailto:jim@powerset.com] 
> Sent: Friday, March 14, 2008 1:01 PM
> To: core-dev@hadoop.apache.org; hadoop-dev@lucene.apache.org
> Subject: RE: Multiplexing sockets in DFSClient/datanodes?
> 
> I'm not suggesting doing simultaneous transfers, just having one
> connection between any one client and any one data node. My thinking was
> each transfer would be queued and then processed one at a time.
> 
> This is a big problem for us. On our cluster at Powerset, we have had
> both datanodes and HBase region servers run out of file handles because
> there is one open per file.
> 
> As HBase installations get larger one socket per file just won't scale.
> 
> ---
> Jim Kellerman, Senior Engineer; Powerset
> 
> 
>> -----Original Message-----
>> From: dhruba Borthakur [mailto:dhruba@yahoo-inc.com]
>> Sent: Friday, March 14, 2008 10:53 AM
>> To: core-dev@hadoop.apache.org; hadoop-dev@lucene.apache.org
>> Subject: RE: Multiplexing sockets in DFSClient/datanodes?
>>
>> Hi Jim,
>>
>> The protocol between the client and the Datanodes will become
>> relatively more complex if we decide to multiplex
>> simultaneous transfers of multiple blocks on the same socket
>> connection. Do you think that the benefit of saving on system
>> resources is really appreciable?
>>
>> Thanks,
>> Dhruba
>>
>> -----Original Message-----
>> From: Sanjay Radia [mailto:sradia@yahoo-inc.com]
>> Sent: Wednesday, March 12, 2008 11:36 AM
>> To: hadoop-dev@lucene.apache.org
>> Subject: Re: Multiplexing sockets in DFSClient/datanodes?
>>
>> Doug Cutting wrote:
>>> Jim Kellerman wrote:
>>>> Yes, multiplexing a socket is more complicated than having
>> one socket
>>>> per file, but saving system resources seems like a way to scale.
>>>>
>>>> Questions? Comments? Opinions? Flames?
>>> Note that Hadoop RPC already multiplexes, sharing a single
>> socket per
>>> pair of JVMs.  It would be possible to multiplex datanode,
>> and should
>>> not in theory significantly impact performance, but, as you
>> indicate,
>>> it would be a significant change.  One approach might be to
>> implement
>>> HDFS data access using RPC rather than directly using stream i/o.
>>>
>>> RPC also tears down idle connections, which HDFS does not.
>> I wonder
>>> how much doing that alone might help your case?  That would
>> probably
>>> be much simpler to implement.  Both client and server must already
>>> handle connection failures, so it shouldn't be too great of
>> a change
>>> to have one or both sides actively close things down if
>> they're idle
>>> for more than a few seconds.  This is related to adding
>> write timeouts
>>
>>> to the datanode (HADOOP-2346).
>> Doug,
>>    Dhruba and I had discussed using RPC in the past. While
>> RPC is a cleaner interface and our rpc implementation has
>> features such sharing connection, closing idle connections
>> etc, streaming IO lets to pipe large amounts of data without
>> the request/response exchange.
>> The worry was that IO performance would degrade.
>> BTW, NFS uses rpc (NFS does not have the write pipeline for replicas)
>>
>> sanjay
>>> Doug
>>
>> No virus found in this incoming message.
>> Checked by AVG.
>> Version: 7.5.519 / Virus Database: 269.21.7/1329 - Release
>> Date: 3/14/2008 12:33 PM
>>
>>
> 
> 


RE: Multiplexing sockets in DFSClient/datanodes?

Posted by dhruba Borthakur <dh...@yahoo-inc.com>.
Hi Jim,

Oh, I see. This does not sound too difficult. One can use the connection
pooling code from the RPC layer. The DFS Client can use the pool to
cache open connections.  Also, I assumed that this connection pooling is
enabled only for block reads and not for block writes.
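The connection cache Dhruba describes might look roughly like the sketch below: check out a cached connection to a datanode for a block read if one exists, otherwise open a new one, and return it to the pool when done. All names here (DatanodeConnPool, checkout, checkin) are hypothetical, not actual DFSClient or RPC-layer code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative per-datanode connection cache, in the spirit of the
// RPC layer's connection pooling.
public class DatanodeConnPool {
    static int opened = 0;  // counts "real" connects, for illustration
    static class Conn {
        final String addr;
        Conn(String a) { addr = a; opened++; }
    }

    private final Map<String, Deque<Conn>> idle = new HashMap<>();

    // Reuse a cached connection to this datanode, else open a new one.
    synchronized Conn checkout(String addr) {
        Deque<Conn> q = idle.get(addr);
        if (q != null && !q.isEmpty()) return q.pop();
        return new Conn(addr);
    }

    // Return a connection to the pool for later reuse.
    synchronized void checkin(Conn c) {
        idle.computeIfAbsent(c.addr, a -> new ArrayDeque<>()).push(c);
    }
}
```

A real pool would also need the idle-teardown Doug mentions, so cached connections don't pin datanode threads forever.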

Would you like to open a JIRA so that we can discuss it in more detail?

Thanks,
dhruba

-----Original Message-----
From: Jim Kellerman [mailto:jim@powerset.com] 
Sent: Friday, March 14, 2008 1:01 PM
To: core-dev@hadoop.apache.org; hadoop-dev@lucene.apache.org
Subject: RE: Multiplexing sockets in DFSClient/datanodes?

I'm not suggesting doing simultaneous transfers, just having one
connection between any one client and any one data node. My thinking was
each transfer would be queued and then processed one at a time.

This is a big problem for us. On our cluster at Powerset, we have had
both datanodes and HBase region servers run out of file handles because
there is one open per file.

As HBase installations get larger one socket per file just won't scale.

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: dhruba Borthakur [mailto:dhruba@yahoo-inc.com]
> Sent: Friday, March 14, 2008 10:53 AM
> To: core-dev@hadoop.apache.org; hadoop-dev@lucene.apache.org
> Subject: RE: Multiplexing sockets in DFSClient/datanodes?
>
> Hi Jim,
>
> The protocol between the client and the Datanodes will become
> relatively more complex if we decide to multiplex
> simultaneous transfers of multiple blocks on the same socket
> connection. Do you think that the benefit of saving on system
> resources is really appreciable?
>
> Thanks,
> Dhruba
>
> -----Original Message-----
> From: Sanjay Radia [mailto:sradia@yahoo-inc.com]
> Sent: Wednesday, March 12, 2008 11:36 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: Multiplexing sockets in DFSClient/datanodes?
>
> Doug Cutting wrote:
> > Jim Kellerman wrote:
> >> Yes, multiplexing a socket is more complicated than having
> one socket
> >> per file, but saving system resources seems like a way to scale.
> >>
> >> Questions? Comments? Opinions? Flames?
> >
> > Note that Hadoop RPC already multiplexes, sharing a single
> socket per
> > pair of JVMs.  It would be possible to multiplex datanode,
> and should
> > not in theory significantly impact performance, but, as you
> indicate,
> > it would be a significant change.  One approach might be to
> implement
> > HDFS data access using RPC rather than directly using stream i/o.
> >
> > RPC also tears down idle connections, which HDFS does not.
> I wonder
> > how much doing that alone might help your case?  That would
> probably
> > be much simpler to implement.  Both client and server must already
> > handle connection failures, so it shouldn't be too great of
> a change
> > to have one or both sides actively close things down if
> they're idle
> > for more than a few seconds.  This is related to adding
> write timeouts
>
> > to the datanode (HADOOP-2346).
>
> Doug,
>    Dhruba and I had discussed using RPC in the past. While
> RPC is a cleaner interface and our rpc implementation has
> features such sharing connection, closing idle connections
> etc, streaming IO lets to pipe large amounts of data without
> the request/response exchange.
> The worry was that IO performance would degrade.
> BTW, NFS uses rpc (NFS does not have the write pipeline for replicas)
>
> sanjay
> >
> > Doug
>
>
>



RE: Multiplexing sockets in DFSClient/datanodes?

Posted by Jim Kellerman <ji...@powerset.com>.
I'm not suggesting doing simultaneous transfers, just having one connection between any one client and any one data node. My thinking was each transfer would be queued and then processed one at a time.

This is a big problem for us. On our cluster at Powerset, we have had both datanodes and HBase region servers run out of file handles because there is one open per file.

As HBase installations get larger, one socket per file just won't scale.
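One way to picture Jim's proposal, purely as a sketch (none of these names are HDFS APIs): a single-threaded executor per datanode gives exactly "one connection per (client, datanode) pair, with transfers queued and processed one at a time."

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: serialize block transfers over one logical connection per
// datanode by funneling them through a single-threaded executor.
public class SerialTransfers {
    private final Map<String, ExecutorService> perDatanode =
        new ConcurrentHashMap<>();

    // Queue a transfer for this datanode; transfers to the same
    // datanode run one at a time, in submission order.
    Future<?> submit(String datanode, Runnable transfer) {
        return perDatanode
            .computeIfAbsent(datanode, d -> Executors.newSingleThreadExecutor())
            .submit(transfer);
    }

    void shutdown() {
        perDatanode.values().forEach(ExecutorService::shutdown);
    }
}
```

The trade-off Dhruba raises still applies: without true multiplexing, a long transfer head-of-line-blocks everything queued behind it.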

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: dhruba Borthakur [mailto:dhruba@yahoo-inc.com]
> Sent: Friday, March 14, 2008 10:53 AM
> To: core-dev@hadoop.apache.org; hadoop-dev@lucene.apache.org
> Subject: RE: Multiplexing sockets in DFSClient/datanodes?
>
> Hi Jim,
>
> The protocol between the client and the Datanodes will become
> relatively more complex if we decide to multiplex
> simultaneous transfers of multiple blocks on the same socket
> connection. Do you think that the benefit of saving on system
> resources is really appreciable?
>
> Thanks,
> Dhruba
>
> -----Original Message-----
> From: Sanjay Radia [mailto:sradia@yahoo-inc.com]
> Sent: Wednesday, March 12, 2008 11:36 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: Multiplexing sockets in DFSClient/datanodes?
>
> Doug Cutting wrote:
> > Jim Kellerman wrote:
> >> Yes, multiplexing a socket is more complicated than having
> one socket
> >> per file, but saving system resources seems like a way to scale.
> >>
> >> Questions? Comments? Opinions? Flames?
> >
> > Note that Hadoop RPC already multiplexes, sharing a single
> socket per
> > pair of JVMs.  It would be possible to multiplex datanode,
> and should
> > not in theory significantly impact performance, but, as you
> indicate,
> > it would be a significant change.  One approach might be to
> implement
> > HDFS data access using RPC rather than directly using stream i/o.
> >
> > RPC also tears down idle connections, which HDFS does not.
> I wonder
> > how much doing that alone might help your case?  That would
> probably
> > be much simpler to implement.  Both client and server must already
> > handle connection failures, so it shouldn't be too great of
> a change
> > to have one or both sides actively close things down if
> they're idle
> > for more than a few seconds.  This is related to adding
> write timeouts
>
> > to the datanode (HADOOP-2346).
>
> Doug,
>    Dhruba and I had discussed using RPC in the past. While
> RPC is a cleaner interface and our rpc implementation has
> features such sharing connection, closing idle connections
> etc, streaming IO lets to pipe large amounts of data without
> the request/response exchange.
> The worry was that IO performance would degrade.
> BTW, NFS uses rpc (NFS does not have the write pipeline for replicas)
>
> sanjay
> >
> > Doug
>
>
>
>



RE: Multiplexing sockets in DFSClient/datanodes?

Posted by dhruba Borthakur <dh...@yahoo-inc.com>.
Hi Jim,

The protocol between the client and the Datanodes will become relatively
more complex if we decide to multiplex simultaneous transfers of
multiple blocks on the same socket connection. Do you think that the
benefit of saving on system resources is really appreciable?

Thanks,
Dhruba

-----Original Message-----
From: Sanjay Radia [mailto:sradia@yahoo-inc.com] 
Sent: Wednesday, March 12, 2008 11:36 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: Multiplexing sockets in DFSClient/datanodes?

Doug Cutting wrote:
> Jim Kellerman wrote:
>> Yes, multiplexing a socket is more complicated than having one socket
>> per file, but saving system resources seems like a way to scale.
>>
>> Questions? Comments? Opinions? Flames?
>
> Note that Hadoop RPC already multiplexes, sharing a single socket per 
> pair of JVMs.  It would be possible to multiplex datanode, and should 
> not in theory significantly impact performance, but, as you indicate, 
> it would be a significant change.  One approach might be to implement 
> HDFS data access using RPC rather than directly using stream i/o.
>
> RPC also tears down idle connections, which HDFS does not.  I wonder 
> how much doing that alone might help your case?  That would probably 
> be much simpler to implement.  Both client and server must already 
> handle connection failures, so it shouldn't be too great of a change 
> to have one or both sides actively close things down if they're idle 
> for more than a few seconds.  This is related to adding write timeouts

> to the datanode (HADOOP-2346).

Doug,
   Dhruba and I had discussed using RPC in the past. While RPC is a 
cleaner interface, and our RPC implementation has features such as 
connection sharing and closing idle connections, streaming IO lets us 
pipe large amounts of data without a request/response exchange.
The worry was that IO performance would degrade.
BTW, NFS uses RPC (NFS does not have the write pipeline for replicas).

sanjay
>
> Doug


Re: Multiplexing sockets in DFSClient/datanodes?

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
Hairong Kuang wrote:
>> streaming IO lets to pipe large amounts
>> of data without the request/response exchange.
>> The worry was that IO performance would degrade.
>>     
>
> Since HADOOP-2188 removes the IPC timeout, it is OK for a datanode to
> respond to the datanode above it in the pipeline only once it gets a
> response from the datanode below it in the pipeline. If datanodes had
> two threads, one pushing data down the pipeline and one writing it to
> the local disk, using RPC wouldn't introduce any additional
> communication cost.
>   

I believe that is what our pipeline code does.
The client, however, will block for the reply unless we change the 
client code to have multiple buffers, etc.
> Hairong
>
> On 3/12/08 11:35 AM, "Sanjay Radia" <sr...@yahoo-inc.com> wrote:
>
>   
>> Doug Cutting wrote:
>>     
>>> Jim Kellerman wrote:
>>>       
>>>> Yes, multiplexing a socket is more complicated than having one socket
>>>> per file, but saving system resources seems like a way to scale.
>>>>
>>>> Questions? Comments? Opinions? Flames?
>>>>         
>>> Note that Hadoop RPC already multiplexes, sharing a single socket per
>>> pair of JVMs.  It would be possible to multiplex datanode, and should
>>> not in theory significantly impact performance, but, as you indicate,
>>> it would be a significant change.  One approach might be to implement
>>> HDFS data access using RPC rather than directly using stream i/o.
>>>
>>> RPC also tears down idle connections, which HDFS does not.  I wonder
>>> how much doing that alone might help your case?  That would probably
>>> be much simpler to implement.  Both client and server must already
>>> handle connection failures, so it shouldn't be too great of a change
>>> to have one or both sides actively close things down if they're idle
>>> for more than a few seconds.  This is related to adding write timeouts
>>> to the datanode (HADOOP-2346).
>>>       
>> Doug,
>>    Dhruba and I had discussed using RPC in the past. While RPC is a
>> cleaner interface and our rpc implementation has
>> features such sharing connection, closing idle connections etc,
>> streaming IO lets to pipe large amounts
>> of data without the request/response exchange.
>> The worry was that IO performance would degrade.
>> BTW, NFS uses rpc (NFS does not have the write pipeline for replicas)
>>
>> sanjay
>>     
>>> Doug
>>>       
>
>   


Re: Multiplexing sockets in DFSClient/datanodes?

Posted by Hairong Kuang <ha...@yahoo-inc.com>.
> streaming IO lets to pipe large amounts
> of data without the request/response exchange.
> The worry was that IO performance would degrade.

Since HADOOP-2188 removes the IPC timeout, it is OK for a datanode to
respond to the datanode above it in the pipeline only once it gets a
response from the datanode below it in the pipeline. If datanodes had
two threads, one pushing data down the pipeline and one writing it to
the local disk, using RPC wouldn't introduce any additional
communication cost.
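The two-thread datanode arrangement described above can be sketched minimally with a pair of queues: a receiver hands each packet to one worker that forwards it down the pipeline and another that writes it to local disk, so neither blocks the other. This is entirely illustrative; the real DataNode code is structured differently, and the names here are invented.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a two-thread datanode write path: one thread "pipes"
// packets downstream, one writes them to "disk" (both modeled as
// lists here).
public class TwoThreadWriter {
    static final byte[] EOF = new byte[0];  // end-of-stream sentinel

    static void run(List<byte[]> packets,
                    List<byte[]> downstream,
                    List<byte[]> disk) throws InterruptedException {
        BlockingQueue<byte[]> toPipe = new ArrayBlockingQueue<>(16);
        BlockingQueue<byte[]> toDisk = new ArrayBlockingQueue<>(16);

        Thread piper  = new Thread(() -> drain(toPipe, downstream));
        Thread writer = new Thread(() -> drain(toDisk, disk));
        piper.start();
        writer.start();

        // The "receiver": fan each incoming packet out to both workers.
        for (byte[] p : packets) { toPipe.put(p); toDisk.put(p); }
        toPipe.put(EOF);
        toDisk.put(EOF);
        piper.join();
        writer.join();
    }

    static void drain(BlockingQueue<byte[]> q, List<byte[]> sink) {
        try {
            for (byte[] p = q.take(); p != EOF; p = q.take()) sink.add(p);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

As Sanjay replies below, the harder part is the client side, which still blocks on the reply unless it too gains multiple in-flight buffers.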

Hairong

On 3/12/08 11:35 AM, "Sanjay Radia" <sr...@yahoo-inc.com> wrote:

> Doug Cutting wrote:
>> Jim Kellerman wrote:
>>> Yes, multiplexing a socket is more complicated than having one socket
>>> per file, but saving system resources seems like a way to scale.
>>> 
>>> Questions? Comments? Opinions? Flames?
>> 
>> Note that Hadoop RPC already multiplexes, sharing a single socket per
>> pair of JVMs.  It would be possible to multiplex datanode, and should
>> not in theory significantly impact performance, but, as you indicate,
>> it would be a significant change.  One approach might be to implement
>> HDFS data access using RPC rather than directly using stream i/o.
>> 
>> RPC also tears down idle connections, which HDFS does not.  I wonder
>> how much doing that alone might help your case?  That would probably
>> be much simpler to implement.  Both client and server must already
>> handle connection failures, so it shouldn't be too great of a change
>> to have one or both sides actively close things down if they're idle
>> for more than a few seconds.  This is related to adding write timeouts
>> to the datanode (HADOOP-2346).
> 
> Doug,
>    Dhruba and I had discussed using RPC in the past. While RPC is a
> cleaner interface and our rpc implementation has
> features such sharing connection, closing idle connections etc,
> streaming IO lets to pipe large amounts
> of data without the request/response exchange.
> The worry was that IO performance would degrade.
> BTW, NFS uses rpc (NFS does not have the write pipeline for replicas)
> 
> sanjay
>> 
>> Doug
>