Posted to common-user@hadoop.apache.org by Sugandha Naolekar <su...@gmail.com> on 2009/06/10 13:55:18 UTC

HDFS issues..!

         If I want to make data transfer fast, what am I supposed
to do? I want to place the data in HDFS and replicate it in a fraction of a
second. Is that possible, and how? Placing a 5GB file takes at least
half an hour or so, but if it's a large cluster, let's say of 7 nodes,
placing it in HDFS would take around 2-3 hours. So, how can that delay
be avoided?

         Also, my aim is simply to transfer the data, i.e., dump the data
into HDFS and get it back whenever needed. How can I make this transfer
fast?
-- 
Regards!
Sugandha

Re: HDFS issues..!

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Jun 10, 2009 at 4:55 AM, Sugandha Naolekar
<su...@gmail.com> wrote:

>         If I want to make data transfer fast, what am I supposed
> to do? I want to place the data in HDFS and replicate it in a fraction of a
> second.


I want to go to France, but it takes 10+ hours to get there from California
on the fastest plane. How can I get there faster?


> Is that possible, and how? Placing a 5GB file takes at least
> half an hour or so, but if it's a large cluster, let's say of 7 nodes,
> placing it in HDFS would take around 2-3 hours. So, how can that delay
> be avoided?
>

HDFS will only replicate as many times as you want it to. The write is also
pipelined: the client streams data to the first datanode, which forwards it
to the second as it arrives, and so on down the chain. This means that
writing a 5GB file replicated to 3 nodes is only marginally faster than
writing the same file to 10 nodes, if for some reason you wanted to set your
replication count to 10 (unnecessary for 99.99999% of use cases).
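A rough back-of-the-envelope model may help show why pipelining keeps the
replication factor from multiplying the write time. This is a toy sketch,
not how HDFS computes anything: it assumes one fixed bottleneck bandwidth
and a small fixed pipeline-fill delay per extra hop (both hypothetical
parameters, as are the function names), and compares a pipelined write
against a hypothetical scheme in which the client pushes every replica
itself, one after another.

```python
def pipelined_write_seconds(size_mb, bandwidth_mb_s, replicas, hop_delay_s):
    """Toy model of a pipelined write (assumption, not HDFS internals).

    Each node forwards data to the next as it arrives, so the bytes stream
    once at the bottleneck bandwidth; every extra replica adds only a small,
    fixed pipeline-fill delay.
    """
    return size_mb / bandwidth_mb_s + (replicas - 1) * hop_delay_s


def sequential_write_seconds(size_mb, bandwidth_mb_s, replicas):
    """Hypothetical non-pipelined scheme: the client sends each replica in turn."""
    return replicas * size_mb / bandwidth_mb_s


if __name__ == "__main__":
    size_mb = 5 * 1024   # a 5GB file
    bandwidth = 50.0     # assumed bottleneck in MB/sec (disk or network)
    hop_delay = 0.05     # assumed per-hop pipeline-fill delay in seconds
    for r in (3, 10):
        p = pipelined_write_seconds(size_mb, bandwidth, r, hop_delay)
        s = sequential_write_seconds(size_mb, bandwidth, r)
        print(f"replication={r}: pipelined ~{p:.1f}s, sequential ~{s:.1f}s")
```

Under these assumptions the pipelined time barely moves between 3 and 10
replicas, while the sequential scheme grows linearly with the replica count.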


>
>         Also, my aim is simply to transfer the data, i.e., dump the data
> into HDFS and get it back whenever needed. How can I make this transfer
> fast?


HDFS isn't magic. You can only write as fast as your disk and network allow.
If your disk has 50MB/sec of throughput, you'll probably be limited to about
50MB/sec. Expecting much more than this in real-life scenarios is
unrealistic.
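The same point can be put as simple arithmetic: a transfer can take no less
time than the file size divided by the slowest component in the path. A
minimal sketch, assuming disk and network are the only bottlenecks; the
function name is hypothetical, and the 1 Gbit/s network figure is an
illustrative assumption rather than anything stated in the thread.

```python
def min_transfer_seconds(size_mb, disk_mb_s, network_mb_s):
    """Lower bound on transfer time: data moves no faster than the slowest link."""
    bottleneck = min(disk_mb_s, network_mb_s)
    return size_mb / bottleneck


size_mb = 5 * 1024    # a 5GB file
disk = 50.0           # MB/sec, the disk figure from the reply above
network = 1000 / 8    # ~125 MB/sec for an assumed 1 Gbit/s link
seconds = min_transfer_seconds(size_mb, disk, network)
print(f"at best ~{seconds:.0f} s (~{seconds / 60:.1f} min)")
```

With these numbers the disk is the bottleneck and the bound is roughly 100
seconds. By the same arithmetic, 5GB in half an hour works out to only about
2.8MB/sec, so if the hardware can do better, the slowdown is likely
somewhere other than replication itself.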

-Todd