Posted to common-user@hadoop.apache.org by Sugandha Naolekar <su...@gmail.com> on 2009/06/10 08:26:18 UTC

HDFS data transfer!

Hello!

If I try to transfer a 5GB VDI file from a remote host (not part of the Hadoop
cluster) into HDFS, and get it back, how much time is it supposed to take?

No map-reduce involved. Simply writing files to and reading them from HDFS
through simple Java code (using the API).

-- 
Regards!
Sugandha

Re: HDFS data transfer!

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Thanks Brian for the good advice.

Slightly off topic from the original post: there will be occasions where it
is necessary or better to copy different portions of a file in parallel
(distcp can benefit a lot). There is a proposal to let HDFS 'stitch'
multiple files into one, something like

NameNode.stitchFiles(Path to, Path[] files)

This way a very large file can be copied more efficiently (with a
map/reduce job, for example). Another use case is high-latency,
high-bandwidth connections (like coast-to-coast). High latency can be
somewhat worked around by using large buffers for TCP connections, but
usually users don't have that control. It is just simpler to use
multiple connections.

This will obviously be an HDFS-only interface (i.e. not a FileSystem
method), at least initially.

Raghu.
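
To make the "copy portions in parallel, then stitch" idea concrete, here is a
minimal sketch that runs against the local filesystem only. It is not the
proposed HDFS interface: the class name ParallelCopy, the helper copyRange, and
the local stitchFiles below are hypothetical stand-ins chosen for illustration,
and the stitch here physically concatenates bytes, which a NameNode-side
operation presumably would not need to do.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCopy {
    /** Copies bytes [offset, offset + length) of src into a new file 'part'. */
    static void copyRange(Path src, Path part, long offset, long length) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(part, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long done = 0;
            while (done < length) {
                long n = in.transferTo(offset + done, length - done, out);
                if (n <= 0) break; // source shorter than expected
                done += n;
            }
        }
    }

    /** Local stand-in for the proposed stitch: appends each part to 'to', in order. */
    static void stitchFiles(Path to, List<Path> parts) throws IOException {
        try (FileChannel out = FileChannel.open(to, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path p : parts) {
                try (FileChannel in = FileChannel.open(p, StandardOpenOption.READ)) {
                    long size = in.size();
                    long done = 0;
                    while (done < size) {
                        long n = in.transferTo(done, size - done, out);
                        if (n <= 0) break;
                        done += n;
                    }
                }
            }
        }
    }

    /** Splits src into 'pieces' ranges, copies them concurrently, then stitches them. */
    static void parallelCopy(Path src, Path dst, int pieces) throws Exception {
        long size = Files.size(src);
        long chunk = (size + pieces - 1) / pieces; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(pieces);
        List<Path> parts = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < pieces; i++) {
                long offset = i * chunk;
                long length = Math.max(0, Math.min(chunk, size - offset));
                Path part = Files.createTempFile("part" + i + "-", ".tmp");
                parts.add(part);
                pending.add(pool.submit(() -> {
                    copyRange(src, part, offset, length);
                    return null;
                }));
            }
            for (Future<?> f : pending) f.get(); // wait for all ranges, surface errors
            stitchFiles(dst, parts);
        } finally {
            pool.shutdown();
            for (Path p : parts) Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws Exception {
        parallelCopy(Paths.get(args[0]), Paths.get(args[1]), 4);
    }
}
```

On a single local disk this is unlikely to be faster than a plain copy; the
point is only the shape of the approach, where each range could travel over its
own connection.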

Brian Bockelman wrote:
> Hey Sugandha,
> 
> Transfer rates depend on the quality/quantity of your hardware and the 
> quality of your client disk that is generating the data.  I usually say 
> that you should expect near-hardware-bottleneck speeds for an otherwise 
> idle cluster.
> 
> There should be no "make it fast" required (though you should review 
> the logs for errors if it's going slow).  I would expect a 5GB file to 
> take around 3-5 minutes to write on our cluster, but it's a well-tuned 
> and operational cluster.
> 
> As Todd (I think) mentioned before, we can't help any when you say "I 
> want to make it faster".  You need to provide diagnostic information - 
> logs, Ganglia plots, stack traces, something - that folks can look at.
> 
> Brian


Re: HDFS data transfer!

Posted by jason hadoop <ja...@gmail.com>.
Also check the I/O wait time on your datanodes; if the I/O wait time is high,
you can't win.
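
One way to check this on a Linux datanode, sketched under assumptions: the code
below samples the aggregate "cpu" line of /proc/stat twice and reports the share
of CPU time spent in iowait over the interval. The class and method names are
made up for illustration, and tools such as iostat or top report the same
figure without any code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class IoWait {
    /**
     * Parses the aggregate "cpu" line of /proc/stat into its counters.
     * Field order on Linux: user nice system idle iowait irq softirq steal ...
     */
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (!parts[0].equals("cpu")) {
            throw new IllegalArgumentException("not the aggregate cpu line: " + line);
        }
        long[] values = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            values[i - 1] = Long.parseLong(parts[i]);
        }
        return values;
    }

    /** Percentage of elapsed CPU time spent in iowait between two samples. */
    static double ioWaitPercent(long[] before, long[] after) {
        // Only the first 8 fields are summed; guest time is already part of user time.
        int n = Math.min(8, Math.min(before.length, after.length));
        if (n < 5) {
            throw new IllegalArgumentException("iowait field not present");
        }
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += after[i] - before[i];
        }
        long iowait = after[4] - before[4];
        return total == 0 ? 0.0 : 100.0 * iowait / total;
    }

    static long[] sample() throws IOException {
        return parseCpuLine(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws Exception {
        long[] before = sample();
        Thread.sleep(1000); // one-second sampling window, an arbitrary choice
        long[] after = sample();
        System.out.printf("iowait: %.1f%%%n", ioWaitPercent(before, after));
    }
}
```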

On Fri, Jun 12, 2009 at 11:24 AM, Brian Bockelman <bb...@cse.unl.edu> wrote:

> What's your replication factor?  What aggregate I/O rates do you see in
> Ganglia?  Is the I/O spiky, or has it plateaued?
>
> We can hit close to network rate (1Gbps) per node locally, and have pretty
> similar hardware.
>
> Brian


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals

Re: HDFS data transfer!

Posted by Brian Bockelman <bb...@cse.unl.edu>.
What's your replication factor?  What aggregate I/O rates do you see  
in Ganglia?  Is the I/O spiky, or has it plateaued?

We can hit close to network rate (1Gbps) per node locally, and have  
pretty similar hardware.

Brian
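
For scale, the reported rate can be put next to the 1 Gbps figure with a little
arithmetic. The sketch below assumes decimal units (1 GB = 1e9 bytes) and treats
the ~1 GB/minute figure as a single sustained rate; whether that number was per
stream or aggregate across the three puts is not stated in the thread. Class and
method names are illustrative only.

```java
public class LineRate {
    /** Converts a rate in GB per minute to megabits per second (decimal units). */
    static double gbPerMinuteToMbps(double gbPerMinute) {
        double bytesPerSecond = gbPerMinute * 1e9 / 60.0;
        return bytesPerSecond * 8 / 1e6;
    }

    /** Fraction of a link's nominal capacity that a measured rate uses. */
    static double fractionOfLink(double measuredMbps, double linkMbps) {
        return measuredMbps / linkMbps;
    }

    public static void main(String[] args) {
        double measured = gbPerMinuteToMbps(1.0);
        System.out.printf("1 GB/min = %.1f Mbit/s, %.1f%% of a 1 Gbps link%n",
                measured, 100 * fractionOfLink(measured, 1000.0));
    }
}
```

Under those assumptions 1 GB/minute is about 133 Mbit/s, a bit over 13% of a
single 1 Gbps link.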

On Jun 12, 2009, at 9:03 AM, Scott wrote:

> I ran the put command on 3 of the nodes simultaneously to copy files  
> that were local on those machines into HDFS.


Re: HDFS data transfer!

Posted by Scott <sk...@weather.com>.
I ran the put command on 3 of the nodes simultaneously to copy files 
that were local on those machines into HDFS.

Brian Bockelman wrote:
> What'd you do for the tests?  Was it a single stream or a multiple 
> stream test?
>
> Brian

Re: HDFS data transfer!

Posted by Brian Bockelman <bb...@cse.unl.edu>.
What'd you do for the tests?  Was it a single stream or a multiple  
stream test?

Brian

On Jun 12, 2009, at 6:48 AM, Scott wrote:

> So is ~ 1GB/minute transfer rate a reasonable performance  
> benchmark?  Our test cluster consists of 4 quad core xeon machines  
> with 2 non-raided drives each.  My initial tests show a transfer  
> rate of around 1GB/minute, and that was slower than I expected it to  
> be.
>
> Thanks,
> Scott


Re: HDFS data transfer!

Posted by Scott <sk...@weather.com>.
So is a ~1 GB/minute transfer rate a reasonable performance benchmark?  
Our test cluster consists of 4 quad-core Xeon machines with 2 non-RAIDed 
drives each.  My initial tests show a transfer rate of around 
1 GB/minute, and that was slower than I expected it to be.

Thanks,
Scott
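
For anyone wanting to reproduce a number like this from Java rather than the
put command, a minimal timing harness might look as follows. It is a sketch
only: it times a copy between two local files, and the names CopyTimer, copy and
mbPerSecond are illustrative. To measure an HDFS write, the output stream would
instead have to come from the Hadoop client (for example the stream returned by
FileSystem.create(Path)), which is assumed here and not shown.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyTimer {
    /** Copies 'in' to 'out' using a buffer of the given size; returns bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /** Throughput in MB/s (1 MB = 1e6 bytes) for 'bytes' moved in 'nanos' nanoseconds. */
    static double mbPerSecond(long bytes, long nanos) {
        return (bytes / 1e6) / (nanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        // args[0] = source file, args[1] = destination file (local stand-ins)
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            long start = System.nanoTime();
            long bytes = copy(in, out, 1 << 20); // 1 MiB buffer, an arbitrary choice
            long elapsed = System.nanoTime() - start;
            System.out.printf("%d bytes at %.1f MB/s%n", bytes, mbPerSecond(bytes, elapsed));
        }
    }
}
```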


Brian Bockelman wrote:
> Hey Sugandha,
>
> Transfer rates depend on the quality/quantity of your hardware and the 
> quality of your client disk that is generating the data.  I usually 
> say that you should expect near-hardware-bottleneck speeds for an 
> otherwise idle cluster.
>
> There should be no "make it fast" required (though you should review 
> the logs for errors if it's going slow).  I would expect a 5GB file to 
> take around 3-5 minutes to write on our cluster, but it's a well-tuned 
> and operational cluster.
>
> As Todd (I think) mentioned before, we can't help any when you say "I 
> want to make it faster".  You need to provide diagnostic information - 
> logs, Ganglia plots, stack traces, something - that folks can look at.
>
> Brian

Re: HDFS data transfer!

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Sugandha,

Transfer rates depend on the quality/quantity of your hardware and the  
quality of your client disk that is generating the data.  I usually  
say that you should expect near-hardware-bottleneck speeds for an  
otherwise idle cluster.

There should be no "make it fast" required (though you should review  
the logs for errors if it's going slow).  I would expect a 5GB file to  
take around 3-5 minutes to write on our cluster, but it's a well-tuned  
and operational cluster.

As Todd (I think) mentioned before, we can't help any when you say "I  
want to make it faster".  You need to provide diagnostic information -  
logs, Ganglia plots, stack traces, something - that folks can look at.

Brian
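
To translate these time estimates into sustained rates, a small calculation
helps. The sketch below assumes decimal units (1 GB = 1e9 bytes) and a constant
rate over the whole transfer; the class and method names are illustrative only.
It also applies the same arithmetic to the "160 GB in 2 days" experience
reported elsewhere in this thread.

```java
public class TransferEstimate {
    // Decimal units are assumed: 1 GB = 1e9 bytes, 1 MB = 1e6 bytes.
    static final double GB = 1e9;
    static final double MB = 1e6;

    /** Seconds needed to move sizeBytes at a sustained rate of bytesPerSecond. */
    static double secondsFor(double sizeBytes, double bytesPerSecond) {
        return sizeBytes / bytesPerSecond;
    }

    /** Sustained rate in MB/s implied by moving sizeBytes in the given seconds. */
    static double impliedMBps(double sizeBytes, double seconds) {
        return sizeBytes / seconds / MB;
    }

    public static void main(String[] args) {
        double file = 5 * GB;
        System.out.printf("5 GB in 3 min: %.1f MB/s%n", impliedMBps(file, 3 * 60));
        System.out.printf("5 GB in 5 min: %.1f MB/s%n", impliedMBps(file, 5 * 60));

        // Rate implied by loading 160 GB in 2 days, then applied to a 5 GB file.
        double twoDays = 2 * 24 * 3600;
        double slowRate = 160 * GB / twoDays; // bytes per second
        System.out.printf("160 GB in 2 days: %.2f MB/s%n", slowRate / MB);
        System.out.printf("5 GB at that rate: %.1f hours%n",
                secondsFor(file, slowRate) / 3600);
    }
}
```

So a 3-5 minute write corresponds to roughly 17 to 28 MB/s sustained, while the
two-day load works out to under 1 MB/s, at which a 5 GB file would take about
an hour and a half.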

On Jun 10, 2009, at 2:25 AM, Sugandha Naolekar wrote:

> But if I want to make it fast, then what? I want to place the data in  
> HDFS and
> replicate it in a fraction of a second. Is that possible, and how?


Re: HDFS data transfer!

Posted by Sugandha Naolekar <su...@gmail.com>.
But if I want to make it fast, then what? I want to place the data in HDFS and
replicate it in a fraction of a second. Is that possible, and how?

On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena <ka...@gmail.com> wrote:

> I would suppose about 2-3 hours. It took me some 2 days to load a 160 Gb
> file.
> Secura



-- 
Regards!
Sugandha

Re: HDFS data transfer!

Posted by kartik saxena <ka...@gmail.com>.
I would suppose about 2-3 hours. It took me some 2 days to load a 160 Gb
file.
Secura

On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
<su...@gmail.com> wrote:

> Hello!
>
> If I try to transfer a 5GB VDI file from a remote host (not part of the Hadoop
> cluster) into HDFS, and get it back, how much time is it supposed to take?
>
> No map-reduce involved. Simply writing files to and reading them from HDFS
> through simple Java code (using the API).
>
> --
> Regards!
> Sugandha
>