Posted to common-user@hadoop.apache.org by Mridul Muralidharan <mr...@yahoo-inc.com> on 2010/05/18 02:10:35 UTC
distcp of small number of really large files
Hi,
Is there a way to parallelize the copy of really large files?
From my understanding, each map in distcp currently copies one whole file.
So when the job consists of a small number of really large files, the copy
is limited by the file count rather than cluster capacity, and can be pretty slow.
Thanks,
Mridul
Re: distcp of small number of really large files
Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
There is a new feature called concat(), which concatenates files made up of full blocks.
So the idea is to copy the individual blocks in parallel, then concatenate them on the
destination back into the original files.
You will have to write some code to do this, or modify distcp.
This is in 0.21/0.22, but not in 0.20.
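To make the split/copy/concat idea concrete, here is a minimal sketch in plain Python
(not Hadoop code): it copies one large file by transferring fixed-size block ranges in
parallel, then stitches the pieces back together. On a real cluster you would copy the
actual HDFS blocks (e.g. one per map task) and use the concat() API for the final step;
the block size, file names, and helper functions below are illustrative assumptions.

```python
# Sketch only: parallel range-copy of a single file, then a "concat" step.
# BLOCK_SIZE stands in for the HDFS block size; copy_range stands in for
# one map task copying one block.
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024  # illustrative; HDFS blocks are 64/128 MB

def copy_range(src, dst_part, offset, length):
    """Copy up to `length` bytes starting at `offset` from src into dst_part."""
    with open(src, "rb") as f, open(dst_part, "wb") as out:
        f.seek(offset)
        out.write(f.read(length))

def parallel_copy(src, dst, workers=4):
    size = os.path.getsize(src)
    offsets = list(range(0, size, BLOCK_SIZE))
    parts = [f"{dst}.part{i}" for i in range(len(offsets))]
    # "map phase": each block range is copied independently, in parallel
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part, off in zip(parts, offsets):
            pool.submit(copy_range, src, part, off, BLOCK_SIZE)
    # "concat phase": reassemble the block copies into the original file
    with open(dst, "wb") as out:
        for part in parts:
            with open(part, "rb") as p:
                shutil.copyfileobj(p, out)
            os.remove(part)

if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    src, dst = os.path.join(tmp, "big"), os.path.join(tmp, "copy")
    with open(src, "wb") as f:
        f.write(os.urandom(10 * BLOCK_SIZE + 123))  # deliberately not block-aligned
    parallel_copy(src, dst)
    assert open(src, "rb").read() == open(dst, "rb").read()
```

The key property this mirrors is that the per-block copies are independent, so
parallelism scales with the number of blocks rather than the number of files.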
--Konstantin