Posted to common-user@hadoop.apache.org by Taeho Kang <tk...@gmail.com> on 2009/02/02 03:48:03 UTC

Transferring data between different Hadoop clusters

Dear all,

There have been times when I needed to transfer a large amount of data from
a cluster running one version of Hadoop to a cluster running another
(e.g. from a Hadoop 0.18 cluster to a Hadoop 0.19 cluster).

Other than copying the files from one cluster to a local file system and
uploading them to the other, is there a tool that does this?

Thanks in advance,
Regards,

/Taeho

Re: Transferring data between different Hadoop clusters

Posted by Taeho Kang <tk...@gmail.com>.
Sorry to bug you guys; I should've checked the documentation first.

I've found the documentation page for distcp, and it contains all the
answers to my questions. :-)
http://hadoop.apache.org/core/docs/current/distcp.html

Thanks once again.



On Tue, Feb 3, 2009 at 10:33 AM, Taeho Kang <tk...@gmail.com> wrote:

> Thanks for your prompt reply.
>
> When using the command
>  "./bin/hadoop distcp hftp://cluster1:50070/path hdfs://cluster2/path"
>
> - Should this command be run on cluster1?
> - What does port "50070" refer to? Is it the port set in "fs.default.name",
> or the one in "dfs.http.address"?
>
> /Taeho
>
>
>
> On Mon, Feb 2, 2009 at 12:40 PM, Mark Chadwick <mc...@invitemedia.com> wrote:
>
>> Taeho,
>>
>> The distcp command is perfect for this.  If you're copying between two
>> clusters running the same version of Hadoop, you can do something like:
>>
>> ./bin/hadoop distcp hdfs://cluster1/path hdfs://cluster2/path
>>
>> If you're copying between 0.18 and 0.19, the command will look like:
>>
>> ./bin/hadoop distcp hftp://cluster1:50070/path hdfs://cluster2/path
>>
>> Hope that helps,
>> -Mark

Re: Transferring data between different Hadoop clusters

Posted by Taeho Kang <tk...@gmail.com>.
Thanks for your prompt reply.

When using the command
"./bin/hadoop distcp hftp://cluster1:50070/path hdfs://cluster2/path"

- Should this command be run on cluster1?
- What does port "50070" refer to? Is it the port set in "fs.default.name",
or the one in "dfs.http.address"?

/Taeho
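
A hedged note on the two questions above: the DistCp documentation describes
HFTP as a read-only file system, so the job should be run on the destination
cluster (cluster2 here), where the tasks can write. It gives the source form
as hftp://<dfs.http.address>/<path>, so 50070 is the NameNode's HTTP port
from "dfs.http.address", not the RPC port in "fs.default.name". The sketch
below only builds the source URI from a configured address; the helper name
hftp_source_uri and the host names are hypothetical, and the 0.0.0.0:50070
value is assumed to be what the source cluster's configuration contains.

```shell
#!/bin/sh
# Build an hftp:// source URI from a dfs.http.address value.
#   $1: NameNode host name of the source cluster (hypothetical: cluster1)
#   $2: dfs.http.address as configured (assumed here: 0.0.0.0:50070)
#   $3: absolute path to copy
hftp_source_uri() {
    namenode_host=$1
    http_address=$2
    path=$3
    # Keep only the text after the last colon, i.e. the HTTP port.
    port=${http_address##*:}
    echo "hftp://${namenode_host}:${port}${path}"
}

# Print the distcp command to run on the destination cluster.
# Nothing is executed against Hadoop here.
src=$(hftp_source_uri cluster1 0.0.0.0:50070 /path)
echo "./bin/hadoop distcp ${src} hdfs://cluster2/path"
```

The address is often bound to 0.0.0.0, which is why the sketch takes the
host name separately and reuses only the port.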



On Mon, Feb 2, 2009 at 12:40 PM, Mark Chadwick <mc...@invitemedia.com> wrote:

> Taeho,
>
> The distcp command is perfect for this.  If you're copying between two
> clusters running the same version of Hadoop, you can do something like:
>
> ./bin/hadoop distcp hdfs://cluster1/path hdfs://cluster2/path
>
> If you're copying between 0.18 and 0.19, the command will look like:
>
> ./bin/hadoop distcp hftp://cluster1:50070/path hdfs://cluster2/path
>
> Hope that helps,
> -Mark

Re: Transferring data between different Hadoop clusters

Posted by Mark Chadwick <mc...@invitemedia.com>.
Taeho,

The distcp command is perfect for this.  If you're copying between two
clusters running the same version of Hadoop, you can do something like:

./bin/hadoop distcp hdfs://cluster1/path hdfs://cluster2/path

If you're copying between 0.18 and 0.19, the command will look like:

./bin/hadoop distcp hftp://cluster1:50070/path hdfs://cluster2/path

Hope that helps,
-Mark
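
The choice between the two commands above can be sketched as a small shell
helper. This is only an illustration under stated assumptions: the function
name build_distcp_cmd, the "same"/"different" argument, and the HTTP_PORT
variable are hypothetical, and the helper prints the command instead of
running it.

```shell
#!/bin/sh
# Print (but do not run) a distcp command for a cluster-to-cluster copy.
#   $1: source NameNode host        $2: destination NameNode host
#   $3: absolute path to copy       $4: "same" or "different" Hadoop versions
build_distcp_cmd() {
    src_host=$1
    dst_host=$2
    path=$3
    versions=$4
    # Port from the hftp example in this thread; override via HTTP_PORT.
    http_port=${HTTP_PORT:-50070}

    if [ "$versions" = "same" ]; then
        # Same version on both sides: read the source over hdfs://.
        src_uri="hdfs://${src_host}${path}"
    else
        # Different versions: read the source over hftp://.
        src_uri="hftp://${src_host}:${http_port}${path}"
    fi
    echo "./bin/hadoop distcp ${src_uri} hdfs://${dst_host}${path}"
}

build_distcp_cmd cluster1 cluster2 /path same
build_distcp_cmd cluster1 cluster2 /path different
```

Printing the command first makes it easy to check the scheme and port before
launching a long-running copy.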
