Posted to user@hbase.apache.org by ma qiang <ma...@gmail.com> on 2008/07/25 06:06:27 UTC

How to copy data in hdfs and hbase from a hadoop cluster into another hadoop cluster?

Hi all,
    I have a large dataset stored in a Hadoop cluster, and I want to
copy this data to another Hadoop cluster. Can anyone tell me how?
    Thank you very much!
    Best wishes!

maqiang

Re: How to copy data in hdfs and hbase from a hadoop cluster into another hadoop cluster?

Posted by ma qiang <ma...@gmail.com>.
Thank you for your replies.
Thanks very much!




Re: How to copy data in hdfs and hbase from a hadoop cluster into another hadoop cluster?

Posted by Peeyush Bishnoi <pe...@yahoo-inc.com>.
Hello Maqiang,

To transfer data from one Hadoop cluster to another, use the hadoop
distcp command. The -i option tells distcp to ignore failures:

$ hadoop distcp -i hdfs://<namenode>:<port no.><source DFS data path> \
    hdfs://<namenode>:<port no.><destination DFS data path>
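
As a minimal sketch of how the placeholders in that command might be filled in, the snippet below composes the two HDFS URIs and prints the resulting command for review without running it. The host names, port 8020, the dataset path, and the hdfs_uri helper are all hypothetical; substitute the values for your own clusters.

```shell
# Sketch only: hosts, port, and paths below are hypothetical placeholders.

# hdfs_uri HOST PORT PATH: build an HDFS URI from a namenode host,
# its port, and an absolute path on that cluster.
hdfs_uri() {
  printf 'hdfs://%s:%s%s\n' "$1" "$2" "$3"
}

SRC="$(hdfs_uri nn-src.example.com 8020 /user/maqiang/dataset)"
DST="$(hdfs_uri nn-dst.example.com 8020 /user/maqiang/dataset)"

# Print the command instead of running it, so it can be checked first.
echo "hadoop distcp -i $SRC $DST"
```

Once the printed command looks right, it can be run from a node that can reach both namenodes.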

Hope this solves your problem.

Thanks,

---
Peeyush


Re: How to copy data in hdfs and hbase from a hadoop cluster into another hadoop cluster?

Posted by Pratyush Banerjee <pr...@aol.com>.
Hi,

Hadoop includes a distributed copy utility, distcp, which copies
large amounts of data from one DFS to another. Its usage is:

distcp [OPTIONS] <srcurl>* <desturl>

OPTIONS:
-p[rbugp]              Preserve status
                       r: replication number
                       b: block size
                       u: user
                       g: group
                       p: permission
                       -p alone is equivalent to -prbugp
-i                     Ignore failures
-log <logdir>          Write logs to <logdir>
-overwrite             Overwrite destination
-update                Overwrite if src size different from dst size
-f <urilist_uri>       Use list at <urilist_uri> as src list

NOTE: if -overwrite or -update are set, each source URI is
      interpreted as an isomorphic update to an existing directory.
For example:
hadoop distcp -p -update "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"

     would update all descendants of 'baz' also in 'bar'; it would
     *not* update /user/foo/baz/bar

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker

This utility uses MapReduce to copy large chunks of data.
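
The NOTE in the usage text above is easy to misread, so here is a small illustration of the path mapping it describes. This does not call Hadoop; dest_path is a hypothetical helper that only mimics the mapping as I read the NOTE. The "plain" case (source directory placed under an existing destination) is an assumption inferred from the NOTE's remark about /user/foo/baz/bar.

```shell
# Illustration only: mimics the path mapping described in the NOTE.
# dest_path is a hypothetical helper, not part of Hadoop.

# dest_path MODE SRC_DIR DST_DIR REL_FILE
#   update: contents of SRC_DIR map directly onto DST_DIR
#           (the "isomorphic update" case for -update / -overwrite).
#   plain:  SRC_DIR itself is placed under DST_DIR
#           (assumed behaviour when DST_DIR already exists).
dest_path() {
  case "$1" in
    update) printf '%s/%s\n' "$3" "$4" ;;
    plain)  printf '%s/%s/%s\n' "$3" "$(basename "$2")" "$4" ;;
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

dest_path update /user/foo/bar /user/foo/baz part-00000
dest_path plain  /user/foo/bar /user/foo/baz part-00000
```

The first call prints a path directly under baz, the second a path under baz/bar, which is the distinction the NOTE draws.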

Hope this helps.

Pratyush





Re: How to copy data in hdfs and hbase from a hadoop cluster into another hadoop cluster?

Posted by Ankur Goel <an...@corp.aol.com>.
The simplest and fastest way is to use the 'distcp' command from the
Hadoop shell. It copies all your data in parallel using MapReduce.
The shell command looks something like this:

hadoop distcp hdfs://source-namenode-host:port/source-path/ \
    hdfs://dest-namenode-host:port/dest-path
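
For a large copy it can help to wrap the command so that it is previewed first and its exit status is checked afterwards. The following is only a sketch: run_distcp and the DRY_RUN variable are hypothetical names, port 8020 is a placeholder, and it assumes distcp exits with a nonzero status when the copy fails.

```shell
# run_distcp SRC_URI DST_URI  (hypothetical wrapper, not part of Hadoop)
# With DRY_RUN=1 (the default in this sketch) it only prints the command.
run_distcp() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hadoop distcp $1 $2"
    return 0
  fi
  # Refuse to continue if the hadoop launcher is not installed here.
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH" >&2
    return 127
  fi
  # Assumption: a nonzero exit status from distcp means the copy failed.
  if hadoop distcp "$1" "$2"; then
    echo "copy finished: $2"
  else
    echo "distcp reported a failure; check the job logs" >&2
    return 1
  fi
}

# Preview only; the port number here is a placeholder.
run_distcp hdfs://source-namenode-host:8020/source-path/ \
    hdfs://dest-namenode-host:8020/dest-path
```

Setting DRY_RUN=0 before the call would run the copy for real.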

Hope this helps

-Ankur
