Posted to user@hadoop.apache.org by xeon Mailinglist <xe...@gmail.com> on 2015/02/02 10:12:20 UTC

Copy data between clusters during the job execution.

Hi

I want to have a job that copies the map output or the reduce output to
another HDFS. Is it possible?

E.g., the job runs in cluster 1 and takes its input from this cluster.
Then, before the job finishes, it copies the map output or the reduce
output to the HDFS in cluster 2.

Thanks,

RE: Copy data between clusters during the job execution.

Posted by ha...@visolve.com.
It seems that in your first error message you left out the source directory argument. One common usage of distcp is:

 

Distcp (solution to your problem)

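hadoop distcp webhdfs://hadoop-coc-1:50070/input1 webhdfs://hadoop-coc-2:50070/some1

Port 50070 is the NameNode web (HTTP) port, so it goes with the webhdfs:// scheme; with hdfs:// URIs, use the NameNode RPC port instead (8020 by default).

It is also wise to use the latest version of the tool, DistCp v2. In Hadoop 2.x, the plain "hadoop distcp" command already runs DistCp v2:

hadoop distcp webhdfs://hadoop-coc-1:50070/input1 webhdfs://hadoop-coc-2:50070/some1

Here webhdfs://hadoop-coc-1:50070/input1 is the source directory on cluster 1, and webhdfs://hadoop-coc-2:50070/some1 is the destination directory on the other cluster.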

 

Optional:

If needed, you can pass multiple source directories in one command (the last argument is the destination), using “\” to continue the command over several lines.
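A minimal sketch of such a multi-directory copy, assuming both NameNodes serve WebHDFS on port 50070 (the Hadoop 2.x HTTP default) and that input2 is a hypothetical second source directory. distcp treats the last argument as the destination and every earlier one as a source. HADOOP is an illustrative override, not a Hadoop setting, so that HADOOP=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Copy several source directories from cluster 1 into one destination
# directory on cluster 2 with a single distcp run.
# Assumptions: WebHDFS on port 50070 on both clusters; input2 is hypothetical.
# HADOOP is an illustrative override (HADOOP=echo prints instead of running).
HADOOP="${HADOOP:-hadoop}"

copy_inputs_to_cluster2() {
  # The trailing backslashes only continue the command over several lines;
  # the last path is the destination, all earlier paths are sources.
  "$HADOOP" distcp \
    webhdfs://hadoop-coc-1:50070/input1 \
    webhdfs://hadoop-coc-1:50070/input2 \
    webhdfs://hadoop-coc-2:50070/some1
}
```

Calling copy_inputs_to_cluster2 runs the copy; each source directory ends up as a subdirectory of some1 on cluster 2.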

 

Thanks and Regards, 
S.RagavendraGanesh 
Hadoop Support Team 
ViSolve Inc. | www.visolve.com

 

 

From: dbist13@gmail.com [mailto:dbist13@gmail.com] On Behalf Of Artem Ervits
Sent: Tuesday, February 03, 2015 6:49 AM
To: user@hadoop.apache.org
Subject: Re: Copy data between clusters during the job execution.

 

Take a look at Oozie: once the first job completes, you can distcp to another server.

Artem Ervits

On Feb 2, 2015 5:46 AM, "Daniel Haviv" <danielrulez@gmail.com> wrote:

It should run after your job finishes.

You can create the flow using a simple bash script.

Daniel


On 2 Feb 2015, at 12:31, xeonmailinglist <xeonmailinglist@gmail.com> wrote:

But can I use distcp inside my job, or do I need to write something that runs distcp after my job finishes?



On 02-02-2015 10:20, Daniel Haviv wrote:

You can use distcp

 

Daniel

 

On 2 Feb 2015, at 11:12,

 



Re: Copy data between clusters during the job execution.

Posted by Artem Ervits <ar...@gmail.com>.
Take a look at Oozie: once the first job completes, you can distcp to
another server.

Artem Ervits
On Feb 2, 2015 5:46 AM, "Daniel Haviv" <da...@gmail.com> wrote:

> It should run after your job finishes.
> You can create the flow using a simple bash script.
>
> Daniel
>
> On 2 Feb 2015, at 12:31, xeonmailinglist <xe...@gmail.com>
> wrote:
>
> But can I use distcp inside my job, or do I need to write something that
> runs distcp after my job finishes?
>
>
> On 02-02-2015 10:20, Daniel Haviv wrote:
>
>  You can use distcp
>
> Daniel
>
>  On 2 Feb 2015, at 11:12,
>
>
>
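A rough sketch of the Oozie approach, assuming Oozie is installed with its distcp action extension available. The script writes a workflow in which a map-reduce action runs first and a distcp action copies its output to cluster 2 only if the job succeeded. The workflow name, action names and the ${inputDir}, ${outputDir} and ${remoteDir} parameters (which Oozie would resolve from job.properties) are hypothetical, and a real map-reduce action also needs the job's mapper and reducer settings in its configuration.

```shell
#!/usr/bin/env bash
# Write a minimal Oozie workflow: a map-reduce action, then a distcp action.
# Hypothetical: workflow/action names and the inputDir/outputDir/remoteDir
# parameters. The quoted heredoc ('EOF') keeps ${...} literal for Oozie.
write_workflow() {
  local target="$1"
  cat > "$target" <<'EOF'
<workflow-app name="job-then-copy" xmlns="uri:oozie:workflow:0.4">
  <start to="mr-job"/>
  <action name="mr-job">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- mapper/reducer class settings for the real job go here -->
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="copy-output"/>
    <error to="fail"/>
  </action>
  <action name="copy-output">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <arg>${outputDir}</arg>
      <arg>${remoteDir}</arg>
    </distcp>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF
}

# Usage (hypothetical host): write_workflow workflow.xml, upload it to HDFS,
# then submit with:
#   oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run
```

The two arg elements play the same roles as on the distcp command line (source, then destination), so remoteDir would be a full URI on cluster 2, for example a path under webhdfs://hadoop-coc-2:50070/.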


Re: Copy data between clusters during the job execution.

Posted by Daniel Haviv <da...@gmail.com>.
It should run after your job finishes.
You can create the flow using a simple bash script.

Daniel

> On 2 Feb 2015, at 12:31, xeonmailinglist <xe...@gmail.com> wrote:
> 
> But can I use distcp inside my job, or do I need to write something that runs distcp after my job finishes?
> 
> 
>> On 02-02-2015 10:20, Daniel Haviv wrote:
>> You can use distcp
>> 
>> Daniel
>> 
>> On 2 Feb 2015, at 11:12,
> 
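A minimal sketch of such a bash script, assuming the job is packaged as a jar whose driver takes input and output paths as arguments, and that both NameNodes serve WebHDFS on port 50070. The jar name (my-job.jar), driver class (MyJobDriver) and paths are hypothetical, and HADOOP is an illustrative override (not a Hadoop setting) so that HADOOP=echo prints the commands instead of running them. The copy starts only after the job has exited successfully, i.e. distcp runs after the job rather than inside it.

```shell
#!/usr/bin/env bash
# Run the MapReduce job on cluster 1, then copy its output to cluster 2.
# Hypothetical names: my-job.jar, MyJobDriver, /input1, /output1, /some1.
# HADOOP is an illustrative override (HADOOP=echo prints instead of running).
HADOOP="${HADOOP:-hadoop}"
SRC_NN="webhdfs://hadoop-coc-1:50070"   # assumed WebHDFS address of cluster 1
DST_NN="webhdfs://hadoop-coc-2:50070"   # assumed WebHDFS address of cluster 2

run_and_copy() {
  local input="$1" output="$2" remote="$3"

  # Step 1: run the job; its output is written on cluster 1.
  if ! "$HADOOP" jar my-job.jar MyJobDriver "$input" "$output"; then
    echo "job failed, skipping copy" >&2
    return 1
  fi

  # Step 2: only after the job has finished, copy its output to cluster 2.
  "$HADOOP" distcp "$SRC_NN$output" "$DST_NN$remote"
}

# Usage: run_and_copy /input1 /output1 /some1
```

Keeping the copy as a separate step means a failed job never ships partial output to cluster 2; the trade-off is that the copy adds to the end-to-end time instead of overlapping with the job.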


Re: Copy data between clusters during the job execution.

Posted by xeonmailinglist <xe...@gmail.com>.
But can I use distcp inside my job, or do I need to write something that 
runs distcp after my job finishes?


On 02-02-2015 10:20, Daniel Haviv wrote:
> You can use distcp
>
> Daniel
>
> On 2 Feb 2015, at 11:12,



Re: Copy data between clusters during the job execution.

Posted by Daniel Haviv <da...@gmail.com>.
You can use distcp

Daniel

> On 2 Feb 2015, at 11:12, xeon Mailinglist <xe...@gmail.com> wrote:
> 
> Hi
> 
> I want to have a job that copies the map output or the reduce output to another HDFS. Is it possible?
> 
> E.g., the job runs in cluster 1 and takes its input from this cluster. Then, before the job finishes, it copies the map output or the reduce output to the HDFS in cluster 2.
> 
> Thanks,
