Posted to hdfs-user@hadoop.apache.org by xeon Mailinglist <xe...@gmail.com> on 2015/02/02 10:12:20 UTC
Copy data between clusters during the job execution.
Hi
I want to have a job that copies the map output or the reduce output to
another HDFS. Is it possible?
E.g., the job runs in cluster 1 and takes its input from this cluster.
Then, before the job finishes, it copies the map output or the reduce
output to the HDFS in cluster 2.
Thanks,
RE: Copy data between clusters during the job execution.
Posted by ha...@visolve.com.
Judging from your first error message, it seems the source directory argument was not quite right. One common usage of distcp is:
Distcp (solution to your problem)
hadoop distcp hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/some1
It is also wise to use the latest tool:
distcp2
hadoop distcp hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/some1
Where hdfs://hadoop-coc-2:50070/some1 represents the destination directory on the other cluster.
Optional:
If needed, you can provide multiple source directories in the command, using “\” to continue it on the next line.
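As a rough illustration of the multiple-directory case: distcp accepts one or more source URIs followed by a single destination. The sketch below only prints such a command instead of running it. The function name build_distcp_cmd, the host names, the paths and port 8020 (assumed here to be the NameNode RPC port) are placeholders and do not come from this thread.

```shell
#!/usr/bin/env bash
# Sketch: build a distcp command that copies several source directories
# into one destination directory on another cluster.
# Host names, ports and paths are placeholders, not values from this thread.
set -eu

# Print the distcp command line for the given destination and sources.
# Usage: build_distcp_cmd DEST SRC [SRC...]
build_distcp_cmd() {
  local dest="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "build_distcp_cmd: need at least one source" >&2
    return 1
  fi
  # distcp takes one or more source URIs followed by a single destination.
  printf 'hadoop distcp'
  printf ' %s' "$@" "$dest"
  printf '\n'
}

# Example: two source directories on cluster 1, one destination on cluster 2.
build_distcp_cmd hdfs://nn2.example.com:8020/backup \
  hdfs://nn1.example.com:8020/input1 \
  hdfs://nn1.example.com:8020/input2
```

The printed line can be pasted into a terminal once the placeholder URIs are replaced with the real ones.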
Thanks and Regards,
S.RagavendraGanesh
Hadoop Support Team
ViSolve Inc. | www.visolve.com
From: dbist13@gmail.com [mailto:dbist13@gmail.com] On Behalf Of Artem Ervits
Sent: Tuesday, February 03, 2015 6:49 AM
To: user@hadoop.apache.org
Subject: Re: Copy data between clusters during the job execution.
Take a look at Oozie: once the first job completes, you can distcp to another server.
Artem Ervits
On Feb 2, 2015 5:46 AM, "Daniel Haviv" <danielrulez@gmail.com> wrote:
It should run after your job finishes.
You can create the flow using a simple bash script.
Daniel
On Feb 2, 2015, at 12:31, xeonmailinglist <xeonmailinglist@gmail.com> wrote:
But can I use distcp inside my job, or do I need to program something that executes distcp after executing my job?
On 02-02-2015 10:20, Daniel Haviv wrote:
You can use distcp
Daniel
On Feb 2, 2015, at 11:12,
Re: Copy data between clusters during the job execution.
Posted by Artem Ervits <ar...@gmail.com>.
Take a look at Oozie: once the first job completes, you can distcp to another
server.
Artem Ervits
On Feb 2, 2015 5:46 AM, "Daniel Haviv" <da...@gmail.com> wrote:
> It should run after your job finishes.
> You can create the flow using a simple bash script
>
> Daniel
>
> On Feb 2, 2015, at 12:31, xeonmailinglist <xe...@gmail.com>
> wrote:
>
> But can I use distcp inside my job, or do I need to program something that
> executes distcp after executing my job?
>
>
> On 02-02-2015 10:20, Daniel Haviv wrote:
>
> You can use distcp
>
> Daniel
>
> On Feb 2, 2015, at 11:12,
>
>
>
Re: Copy data between clusters during the job execution.
Posted by Daniel Haviv <da...@gmail.com>.
It should run after your job finishes.
You can create the flow using a simple bash script
Daniel
> On Feb 2, 2015, at 12:31, xeonmailinglist <xe...@gmail.com> wrote:
>
> But can I use distcp inside my job, or do I need to program something that executes distcp after executing my job?
>
>
>> On 02-02-2015 10:20, Daniel Haviv wrote:
>> You can use distcp
>>
>> Daniel
>>
>> On Feb 2, 2015, at 11:12,
>
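A minimal sketch of such a flow, assuming the job is started with "hadoop jar" and its output should be copied only if the job succeeds. The jar name, driver class, host names, paths and port 8020 are hypothetical placeholders, not values from this thread. DRY_RUN defaults to 1, so the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of a "job first, then copy" flow. All names below are placeholders.
set -eu

JOB_JAR="${JOB_JAR:-my-job.jar}"             # hypothetical job jar
JOB_CLASS="${JOB_CLASS:-com.example.MyJob}"  # hypothetical driver class
INPUT="${INPUT:-hdfs://nn1.example.com:8020/input1}"
OUTPUT="${OUTPUT:-hdfs://nn1.example.com:8020/output1}"
REMOTE="${REMOTE:-hdfs://nn2.example.com:8020/output1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run the job on cluster 1; copy its output to cluster 2 only on success.
run_flow() {
  if run hadoop jar "$JOB_JAR" "$JOB_CLASS" "$INPUT" "$OUTPUT"; then
    run hadoop distcp "$OUTPUT" "$REMOTE"
  else
    echo "job failed; skipping distcp" >&2
    return 1
  fi
}

run_flow
```

Setting DRY_RUN=0 would execute the two commands for real. Note that this copies the final job output after the job ends, not the intermediate map output while the job is still running.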
Re: Copy data between clusters during the job execution.
Posted by xeonmailinglist <xe...@gmail.com>.
But can I use distcp inside my job, or do I need to program something that
executes distcp after executing my job?
On 02-02-2015 10:20, Daniel Haviv wrote:
> You can use distcp
>
> Daniel
>
> On Feb 2, 2015, at 11:12,
Re: Copy data between clusters during the job execution.
Posted by Daniel Haviv <da...@gmail.com>.
You can use distcp
Daniel
> On Feb 2, 2015, at 11:12, xeon Mailinglist <xe...@gmail.com> wrote:
>
> Hi
>
> I want to have a job that copies the map output or the reduce output to another HDFS. Is it possible?
>
> E.g., the job runs in cluster 1 and takes its input from this cluster. Then, before the job finishes, it copies the map output or the reduce output to the HDFS in cluster 2.
>
> Thanks,