Posted to user@hadoop.apache.org by "S. Zhou" <my...@yahoo.com> on 2013/10/25 00:28:27 UTC

Mapreduce outputs to a different cluster?

The scenario is: I run a MapReduce job on cluster A (all source data is in cluster A), but I want the job's output to go to cluster B. Is this possible? If so, please let me know how to do it.

Here are some notes about my MapReduce job:
1. The data source is an HBase table.
2. It has only a mapper and no reducer.

Thanks
Senqiang

Re: Mapreduce outputs to a different cluster?

Posted by Shahab Yunus <sh...@gmail.com>.
As far as I know, you can use distcp to transfer the results of the job
from one cluster to another once the job is done. You can write a simple
script to do that; it is simple and tested. Some pointers below:
http://doc.mapr.com/display/MapR/hadoop+distcp
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-3/parallel-copying-with-distcp
http://hadoop.apache.org/docs/r1.2.1/distcp.html
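A minimal Java sketch of the "simple script" approach, assuming the `hadoop` launcher is on the PATH of the machine that submits the job. It only wraps the basic `hadoop distcp <source> <destination>` form with both sides fully qualified. The class and method names, the NameNode hosts (`namenode-a.example`, `namenode-b.example`) and the source directory are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.List;

public class DistcpAfterJob {

    // Builds the argument list for "hadoop distcp <source> <destination>".
    // Both arguments should be fully qualified hdfs:// URIs so each side
    // unambiguously names its cluster.
    static List<String> distcpCommand(String sourceUri, String destUri) {
        return List.of("hadoop", "distcp", sourceUri, destUri);
    }

    // Launches distcp and waits for it; returns the process exit code
    // (0 on success). Assumes the "hadoop" launcher is on the PATH.
    static int runDistcp(String sourceUri, String destUri)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(distcpCommand(sourceUri, destUri));
        pb.inheritIO(); // show distcp's own progress output on this console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder addresses: job output on cluster A, target on cluster B.
        String source = "hdfs://namenode-a.example:8020/user/senqiang/job-output";
        String dest = "hdfs://namenode-b.example:8020/tmp/myfolder";
        System.out.println(String.join(" ", distcpCommand(source, dest)));
        // Only after the MapReduce job has succeeded:
        // int exit = runDistcp(source, dest);
    }
}
```

The design choice here is to keep the job writing to its own cluster and move the data afterwards; the cost is an extra full copy of the output and a delay before it shows up on cluster B.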

You might be able to do this within the job as well, by changing the
output paths of the generated files, but I wouldn't suggest that: there can
be latency and performance issues.

Maybe others have a better idea.

Regards,
Shahab


On Thu, Oct 24, 2013 at 6:28 PM, S. Zhou <my...@yahoo.com> wrote:

> The scenario is: I run mapreduce job on cluster A (all source data is in
> cluster A) but I want the output of the job to cluster B. Is it possible?
> If yes, please let me know how to do it.
>
> Here are some notes of my mapreduce job:
> 1. the data source is an HBase table
> 2. It only has mapper no reducer.
>
> Thanks
> Senqiang
>
>

Re: Mapreduce outputs to a different cluster?

Posted by Shahab Yunus <sh...@gmail.com>.
You can specify the HDFS path as follows:
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
where the Path object is the location of your output directory.

See this for details:
http://www.rohitmenon.com/index.php/introducing-mapreduce-part-i/


Regards,
Shahab


On Thu, Oct 24, 2013 at 11:25 PM, S. Zhou <my...@yahoo.com> wrote:

> Thanks Shahab & Yong. If cluster B (in which I want to dump output) has
> url "hdfs://machine.domain:8080" and data folder "/tmp/myfolder", what
> should I specify as the output path for MR job?
> Thanks
>
>
>   On Thursday, October 24, 2013 5:31 PM, java8964 java8964 <
> java8964@hotmail.com> wrote:
>  Just specify the output location using the URI to another cluster. As
> long as the network is accessible, you should be fine.
>
> Yong
>
> ------------------------------
> Date: Thu, 24 Oct 2013 15:28:27 -0700
> From: myxjtu@yahoo.com
> Subject: Mapreduce outputs to a different cluster?
> To: user@hadoop.apache.org
>
> The scenario is: I run mapreduce job on cluster A (all source data is in
> cluster A) but I want the output of the job to cluster B. Is it possible?
> If yes, please let me know how to do it.
>
> Here are some notes of my mapreduce job:
> 1. the data source is an HBase table
> 2. It only has mapper no reducer.
>
> Thanks
> Senqiang
>
>
>
>

RE: Mapreduce outputs to a different cluster?

Posted by java8964 java8964 <ja...@hotmail.com>.
Just use "hdfs://machine.domain:8080/tmp/myfolder"
Yong
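Why the full URI matters: a path with no scheme, such as `/tmp/myfolder`, is qualified against the job's default filesystem (`fs.default.name` in Hadoop 1.x, `fs.defaultFS` in 2.x), i.e. cluster A. The Java sketch below mimics that rule using only `java.net.URI`; Hadoop's real logic lives in its `Path`/`FileSystem.makeQualified` code and also resolves relative paths against the user's working directory, which this sketch deliberately rejects. The class name and the cluster A address are hypothetical, and `machine.domain:8080` is assumed to be cluster B's NameNode RPC address (the filesystem URI cluster B itself uses), not a web UI port.

```java
import java.net.URI;

public class OutputPathQualifier {

    // Simplified model of how an output path is qualified:
    //  - a path that already has a scheme (hdfs://host:port/...) is kept as-is,
    //    so it can point at any cluster;
    //  - an absolute path with no scheme is resolved against the default
    //    filesystem, i.e. the cluster whose configuration the job uses.
    // Relative paths (resolved against the user's working directory in
    // Hadoop) are deliberately not modeled here.
    static URI qualify(URI defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p;
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("relative path not modeled: " + path);
        }
        return defaultFs.resolve(p);
    }

    public static void main(String[] args) {
        // Placeholder default filesystem of cluster A, where the job runs.
        URI clusterA = URI.create("hdfs://namenode-a.example:8020");

        // Bare path: lands on cluster A.
        System.out.println(qualify(clusterA, "/tmp/myfolder"));
        // Fully qualified path: lands on cluster B.
        System.out.println(qualify(clusterA, "hdfs://machine.domain:8080/tmp/myfolder"));
    }
}
```

Running `main` prints `hdfs://namenode-a.example:8020/tmp/myfolder` for the bare path and `hdfs://machine.domain:8080/tmp/myfolder` for the qualified one. With the `FileOutputFormat.setOutputPath(conf, new Path(args[1]))` call from this thread, that means passing the whole URI as `args[1]`.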

Date: Thu, 24 Oct 2013 20:25:35 -0700
From: myxjtu@yahoo.com
Subject: Re: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org

Thanks Shahab & Yong. If cluster B (in which I want to dump output) has url "hdfs://machine.domain:8080" and data folder "/tmp/myfolder", what should I specify as the output path for MR job? 
Thanks

Re: Mapreduce outputs to a different cluster?

Posted by "S. Zhou" <my...@yahoo.com>.
Thanks Shahab & Yong. If cluster B (where I want to dump the output) has the URL "hdfs://machine.domain:8080" and the data folder "/tmp/myfolder", what should I specify as the output path for the MR job?

Thanks




On Thursday, October 24, 2013 5:31 PM, java8964 java8964 <ja...@hotmail.com> wrote:
 
Just specify the output location using the URI to another cluster. As long as the network is accessible, you should be fine.

Yong



________________________________
Date: Thu, 24 Oct 2013 15:28:27 -0700
From: myxjtu@yahoo.com
Subject: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org


The scenario is: I run mapreduce job on cluster A (all source data is in cluster A) but I want the output of the job to cluster B. Is it possible? If yes, please let me know how to do it.

Here are some notes of my mapreduce job:
1. the data source is an HBase table
2. It only has mapper no reducer.

Thanks
Senqiang

RE: Mapreduce outputs to a different cluster?

Posted by java8964 java8964 <ja...@hotmail.com>.
Just specify the output location using the full URI of the other cluster. As long as cluster B is reachable over the network from cluster A's nodes, you should be fine.
Yong
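Since the job's tasks write their output through the HDFS client, every task node in cluster A has to reach cluster B's NameNode and also its DataNodes (HDFS clients stream blocks to DataNodes directly). A quick pre-flight check is a plain TCP connect to the NameNode port from one of cluster A's worker nodes. The Java sketch below does only that; the class name is hypothetical, the URI is the placeholder from this thread, and a small test write to cluster B remains the more convincing check.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

public class ClusterBReachability {

    // True if a TCP connection to host:port opens within timeoutMs.
    // This only shows the port accepts connections; it says nothing about
    // HDFS permissions or whether a NameNode is really listening there.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) { // refused, timed out, unknown host, ...
            return false;
        }
    }

    // Same check, taking the filesystem URI used as the job's output path.
    static boolean canConnect(URI fsUri, int timeoutMs) {
        if (fsUri.getHost() == null || fsUri.getPort() == -1) {
            throw new IllegalArgumentException("need an explicit host and port: " + fsUri);
        }
        return canConnect(fsUri.getHost(), fsUri.getPort(), timeoutMs);
    }

    public static void main(String[] args) {
        // Run this on a worker node of cluster A, not only on the client machine.
        URI output = URI.create("hdfs://machine.domain:8080/tmp/myfolder");
        System.out.println("cluster B NameNode port reachable: " + canConnect(output, 3000));
    }
}
```

The URI overload insists on an explicit port rather than guessing a default, so a typo in the output path fails loudly instead of probing the wrong port.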

Date: Thu, 24 Oct 2013 15:28:27 -0700
From: myxjtu@yahoo.com
Subject: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org

The scenario is: I run mapreduce job on cluster A (all source data is in cluster A) but I want the output of the job to cluster B. Is it possible? If yes, please let me know how to do it.
Here are some notes of my mapreduce job:
1. the data source is an HBase table
2. It only has mapper no reducer.
Thanks
Senqiang
