Posted to user@hadoop.apache.org by xeonmailinglist <xe...@gmail.com> on 2015/10/09 11:46:49 UTC

Job that just runs the reduce tasks

Hi,

If we run a job without reduce tasks, the map output is saved to HDFS. Now I would like to launch another job that reads that
map output and computes the reduce phase. Is it possible to execute a job
that reads the map output from HDFS and runs only the reduce phase?

Thanks,


RE: Job that just runs the reduce tasks

Posted by Daniel Schulz <da...@hotmail.com>.
Hi,
Yes: this is possible. Just configure the 1st MR job's output path as the 2nd one's input. Identity mappers will run -- rather than no mappers at all -- but they ship with Hadoop; they are just a technical necessity.
To avoid this overhead, Tez, Spark, Flink and other execution engines were built, letting you express a DAG and run your algorithms on it.
Kind regards, Daniel.
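
A minimal driver sketch for the 2nd, reduce-only job, using the new-API base Mapper class (which is the identity mapper). The paths /stage1/out and /stage2/out, the <Text, IntWritable> types, and the SumReducer are illustrative assumptions, not from the original thread; it presumes the 1st job wrote its map output with SequenceFileOutputFormat so key/value types round-trip:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceOnlyDriver {

  // Example reducer (hypothetical): sums the values for each key.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "reduce-only stage");
    job.setJarByClass(ReduceOnlyDriver.class);

    // The base Mapper class passes records through unchanged (identity map).
    job.setMapperClass(Mapper.class);
    job.setReducerClass(SumReducer.class);

    // Read the 1st job's output; assumes it was written as SequenceFiles.
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path("/stage1/out"));  // 1st job's output
    FileOutputFormat.setOutputPath(job, new Path("/stage2/out"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

For this to work cleanly, the 1st (map-only) job should set job.setNumReduceTasks(0) and use SequenceFileOutputFormat, so the intermediate keys and values are stored with their Writable types intact.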

> To: user@hadoop.apache.org
> From: xeonmailinglist@gmail.com
> Subject: Job that just runs the reduce tasks
> Date: Fri, 9 Oct 2015 10:46:49 +0100
> 
> Hi,
> 
> If we run a job without reduce tasks, the map output is going to be 
> saved into HDFS. Now, I would like to launch another job that reads the 
> map output and compute the reduce phase. Is it possible to execute a job 
> that reads the map output from HDFS and just runs the reduce phase?
> 
> Thanks,
> 