Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/06/02 22:15:09 UTC

Chaining Map/Reduce tasks

Hi

Is there a way to chain Map/Reduce tasks? What I mean is I want the output
of a MapReduce task to serve as input to another MapReduce task. Could someone
please show me how I can achieve this?

Thanks
Avinas

Re: Chaining Map/Reduce tasks

Posted by Dennis Kubes <ku...@apache.org>.
Chaining the output of one task as the input for another is in fact the
preferred way to do large jobs.  In your first job you would set the
output format like this:

JobConf job1 = new NutchJob(getConf());
job1.setOutputPath(new Path("your path"));
job1.setOutputFormat(SequenceFileOutputFormat.class);

In this example we are using SequenceFileOutputFormat, but it really 
could be any type of format; Sequence and Map file formats are the most 
common.  The second job then simply uses the output of the first as its 
input:

JobConf job2 = new NutchJob(getConf());
job2.addInputPath(new Path("your path"));
job2.setInputFormat(SequenceFileInputFormat.class);
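
The chaining idea itself (run a pass, then feed its output to the next
pass) can be illustrated in plain Java, independent of Hadoop.  This is
just a hypothetical in-memory two-pass word count, not the Hadoop API;
the class and method names are made up for the example:

import java.util.*;

public class ChainDemo {
    // Pass 1: "map" each line to (word, 1), then "reduce" to word counts.
    static Map<String, Integer> pass1(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                counts.merge(word, 1, Integer::sum);
        return counts;
    }

    // Pass 2: consumes pass 1's output as its input,
    // keeping only words that occurred at least twice.
    static Map<String, Integer> pass2(Map<String, Integer> counts) {
        Map<String, Integer> frequent = new TreeMap<>();
        counts.forEach((w, c) -> { if (c >= 2) frequent.put(w, c); });
        return frequent;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a b a", "b c");
        Map<String, Integer> out1 = pass1(input);   // {a=2, b=2, c=1}
        Map<String, Integer> out2 = pass2(out1);    // {a=2, b=2}
        System.out.println(out2);
    }
}

In Hadoop the hand-off happens through the filesystem instead: job1's
output path becomes job2's input path, as in the snippets above, and you
run the jobs one after the other (e.g. with JobClient.runJob in the old
API).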

Hope this helps.

Dennis Kubes

Phantom wrote:
> Hi
> 
> Is there a way to chain Map/Reduce tasks? What I mean is I want the output
> of a MapReduce task to serve as input to another MapReduce task. Could
> someone please show me how I can achieve this?
> 
> Thanks
> Avinas
> 

Re: Chaining Map/Reduce tasks

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Jun 2, 2007, at 1:15 PM, Phantom wrote:

> Is there a way to chain Map/Reduce tasks ? What I mean is I want the
> output a MapReduce task to serve as input to another MapReduce task ?
> Could someone please show me how I can acheive this ?

There is an example of this in the Grep example, and it is the most
common idiom.  The only serious problem with it is that the text input
and output formats are not symmetric, so you pretty much want to use
the sequence file input format.
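
To unpack "not symmetric": TextOutputFormat writes each record as a
key<TAB>value line, but TextInputFormat hands the next job the byte
offset as the key and the whole line as the value, so the original
key/value split is lost and must be re-parsed.  A plain-Java sketch of
that mismatch (these are stand-in methods, not the Hadoop classes
themselves):

import java.util.*;

public class TextFormatAsymmetry {
    // Mimics TextOutputFormat: one "key<TAB>value" line per record.
    static String writeRecord(String key, String value) {
        return key + "\t" + value;
    }

    // Mimics TextInputFormat: the key the next mapper sees is the byte
    // offset of the line, and the value is the entire line.
    static Map<Long, String> readLines(List<String> lines) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : lines) {
            records.put(offset, line);
            offset += line.length() + 1; // +1 for the newline
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> written =
            List.of(writeRecord("apple", "3"), writeRecord("pear", "1"));
        // The second job receives offsets as keys and raw lines as
        // values, so it has to re-split every line on the tab itself.
        System.out.println(readLines(written));
    }
}

Sequence files sidestep this because they store the key and value
objects directly, so what one job writes is exactly what the next reads.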

-- Owen