Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/06/02 22:15:09 UTC
Chaining Map/Reduce tasks
Hi
Is there a way to chain Map/Reduce tasks? What I mean is I want the output
of a MapReduce task to serve as input to another MapReduce task. Could someone
please show me how I can achieve this?
Thanks
Avinas
Re: Chaining Map/Reduce tasks
Posted by Dennis Kubes <ku...@apache.org>.
Chaining the output of one job as the input to another is in fact the
preferred way to do large jobs. On your first job you would set your
output format like this:
JobConf job1 = new NutchJob(getConf());
job1.setOutputPath(new Path("your path"));
job1.setOutputFormat(SequenceFileOutputFormat.class);
In this example we are using the SequenceFileOutputFormat but it really
could be any type of format. Sequence and Map formats are most common.
Then the second job simply uses the input from the first.
JobConf job2 = new NutchJob(getConf());
job2.addInputPath(new Path("your path"));
job2.setInputFormat(SequenceFileInputFormat.class);
Hope this helps.
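Putting the two fragments above together, a complete chain under the old
(0.x-era) Hadoop API would look roughly like the sketch below. Note that
JobClient.runJob() blocks until the job finishes, so the second job only
starts once the first job's output is in place. The class name, mapper and
reducer slots, and all paths here are placeholders, not from the thread:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    // Hypothetical intermediate path shared by both jobs.
    Path intermediate = new Path("/tmp/job1-output");

    // First job: write output as a sequence file so job2 can read it back.
    JobConf job1 = new JobConf(ChainedJobs.class);
    job1.setJobName("job1");
    job1.addInputPath(new Path("/input"));            // hypothetical input
    job1.setOutputPath(intermediate);
    job1.setOutputFormat(SequenceFileOutputFormat.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(LongWritable.class);
    // job1.setMapperClass(...); job1.setReducerClass(...);
    JobClient.runJob(job1);                           // blocks until done

    // Second job: consume the first job's output directly.
    JobConf job2 = new JobConf(ChainedJobs.class);
    job2.setJobName("job2");
    job2.addInputPath(intermediate);
    job2.setInputFormat(SequenceFileInputFormat.class);
    job2.setOutputPath(new Path("/final-output"));    // hypothetical output
    // job2.setMapperClass(...); job2.setReducerClass(...);
    JobClient.runJob(job2);
  }
}
```

Because the key and value classes of job1's output must match what job2's
mapper expects, setting them explicitly on job1 (as above) avoids type
mismatches when the sequence file is read back.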
Dennis Kubes
Phantom wrote:
> Hi
>
> Is there a way to chain Map/Reduce tasks? What I mean is I want the output
> of a MapReduce task to serve as input to another MapReduce task. Could
> someone
> please show me how I can achieve this?
>
> Thanks
> Avinas
>
Re: Chaining Map/Reduce tasks
Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Jun 2, 2007, at 1:15 PM, Phantom wrote:
> Is there a way to chain Map/Reduce tasks? What I mean is I want
> the output
> of a MapReduce task to serve as input to another MapReduce task?
> Could someone
> please show me how I can achieve this?
There is an example of this in the Grep example, and it is the most
common idiom. The only serious problem with it is that the text input
and output formats are not symmetric, so you pretty much want to use the
sequence file input format.
-- Owen