Posted to mapreduce-user@hadoop.apache.org by Kevin Burton <bu...@spinn3r.com> on 2011/09/27 21:09:47 UTC
output from one map reduce job as the input to another map reduce job?
Is it possible to connect the output of one map reduce job so that it is the input to another map reduce job?
Basically… when reduce() outputs a key, it would be passed to another map() function without having to store intermediate data to the filesystem.
Kevin
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
Skype-in: *(415) 871-0687*
Re: output from one map reduce job as the input to another map reduce job?
Posted by Arun C Murthy <ac...@hortonworks.com>.
On Sep 27, 2011, at 12:09 PM, Kevin Burton wrote:
> Is it possible to connect the output of one map reduce job so that it is the input to another map reduce job?
>
> Basically… when reduce() outputs a key, it would be passed to another map() function without having to store intermediate data to the filesystem.
>
Currently there is no way to pipeline in such a manner - with hadoop-0.23 it's doable, but will take more effort.
Arun
Re: output from one map reduce job as the input to another map reduce job?
Posted by Niels Basjes <Ni...@basjes.nl>.
To me it sounds like the asker should check out tools like Storm and S4
instead of Hadoop.
http://www.infoq.com/news/2011/09/twitter-storm-real-time-hadoop
--
Kind regards,
Niels Basjes
On 27 Sep 2011 at 22:38, "Mike Spreitzer" <ms...@us.ibm.com> wrote the
following:
> It looks to me like Oozie will not do what was asked. In
>
http://yahoo.github.com/oozie/releases/3.0.0/WorkflowFunctionalSpec.html#a0_Definitions
> I see:
>
> 3.2.2 Map-Reduce Action
> ...
> The workflow job will wait until the Hadoop map/reduce job completes
> before continuing to the next action in the workflow execution path.
>
> That implies to me that the output of one job is held in some intermediate
> storage (likely HDFS) for a while before being read by the consuming
> job(s).
>
> Regards,
> Mike Spreitzer
Re: output from one map reduce job as the input to another map reduce job?
Posted by Mike Spreitzer <ms...@us.ibm.com>.
It looks to me like Oozie will not do what was asked. In
http://yahoo.github.com/oozie/releases/3.0.0/WorkflowFunctionalSpec.html#a0_Definitions
I see:
3.2.2 Map-Reduce Action
...
The workflow job will wait until the Hadoop map/reduce job completes
before continuing to the next action in the workflow execution path.
That implies to me that the output of one job is held in some intermediate
storage (likely HDFS) for a while before being read by the consuming
job(s).
Regards,
Mike Spreitzer
Re: output from one map reduce job as the input to another map reduce job?
Posted by Marcos Luis Ortiz Valmaseda <ma...@googlemail.com>.
Have you considered Oozie for this? It's a workflow engine developed by the
Yahoo! engineers.
Yahoo/oozie at GitHub
https://github.com/yahoo/oozie
Oozie at InfoQ
http://www.infoq.com/articles/introductionOozie
Oozie's examples:
http://www.infoq.com/articles/oozieexample
http://yahoo.github.com/oozie/releases/2.3.0/DG_Examples.html
Oozie at Cloudera
https://ccp.cloudera.com/display/CDHDOC/Oozie+Installation
Regards
2011/9/27 Arko Provo Mukherjee <ar...@gmail.com>
> Hi,
>
> I am not sure how you can avoid the filesystem; however, I did it as
> follows:
>
> // For Job 1
> FileInputFormat.addInputPath(job1, new Path(args[0]));
> FileOutputFormat.setOutputPath(job1, new Path(args[1]));
>
> // For job 2
> FileInputFormat.addInputPath(job2, new Path(args[1]));
> FileOutputFormat.setOutputPath(job2, new Path(args[2]));
>
> Assuming
> args[0] --> Input to first mapper
> args[1] --> Output of first reducer / Input to second mapper
> args[2] --> Output of second reducer
>
> Hope this helps!
> Warm regards
> Arko
>
> On Tue, Sep 27, 2011 at 2:09 PM, Kevin Burton <bu...@spinn3r.com> wrote:
> > Is it possible to connect the output of one map reduce job so that it is
> > the input to another map reduce job?
> > Basically… when reduce() outputs a key, it would be passed to another
> > map() function without having to store intermediate data to the filesystem.
> > Kevin
> >
> > --
> >
> > Founder/CEO Spinn3r.com
> >
> > Location: San Francisco, CA
> > Skype: burtonator
> >
> > Skype-in: (415) 871-0687
> >
>
--
Marcos Luis Ortíz Valmaseda
Linux Infrastructure Engineer
Linux User # 418229
http://marcosluis2186.posterous.com
http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186
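For readers following the Oozie route: a minimal workflow chaining two map-reduce actions looks roughly like the sketch below. The element names follow the workflow spec linked earlier in the thread; the mapper/reducer properties are elided, and names like `two-job-chain`, `first-job`, and `${intermediateDir}` are placeholders, not values from any of the posts.

```xml
<workflow-app name="two-job-chain" xmlns="uri:oozie:workflow:0.2">
  <start to="first-job"/>
  <action name="first-job">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- mapper/reducer classes and input dir elided -->
        <property>
          <name>mapred.output.dir</name>
          <value>${intermediateDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <!-- the ok transition fires only after job 1 completes -->
    <ok to="second-job"/>
    <error to="fail"/>
  </action>
  <action name="second-job">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${intermediateDir}</value> <!-- job 1's output -->
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Map-reduce action failed</message></kill>
  <end name="end"/>
</workflow-app>
```

Note that, as Mike points out above, `${intermediateDir}` still lives on HDFS between the two actions: Oozie sequences the jobs but does not stream between them.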
Re: output from one map reduce job as the input to another map reduce job?
Posted by Arko Provo Mukherjee <ar...@gmail.com>.
Hi,
I am not sure how you can avoid the filesystem; however, I did it as follows:
// For Job 1
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path(args[1]));
// For job 2
FileInputFormat.addInputPath(job2, new Path(args[1]));
FileOutputFormat.setOutputPath(job2, new Path(args[2]));
Assuming
args[0] --> Input to first mapper
args[1] --> Output of first reducer / Input to second mapper
args[2] --> Output of second reducer
Hope this helps!
Warm regards
Arko
On Tue, Sep 27, 2011 at 2:09 PM, Kevin Burton <bu...@spinn3r.com> wrote:
> Is it possible to connect the output of one map reduce job so that it is the
> input to another map reduce job?
> Basically… when reduce() outputs a key, it would be passed to another map()
> function without having to store intermediate data to the filesystem.
> Kevin
>
> --
>
> Founder/CEO Spinn3r.com
>
> Location: San Francisco, CA
> Skype: burtonator
>
> Skype-in: (415) 871-0687
>
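Arko's pattern above is the standard way to chain jobs: job 2 reads the directory job 1 wrote, and the driver runs the jobs in order (e.g. waiting for job 1 to complete before submitting job 2). As a cluster-free illustration of that dataflow, here is a minimal in-memory sketch in plain Java; `ChainedPasses`, `wordCount`, and `groupByCount` are hypothetical names, not Hadoop API, and the point is only that pass 1's reduced output is materialized before becoming pass 2's map input.

```java
import java.util.*;

// In-memory model of two chained map/reduce passes. The Map returned by
// pass 1 plays the role of the intermediate directory (args[1]) on HDFS.
public class ChainedPasses {

    // Pass 1: classic word count. map emits (word, 1); reduce sums.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {                      // map phase
            for (String word : line.split("\\s+")) {
                shuffled.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> reduced = new TreeMap<>();  // reduce phase
        shuffled.forEach((word, ones) ->
            reduced.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return reduced;                                  // "written to args[1]"
    }

    // Pass 2: invert the counts. map emits (count, word); reduce collects.
    static Map<Integer, List<String>> groupByCount(Map<String, Integer> counts) {
        Map<Integer, List<String>> reduced = new TreeMap<>();
        counts.forEach((word, n) ->
            reduced.computeIfAbsent(n, k -> new ArrayList<>()).add(word));
        return reduced;                                  // "written to args[2]"
    }

    public static void main(String[] args) {
        List<String> input = List.of("a b a", "b a c");
        Map<String, Integer> intermediate = wordCount(input);            // job 1
        Map<Integer, List<String>> result = groupByCount(intermediate);  // job 2
        System.out.println(intermediate); // {a=3, b=2, c=1}
        System.out.println(result);       // {1=[c], 2=[b], 3=[a]}
    }
}
```

In real driver code the barrier between the two passes is explicit: you wait for job 1 (for instance with the new API's Job.waitForCompletion(true)) before configuring job 2 with job 1's output path, exactly as in Arko's snippet.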