Posted to mapreduce-user@hadoop.apache.org by Ross Nordeen <rj...@mtu.edu> on 2011/07/25 20:20:12 UTC

MR 0.20.2 job chaining


Hello all,

I am trying to write an MR program in which the output of the mappers depends on previous map processes. I understand that a job scheduler exists to control such processes. Could anyone give some sample code for a working implementation of this in Hadoop 0.20.2?

--
Ross Nordeen
Computer Networking And Systems Administration
Michigan Technological University
http://www.linkedin.com/in/rjnordee


RE: MR 0.20.2 job chaining

Posted by MONTMORY Alain <al...@thalesgroup.com>.
Hello,

You can also use the Cascading API (http://www.cascading.org/), which greatly simplifies job chaining.

At Thales we tried both the native MR and the Cascading approaches, and we obtained very good results (productivity and performance) using Cascading.

regards

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Monday, 25 July 2011 23:22
To: mapreduce-user@hadoop.apache.org; Ross
Subject: Re: MR 0.20.2 job chaining

What you may be looking for is a workflow system such as Oozie
(yahoo.github.com/oozie/) or Azkaban
(http://sna-projects.com/azkaban/).

If your needs are simple (2-3 jobs, not too many conditions, etc. per
workflow), you can check out the JobControl API
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html)
that Hadoop offers, which lets you add dependent jobs and create simple
dependency chains.
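To illustrate the idea behind a dependency chain, here is a toy, in-process model of what JobControl does: a job is only started once every job it depends on has finished, so a downstream job can safely read the directory an upstream job wrote. This is a sketch under stated assumptions, not Hadoop code: ToyJob, addDependency, runAll and the map standing in for HDFS directories are all hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy, in-process model of a JobControl-style dependency chain.
// ToyJob, addDependency and runAll are hypothetical names, NOT Hadoop's API.
final class ToyJobChain {

    /** A unit of work that may only start after all of its dependencies have finished. */
    static final class ToyJob {
        final String name;
        final Runnable work;
        final List<ToyJob> dependsOn = new ArrayList<>();

        ToyJob(String name, Runnable work) {
            this.name = name;
            this.work = work;
        }

        void addDependency(ToyJob other) {
            dependsOn.add(other);
        }
    }

    /**
     * Repeatedly picks a pending job whose dependencies have all finished and runs it.
     * Unlike Hadoop's JobControl there is no cluster polling or failure tracking:
     * an exception thrown by a job simply aborts the whole chain.
     */
    static List<String> runAll(List<ToyJob> jobs) {
        Set<ToyJob> finished = new HashSet<>();
        List<ToyJob> pending = new ArrayList<>(jobs);
        List<String> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            ToyJob ready = null;
            for (ToyJob job : pending) {
                if (finished.containsAll(job.dependsOn)) {
                    ready = job;
                    break;
                }
            }
            if (ready == null) {
                throw new IllegalStateException("dependency cycle or job missing from the list");
            }
            ready.work.run();
            finished.add(ready);
            pending.remove(ready);
            order.add(ready.name);
        }
        return order;
    }

    /** Two-job chain: the second job reads the "directory" the first one wrote. */
    static List<String> demo(Map<String, List<String>> fs) {
        fs.put("input", List.of("a b", "b c"));
        ToyJob split = new ToyJob("split", () -> {
            List<String> out = new ArrayList<>();
            for (String line : fs.get("input")) out.addAll(Arrays.asList(line.split(" ")));
            fs.put("split-out", out);
        });
        ToyJob upper = new ToyJob("upper", () -> {
            List<String> out = new ArrayList<>();
            for (String word : fs.get("split-out")) out.add(word.toUpperCase(Locale.ROOT));
            fs.put("upper-out", out);
        });
        upper.addDependency(split);
        // Submitted in "wrong" order on purpose: dependencies, not list order, decide.
        return runAll(List.of(upper, split));
    }
}
```

In the real 0.20.2 API, as I understand the linked javadoc, the corresponding steps are to wrap each JobConf in an org.apache.hadoop.mapred.jobcontrol.Job, call addDependingJob(...) on the downstream job, add all jobs to a JobControl, run that JobControl in its own thread and poll allFinished() before calling stop(). Please check the javadoc for exact signatures before relying on this outline.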

P.S. Note that phases such as M-M-M-M can usually be collapsed into a
single M. If you want modularity in code to represent phases, check out
ChainMapper
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/ChainMapper.html).
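Why M-M-M-M can be a single M: ChainMapper runs several mappers one after another inside one map task, piping each record from one mapper to the next, and only the last mapper's output becomes the task's output. The toy model below mimics that record-by-record piping in plain Java. It is an illustrative sketch only; ToyChainMapper and its addMapper/map methods are hypothetical and are not the Hadoop class (the real ChainMapper.addMapper(...) also takes the JobConf, the key/value classes of each step and a per-step configuration).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Toy model of chaining several map steps inside ONE map task, in the spirit
// of Hadoop's ChainMapper. The names here are hypothetical, not Hadoop's API.
final class ToyChainMapper {

    // Each step turns one input record into zero or more output records.
    private final List<Function<String, List<String>>> steps = new ArrayList<>();

    ToyChainMapper addMapper(Function<String, List<String>> step) {
        steps.add(step);
        return this;
    }

    /** Pipes every input record through the whole chain before reading the next one. */
    List<String> map(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String record : input) emit(record, 0, out);
        return out;
    }

    // Passes a record to step `stepIndex`; records leaving the last step are collected.
    private void emit(String record, int stepIndex, List<String> out) {
        if (stepIndex == steps.size()) {
            out.add(record);
            return;
        }
        for (String next : steps.get(stepIndex).apply(record)) emit(next, stepIndex + 1, out);
    }

    static List<String> demo() {
        ToyChainMapper chain = new ToyChainMapper()
            // M1: split a line into words
            .addMapper(line -> Arrays.asList(line.split(" ")))
            // M2: drop empty tokens and normalise case
            .addMapper(word -> word.isEmpty()
                ? List.<String>of()
                : List.of(word.toLowerCase(Locale.ROOT)))
            // M3: emit word<TAB>1 pairs, as a word-count mapper would
            .addMapper(word -> List.of(word + "\t1"));
        return chain.map(List.of("Hello hadoop", "hello  chain"));
    }
}
```

The design point is that no intermediate output is materialised between steps, which is why chaining mappers this way is cheaper than running each M as its own job.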




-- 
Harsh J
