Posted to mapreduce-user@hadoop.apache.org by Ross Nordeen <rj...@mtu.edu> on 2011/07/25 20:20:12 UTC
MR 0.20.2 job chaining
Hello all,
I am trying to write an MR program in which the output of the mappers depends on the previous map processes. I understand that a job scheduler exists to control such processes. Would anyone be able to share sample code of a working implementation of this in Hadoop 0.20.2?
--
Ross Nordeen
Computer Networking And Systems Administration
Michigan Technological University
http://www.linkedin.com/in/rjnordee
RE: MR 0.20.2 job chaining
Posted by MONTMORY Alain <al...@thalesgroup.com>.
Hello,
You can also use the Cascading API (http://www.cascading.org/), which greatly simplifies job chaining.
At Thales we have tried both the native MR and the Cascading approaches, and we obtained very good results (productivity and performance) using Cascading.
Regards
-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Monday, July 25, 2011 23:22
To: mapreduce-user@hadoop.apache.org; Ross
Subject: Re: MR 0.20.2 job chaining
What you may be looking for is a workflow system such as Oozie
(yahoo.github.com/oozie/) or Azkaban
(http://sna-projects.com/azkaban/).
If your needs are simple (2-3 jobs per workflow, not too many
conditions, etc.), you can check out the JobControl API
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html)
that Hadoop offers; it lets you add dependent jobs and create
uncomplicated dependency chains.
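To make the idea of a dependency chain concrete, here is a minimal sketch in plain Java. It is not the Hadoop JobControl API: the class DependentJobsSketch and its methods addJob and runAll are hypothetical names invented for illustration, and each "job" is just a Runnable where a real setup would submit a MapReduce job. It only shows the ordering rule, namely that a job starts once every job it depends on has finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a job controller; NOT the Hadoop JobControl API. */
public class DependentJobsSketch {
    // Job name -> names of the jobs that must finish first.
    private final Map<String, List<String>> deps = new LinkedHashMap<>();
    // Job name -> work to run (a real setup would submit a MapReduce job here).
    private final Map<String, Runnable> work = new LinkedHashMap<>();

    public void addJob(String name, Runnable body, String... dependsOn) {
        deps.put(name, Arrays.asList(dependsOn));
        work.put(name, body);
    }

    /** Runs each job after all of its dependencies; returns the execution order. */
    public List<String> runAll() {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        while (done.size() < deps.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                String name = e.getKey();
                // A job is ready once every job it depends on has completed.
                if (!done.contains(name) && done.containsAll(e.getValue())) {
                    work.get(name).run();
                    done.add(name);
                    order.add(name);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic or missing dependency");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        DependentJobsSketch control = new DependentJobsSketch();
        // Registered out of order on purpose: "second" needs "first" to finish.
        control.addJob("second", () -> System.out.println("second pass"), "first");
        control.addJob("first", () -> System.out.println("first pass"));
        System.out.println(control.runAll());
    }
}
```

With the real JobControl classes the shape should be similar (declare the jobs, declare which job depends on which, then let the controller run them), but check the linked Javadoc for the actual class and method names.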
P.S. Note that a sequence of map phases such as M-M-M-M can usually be
collapsed into a single M. If you want modularity in code to represent
the phases, check out ChainMapper
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/ChainMapper.html).
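The point that M-M-M-M can simply be M can be sketched in plain Java as function composition. This is only an illustration of the idea, not the ChainMapper API: the names ChainedMapSketch, chain and mapAll are hypothetical, and records are simplified to single values rather than the key/value pairs a real mapper handles. Each step stays a separate, modular function, yet every record flows through all of them in one pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration in plain Java; NOT the Hadoop ChainMapper API. */
public class ChainedMapSketch {
    /** Composes per-record steps so that one pass applies all of them in order. */
    @SafeVarargs
    public static <T> Function<T, T> chain(Function<T, T>... steps) {
        Function<T, T> combined = Function.identity();
        for (Function<T, T> step : steps) {
            combined = combined.andThen(step);
        }
        return combined;
    }

    /** A single "map phase": applies the given function to every record. */
    public static <T> List<T> mapAll(List<T> records, Function<T, T> mapper) {
        List<T> out = new ArrayList<>();
        for (T record : records) {
            out.add(mapper.apply(record));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three modular steps, run as one map pass instead of three.
        Function<String, String> m = ChainedMapSketch.<String>chain(
                s -> s.trim(),
                s -> s.toLowerCase(),
                s -> s + "!");
        System.out.println(mapAll(Arrays.asList("  Foo ", "BAR"), m));
    }
}
```

The design choice is the same one the P.S. describes: keep the phases as separate units of code for readability, but avoid paying for a separate pass over the data for each one.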
--
Harsh J