Posted to common-user@hadoop.apache.org by bharath v <bh...@gmail.com> on 2009/10/02 12:29:32 UTC
Cascading jobs in hadoop
Hi all,
I have a set of MapReduce jobs that need to be cascaded, i.e., the output of
MR job1 is the input of MR job2, and so on.
Can anyone point me to the corresponding classes in the Hadoop 0.20.0 API?
I have seen the "x.addDependingJob(y)" function in Yahoo's Hadoop tutorial,
but that is for the older versions.
What is the equivalent in the 0.20.0 API?
Any help is appreciated.
Thanks
bharath.v
ug3
IIIT Hyderabad!
Re: Cascading jobs in hadoop
Posted by Kevin Weil <ke...@gmail.com>.
Bharath,
The mapred package is largely deprecated, as Hadoop is moving towards the
mapreduce package. Use mapreduce for any new jobs you write, because mapred
will go away in some future release. For now, both are there to give
developers time to rewrite existing jobs.
Kevin
On Sat, Oct 3, 2009 at 10:29 AM, bharath vissapragada <
bharat_v@students.iiit.ac.in> wrote:
> Tom and Chris,
>
> Thanks for your replies. I have seen the o.a.h.mapred.jobcontrol.Job
> and o.a.h.mapreduce.Job classes. Only one of them has the above option of
> adding a dependent job. Can anyone tell me the difference between the
> "mapred" and "mapreduce" packages?
>
> Thanks in advance
Re: Cascading jobs in hadoop
Posted by bharath vissapragada <bh...@students.iiit.ac.in>.
Tom and Chris,
Thanks for your replies. I have seen the o.a.h.mapred.jobcontrol.Job
and o.a.h.mapreduce.Job classes. Only one of them has the above option of
adding a dependent job. Can anyone tell me the difference between the
"mapred" and "mapreduce" packages?
Thanks in advance
Re: Cascading jobs in hadoop
Posted by Chris K Wensel <ch...@wensel.net>.
You might find the Cascading project quite useful in this regard.
http://www.cascading.org/
Using the MapReduceFlow and CascadeConnector classes, you can chain
arbitrary MR jobs together. Cascading will determine the dependencies,
if any, and run the jobs in topological order (independent jobs will
be submitted to run in parallel).
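
To make "topological order with independent jobs in parallel" concrete, here is a minimal sketch in plain Java. It does not use the Cascading or Hadoop APIs. The class name JobWaves, the method plan, and the job names are all made up for illustration. It only groups job names into "waves", where every job in a wave depends solely on jobs from earlier waves, so the jobs within one wave could be submitted together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hadoop or Cascading.
class JobWaves {

    // dependsOn maps each job name to the names of the jobs whose output it reads.
    // Every job must appear as a key, even if its dependency list is empty.
    static List<List<String>> plan(Map<String, List<String>> dependsOn) {
        Set<String> done = new LinkedHashSet<>();
        List<List<String>> waves = new ArrayList<>();
        while (done.size() < dependsOn.size()) {
            // A job is ready once all of its dependencies were placed in earlier waves.
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    ready.add(e.getKey());
                }
            }
            if (ready.isEmpty()) {
                // Nothing can make progress: a cycle, or a dependency that is not a key.
                throw new IllegalStateException("cyclic or unknown dependency");
            }
            done.addAll(ready);
            waves.add(ready);
        }
        return waves;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("job1", Collections.<String>emptyList());
        deps.put("job2", Arrays.asList("job1"));   // job2 reads job1's output
        deps.put("job3", Collections.<String>emptyList());
        deps.put("job4", Arrays.asList("job2", "job3"));
        System.out.println(plan(deps));
    }
}
```

In this example job1 and job3 have no dependencies, so they land in the same wave and could run side by side, while job4 has to wait for both job2 and job3.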
You may also find writing your own MR jobs by hand tedious and
brittle. Cascading can help you there as well.
cheers,
chris
--
Chris K Wensel
chris@concurrentinc.com
http://www.concurrentinc.com
Re: Cascading jobs in hadoop
Posted by Tom White <to...@cloudera.com>.
Have a look at the JobControl class - this allows you to set up chains
of job dependencies.
Tom
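
As a rough illustration of what "chains of job dependencies" means, here is a toy model in plain Java. It is not the Hadoop JobControl class and does not call any Hadoop API. ToyJobControl, ToyJob, the State names, and runAll are all invented for this sketch. Only the method name addDependingJob is borrowed from the "x.addDependingJob(y)" call mentioned in the thread. The sketch assumes a job runs only after every job it depends on has succeeded, and that a job is skipped once any of its dependencies has failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy stand-in for a dependency-aware job runner; NOT the Hadoop JobControl class.
class ToyJobControl {

    // State names are this sketch's own.
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    static class ToyJob {
        final String name;
        final BooleanSupplier work;   // the job body; returns true on success
        final List<ToyJob> dependingJobs = new ArrayList<>();
        State state = State.WAITING;

        ToyJob(String name, BooleanSupplier work) {
            this.name = name;
            this.work = work;
        }

        // Same idea as x.addDependingJob(y): this job runs only after y succeeds.
        void addDependingJob(ToyJob dependency) {
            dependingJobs.add(dependency);
        }
    }

    private final List<ToyJob> jobs = new ArrayList<>();

    void addJob(ToyJob job) {
        jobs.add(job);
    }

    // Scans the waiting jobs repeatedly until a full pass changes nothing.
    void runAll() {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (ToyJob job : jobs) {
                if (job.state != State.WAITING) {
                    continue;
                }
                boolean blocked = false;
                boolean dependencyFailed = false;
                for (ToyJob dep : job.dependingJobs) {
                    if (dep.state == State.WAITING) {
                        blocked = true;
                    } else if (dep.state != State.SUCCESS) {
                        dependencyFailed = true;
                    }
                }
                if (dependencyFailed) {
                    job.state = State.DEPENDENT_FAILED;   // never run this job
                    progressed = true;
                } else if (!blocked) {
                    job.state = job.work.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    progressed = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyJob job1 = new ToyJob("job1", () -> true);
        ToyJob job2 = new ToyJob("job2", () -> true);
        job2.addDependingJob(job1);   // the output of job1 is the input of job2
        ToyJobControl control = new ToyJobControl();
        control.addJob(job2);
        control.addJob(job1);
        control.runAll();
        System.out.println(job1.name + "=" + job1.state + ", " + job2.name + "=" + job2.state);
    }
}
```

The order in which jobs are added does not matter here, because the runner only starts a job once its dependencies have finished. A job whose dependencies never finish (for example, one that was never added to the controller) simply stays in the WAITING state.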