Posted to common-user@hadoop.apache.org by bharath v <bh...@gmail.com> on 2009/10/02 12:29:32 UTC

Cascading jobs in hadoop

Hi all,

I have a set of MapReduce jobs that need to be cascaded, i.e., the output of
MR job1 is the input of MR job2, and so on.

Can anyone point me to the corresponding classes in the Hadoop 0.20.0 API?

I have seen the "x.addDependingJob(y)" function in Yahoo's Hadoop tutorial,
but that is for older versions.
What is the equivalent in the 0.20.0 API?

Any help is appreciated.

Thanks
bharath.v
ug3
IIIT Hyderabad!

Re: Cascading jobs in hadoop

Posted by Kevin Weil <ke...@gmail.com>.
Bharath,
The mapred package is largely deprecated, as Hadoop is moving towards the
mapreduce package. Use mapreduce for any new jobs you write, because mapred
will go away in some future release. For now, both are there to give
developers time to rewrite existing older jobs.

Kevin

On Sat, Oct 3, 2009 at 10:29 AM, bharath vissapragada <
bharat_v@students.iiit.ac.in> wrote:

> Tom and Chris,
>
> Thanks for your replies. I have seen the o.a.h.mapred.jobcontrol.Job
> and o.a.h.mapreduce.Job classes. Only one of them has the above option of
> adding a dependent job. Can anyone tell me the difference between the
> "mapred" and "mapreduce" packages?
>
> Thanks in advance
>

Re: Cascading jobs in hadoop

Posted by bharath vissapragada <bh...@students.iiit.ac.in>.
Tom and Chris,

Thanks for your replies. I have seen the o.a.h.mapred.jobcontrol.Job
and o.a.h.mapreduce.Job classes. Only one of them has the above option of
adding a dependent job. Can anyone tell me the difference between the
"mapred" and "mapreduce" packages?

Thanks in advance

On 10/2/09, Chris K Wensel <ch...@wensel.net> wrote:
> You might find the Cascading project quite useful in this regard.
>
> http://www.cascading.org/
>
> Using the MapReduceFlow and CascadeConnector classes, you can chain
> arbitrary MR jobs together. Cascading will determine the dependencies,
> if any, and run the jobs in topological order (independent jobs will
> be submitted to run in parallel).
>
> You may also find writing your own MR jobs by hand tedious and
> brittle. Cascading can help you there as well.
>
> cheers,
> chris
>

Re: Cascading jobs in hadoop

Posted by Chris K Wensel <ch...@wensel.net>.
You might find the Cascading project quite useful in this regard.

http://www.cascading.org/

Using the MapReduceFlow and CascadeConnector classes, you can chain
arbitrary MR jobs together. Cascading will determine the dependencies,
if any, and run the jobs in topological order (independent jobs will
be submitted to run in parallel).
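
To make "topological order" concrete, here is a minimal, dependency-free
sketch in plain Java. It is not Cascading's implementation and uses no
Hadoop or Cascading classes; the class name JobOrdering, the method
waves, and the job names are all hypothetical. It groups jobs into
"waves" so that each job appears only after every job it reads from,
and jobs in the same wave are independent of one another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: orders named jobs by their declared dependencies. */
class JobOrdering {

    /**
     * @param dependsOn maps each job name to the jobs whose output it reads;
     *                  every job must appear as a key (use an empty list
     *                  for a job with no dependencies)
     * @return waves in execution order; jobs within one wave are independent
     */
    static List<List<String>> waves(Map<String, List<String>> dependsOn) {
        // Number of not-yet-finished dependencies for each pending job.
        Map<String, Integer> remaining = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
        }

        List<List<String>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Every job with no unmet dependencies can be submitted now.
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, Integer> e : remaining.entrySet()) {
                if (e.getValue() == 0) {
                    ready.add(e.getKey());
                }
            }
            if (ready.isEmpty()) {
                // Nothing can start: a cycle, or a dependency that is not a key.
                throw new IllegalArgumentException(
                        "unsatisfiable dependencies among " + remaining.keySet());
            }
            for (String done : ready) {
                remaining.remove(done);
            }
            // Completing this wave satisfies dependencies of the pending jobs.
            for (Map.Entry<String, Integer> e : remaining.entrySet()) {
                int met = 0;
                for (String dep : dependsOn.get(e.getKey())) {
                    if (ready.contains(dep)) {
                        met++;
                    }
                }
                e.setValue(e.getValue() - met);
            }
            result.add(ready);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("job1", Collections.<String>emptyList());
        deps.put("job2", Arrays.asList("job1"));
        deps.put("job3", Arrays.asList("job1"));
        deps.put("job4", Arrays.asList("job2", "job3"));
        // prints [[job1], [job2, job3], [job4]]
        System.out.println(waves(deps));
    }
}
```

In this example job2 and job3 both read only from job1, so they land in
the same wave and could be submitted in parallel, while job4 has to wait
for both of them.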

You may also find writing your own MR jobs by hand tedious and
brittle. Cascading can help you there as well.

cheers,
chris

--
Chris K Wensel
chris@concurrentinc.com
http://www.concurrentinc.com


Re: Cascading jobs in hadoop

Posted by Tom White <to...@cloudera.com>.
Have a look at the JobControl class - this allows you to set up chains
of job dependencies.

Tom
