Posted to mapreduce-user@hadoop.apache.org by Sarthak Dudhara <sa...@gmail.com> on 2011/01/02 00:18:45 UTC

0.20.2 : Running Chained jobs using JobControl

Hi,

I am trying to run chained jobs using Hadoop 0.20.2, and I see that the
JobControl class is provided for exactly this purpose. However, it only
accepts the deprecated class org.apache.hadoop.mapred.jobcontrol.Job, so
using JobControl also means using the older deprecated Job class and the
other deprecated classes from the mapred package, e.g. JobConf. I am just
starting to build our job infrastructure and would prefer not to depend on
deprecated classes or packages.
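
For reference, here is roughly what that chaining looks like with the
deprecated 0.20.2 API. This is only a sketch: ChainDriver, the paths, and the
omitted mapper/reducer setup are placeholders, and note that the old
JobControl.run() keeps polling until stop() is called, so it is usually
driven from a separate thread.

// Sketch of a driver using the deprecated 0.20.2 jobcontrol classes
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainDriver {                       // hypothetical driver class
  public static void main(String[] args) throws Exception {
    JobConf conf1 = new JobConf(ChainDriver.class);
    conf1.setJobName("step-1");
    FileInputFormat.setInputPaths(conf1, new Path("input"));
    FileOutputFormat.setOutputPath(conf1, new Path("intermediate"));
    // mapper, reducer, and key/value class setup omitted

    JobConf conf2 = new JobConf(ChainDriver.class);
    conf2.setJobName("step-2");
    FileInputFormat.setInputPaths(conf2, new Path("intermediate"));
    FileOutputFormat.setOutputPath(conf2, new Path("output"));

    Job step1 = new Job(conf1);
    Job step2 = new Job(conf2);
    step2.addDependingJob(step1);                // step2 waits until step1 succeeds

    JobControl control = new JobControl("chain");
    control.addJob(step1);
    control.addJob(step2);

    // run() loops until stop() is called, so drive it from a separate thread
    Thread runner = new Thread(control);
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    control.stop();
  }
}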

In Hadoop 0.21.0, I have seen that there are ControlledJob and JobControl
classes in the mapreduce.lib.jobcontrol package that take care of this.
However, 0.21.0 is not production ready, as mentioned on the Hadoop site.
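
The 0.21.0 equivalent looks roughly like the fragment below, wrapping the new
org.apache.hadoop.mapreduce.Job in ControlledJob. Again only a sketch: the
job names are placeholders, the mapper/reducer/path setup is omitted, and the
JobControl instance is driven from a thread exactly as with the old API.

// 0.21.0 classes from org.apache.hadoop.mapreduce.lib.jobcontrol
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

Configuration conf = new Configuration();
Job job1 = new Job(conf, "step-1");   // mapper/reducer/path setup omitted
Job job2 = new Job(conf, "step-2");

ControlledJob cj1 = new ControlledJob(job1, null);
ControlledJob cj2 = new ControlledJob(job2, null);
cj2.addDependingJob(cj1);             // cj2 runs only after cj1 succeeds

JobControl control = new JobControl("chain");
control.addJob(cj1);
control.addJob(cj2);
// run control from a background thread and poll allFinished(), as above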

What is the suggested course of action here? Using the deprecated classes
seems to be the only option, but doing so would cause quite a few cascading
changes, so I am trying to avoid it if possible.

Thanks

Sarthak Dudhara

Re: 0.20.2 : Running Chained jobs using JobControl

Posted by Sarthak Dudhara <sa...@gmail.com>.
Hi Harsh,

Yes, exactly, simple chaining could work in this case. However, we envision
complex workflows in the future, so we will check out the other tooling
available for that.

Sarthak Dudhara



On Sun, Jan 2, 2011 at 3:06 AM, Harsh J <qw...@gmail.com> wrote:

> Although for complex workflows, one should check out Oozie or Azkaban.
>
> On Sun, Jan 2, 2011 at 1:55 PM, Hari Sreekumar <hs...@clickable.com>
> wrote:
> > Can't we run chained jobs like this?
> > boolean j1 = job1.waitForCompletion(true);
> > if (j1) job2.waitForCompletion(true);
> >
> > and set up the jobs such that job1's output dir is job2's input dir?
> > Thanks,
> > Hari
>
> Yes, this could work for simple success/failure-based chaining
> (although it makes the driver code look a tad messy?).
>
> This is what JobControl aims to provide from within the Hadoop
> libraries themselves, plus the ability to have more control over the
> dependent, waiting jobs.
>
> --
> Harsh J
> www.harshj.com
>

exception related to logging (0.21.0)

Posted by MONTMORY Alain <al...@thalesgroup.com>.
Hi everybody,

When running map/reduce jobs on 0.21.0, I get this exception:

java.lang.NullPointerException
    at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:69)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:222)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:219)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)

In the mailing list archives, I have seen another user who had the same problem (2010-06-15), but I cannot tell whether a solution was found.

Thank you, any ideas appreciated.


-----Original Message-----
From: Harsh J [mailto:qwertymaniac@gmail.com]
Sent: Sunday, January 2, 2011 12:06
To: mapreduce-user@hadoop.apache.org
Subject: Re: 0.20.2 : Running Chained jobs using JobControl

Although for complex workflows, one should check out Oozie or Azkaban.

On Sun, Jan 2, 2011 at 1:55 PM, Hari Sreekumar <hs...@clickable.com> wrote:
> Can't we run chained jobs like this?
> boolean j1 = job1.waitForCompletion(true);
> if (j1) job2.waitForCompletion(true);
>
> and set up the jobs such that job1's output dir is job2's input dir?
> Thanks,
> Hari

Yes, this could work for simple success/failure-based chaining
(although it makes the driver code look a tad messy?).

This is what JobControl aims to provide from within the Hadoop
libraries themselves, plus the ability to have more control over the
dependent, waiting jobs.

-- 
Harsh J
www.harshj.com

Re: 0.20.2 : Running Chained jobs using JobControl

Posted by Harsh J <qw...@gmail.com>.
Although for complex workflows, one should check out Oozie or Azkaban.

On Sun, Jan 2, 2011 at 1:55 PM, Hari Sreekumar <hs...@clickable.com> wrote:
> Can't we run chained jobs like this?
> boolean j1 = job1.waitForCompletion(true);
> if (j1) job2.waitForCompletion(true);
>
> and set up the jobs such that job1's output dir is job2's input dir?
> Thanks,
> Hari

Yes, this could work for simple success/failure-based chaining
(although it makes the driver code look a tad messy?).

This is what JobControl aims to provide from within the Hadoop
libraries themselves, plus the ability to have more control over the
dependent, waiting jobs.
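
For example, a fan-in dependency like the sketch below is awkward to express
with plain sequential waitForCompletion() calls. This uses the deprecated
0.20.2 jobcontrol classes, and jobConfA/jobConfB/jobConfC are placeholder
JobConf instances configured elsewhere.

// jobA and jobB may run in parallel; jobC starts only after both succeed
Job jobA = new Job(jobConfA);   // org.apache.hadoop.mapred.jobcontrol.Job
Job jobB = new Job(jobConfB);
Job jobC = new Job(jobConfC);
jobC.addDependingJob(jobA);
jobC.addDependingJob(jobB);

JobControl control = new JobControl("fan-in");
control.addJob(jobA);
control.addJob(jobB);
control.addJob(jobC);
// drive control from a background thread, poll allFinished(), then stop()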

-- 
Harsh J
www.harshj.com

Re: 0.20.2 : Running Chained jobs using JobControl

Posted by Hari Sreekumar <hs...@clickable.com>.
Can't we run chained jobs like this?

boolean j1 = job1.waitForCompletion(true);
if (j1) job2.waitForCompletion(true);

and set up the jobs such that job1's output dir is job2's input dir?
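
Fleshed out a little with the new org.apache.hadoop.mapreduce API, the idea
would look roughly like this; the mapper/reducer setup is omitted and the
paths are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
Path intermediate = new Path("intermediate");   // job1's output and job2's input

Job job1 = new Job(conf, "job1");               // mapper/reducer setup omitted
FileInputFormat.addInputPath(job1, new Path("input"));
FileOutputFormat.setOutputPath(job1, intermediate);

Job job2 = new Job(conf, "job2");
FileInputFormat.addInputPath(job2, intermediate);
FileOutputFormat.setOutputPath(job2, new Path("output"));

boolean ok = job1.waitForCompletion(true);      // true enables verbose progress output
if (ok) {
    ok = job2.waitForCompletion(true);
}
System.exit(ok ? 0 : 1);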

Thanks,
Hari
On Sun, Jan 2, 2011 at 5:09 AM, Chris K Wensel <ch...@wensel.net> wrote:

> Probably safe to use the deprecated APIs, considering
> https://issues.apache.org/jira/browse/MAPREDUCE-1734
>
> ckw
>
> On Jan 1, 2011, at 3:18 PM, Sarthak Dudhara wrote:
>
> Hi,
>
> I am trying to run chained jobs using Hadoop 0.20.2, and I see that the
> JobControl class is provided for exactly this purpose. However, it only
> accepts the deprecated class org.apache.hadoop.mapred.jobcontrol.Job, so
> using JobControl also means using the older deprecated Job class and the
> other deprecated classes from the mapred package, e.g. JobConf. I am just
> starting to build our job infrastructure and would prefer not to depend on
> deprecated classes or packages.
>
> In Hadoop 0.21.0, I have seen that there are ControlledJob and JobControl
> classes in the mapreduce.lib.jobcontrol package that take care of this.
> However, 0.21.0 is not production ready, as mentioned on the Hadoop site.
>
> What is the suggested course of action here? Using the deprecated classes
> seems to be the only option, but doing so would cause quite a few cascading
> changes, so I am trying to avoid it if possible.
>
> Thanks
>
> Sarthak Dudhara
>
>
> --
> Chris K Wensel
> chris@concurrentinc.com
> http://www.concurrentinc.com
>
> -- Concurrent, Inc. offers mentoring, support, and licensing for Cascading
>
>

Re: 0.20.2 : Running Chained jobs using JobControl

Posted by Chris K Wensel <ch...@wensel.net>.
Probably safe to use the deprecated APIs, considering
https://issues.apache.org/jira/browse/MAPREDUCE-1734

ckw

On Jan 1, 2011, at 3:18 PM, Sarthak Dudhara wrote:

> Hi, 
> 
> I am trying to run chained jobs using Hadoop 0.20.2, and I see that the JobControl class is provided for exactly this purpose. However, it only accepts the deprecated class
> org.apache.hadoop.mapred.jobcontrol.Job.
> 
> Using JobControl therefore also means using the older deprecated Job class and the other deprecated classes from the mapred package, e.g. JobConf. I am just starting to build our job infrastructure and would prefer not to depend on deprecated classes or packages.
> 
> In Hadoop 0.21.0, I have seen that there are ControlledJob and JobControl classes in the mapreduce.lib.jobcontrol package that take care of this. However, 0.21.0 is not production ready, as mentioned on the Hadoop site.
> 
> What is the suggested course of action here? Using the deprecated classes seems to be the only option, but doing so would cause quite a few cascading changes, so I am trying to avoid it if possible.
> 
> Thanks 
> 
> Sarthak Dudhara
> 

--
Chris K Wensel
chris@concurrentinc.com
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support, and licensing for Cascading