Posted to general@hadoop.apache.org by web service <wb...@gmail.com> on 2010/11/12 03:17:30 UTC
running hadoop jobs from within a program
Hi,
Currently I run my sample Hadoop job from a bash script using the
following command:
[code]
# Build the command line for run $i, then execute it
tmp="$HADOOP_BIN jar $JAR_LOC $MAIN_CLASS /user/joe/input/input-$i/ /user/vadmin/output/output-$i/"
$tmp
[/code]
However, I would like to write a timer that does some cleanup after the
jobs are complete and restarts the jobs after x hours. What I am looking for
is the ability to invoke a job from within a program rather than through the
hadoop jar command.
-Mac
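For the timer described above, one option that stays close to the existing script is a plain shell loop. This is a minimal sketch, assuming $HADOOP_BIN, $JAR_LOC and $MAIN_CLASS are set as in the script above and that each hadoop jar call returns only once its job has finished. NUM_JOBS, INTERVAL_HOURS, MAX_CYCLES and the cleanup body are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: run a batch of jobs, clean up, wait INTERVAL_HOURS, repeat.
# NUM_JOBS, INTERVAL_HOURS, MAX_CYCLES and cleanup() are hypothetical
# placeholders; HADOOP_BIN, JAR_LOC and MAIN_CLASS are as in the script above.
NUM_JOBS=${NUM_JOBS:-3}
INTERVAL_HOURS=${INTERVAL_HOURS:-6}
MAX_CYCLES=${MAX_CYCLES:-}   # empty means "repeat forever"

# Run jobs 1..NUM_JOBS one after another; each call is assumed to
# return only after its job has finished.
run_jobs() {
  i=1
  while [ "$i" -le "$NUM_JOBS" ]; do
    "$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" \
      "/user/joe/input/input-$i/" "/user/vadmin/output/output-$i/"
    i=$((i + 1))
  done
}

# Placeholder cleanup. With the default FileOutputFormat a job fails if its
# output directory already exists, so this is a natural place to move or
# delete the previous outputs (e.g. with "$HADOOP_BIN" fs -rmr <dir>).
cleanup() {
  echo "cleanup done"
}

# Alternate run_jobs + cleanup with a sleep of INTERVAL_HOURS in between,
# stopping after MAX_CYCLES cycles when MAX_CYCLES is set.
schedule() {
  cycle=0
  while [ -z "$MAX_CYCLES" ] || [ "$cycle" -lt "$MAX_CYCLES" ]; do
    run_jobs
    cleanup
    cycle=$((cycle + 1))
    if [ -z "$MAX_CYCLES" ] || [ "$cycle" -lt "$MAX_CYCLES" ]; then
      sleep $((INTERVAL_HOURS * 3600))
    fi
  done
}
```

Calling schedule at the end of the script starts the cycle; cron is a common alternative if only the "every x hours" part is needed.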
Re: running hadoop jobs from within a program
Posted by web service <wb...@gmail.com>.
Thanks, I had figured it out. It is fun to figure out how things work :)
On Sun, Nov 14, 2010 at 4:22 AM, Harsh J <qw...@gmail.com> wrote:
Re: running hadoop jobs from within a program
Posted by Harsh J <qw...@gmail.com>.
Hello,
On Fri, Nov 12, 2010 at 10:25 PM, web service <wb...@gmail.com> wrote:
> Thanks, but is submitting three different jobs, say using
>
> JobClient client = new JobClient(jobconf1);
> client.submitJob(jobconf1);
> client.submitJob(jobconf2);
> client.submitJob(jobconf3);
>
> any different from running:
> "$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" /user/joe/input/input-1/ /user/vadmin/output/output-1/
> "$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" /user/joe/input/input-2/ /user/vadmin/output/output-2/
> "$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" /user/joe/input/input-3/ /user/vadmin/output/output-3/
It isn't different. In both cases a new JobID is assigned to each job
created, and its specific configuration is associated with it upon
submission.
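To put the comparison in shell terms: JobClient.submitJob returns as soon as a job has been submitted, whereas JobClient.runJob waits for the job to complete. Assuming the driver in $MAIN_CLASS waits (as runJob does), a rough shell analogue of three back-to-back submitJob calls is to start the three commands in the background and then wait for all of them; each job still receives its own JobID. The helper name submit_all is hypothetical.

```shell
#!/bin/sh
# Sketch: launch jobs 1..3 concurrently and wait for all of them, a rough
# shell counterpart of three back-to-back JobClient.submitJob calls.
# Assumes each "hadoop jar" call blocks until its own job finishes;
# submit_all is a hypothetical helper name.
submit_all() {
  for i in 1 2 3; do
    # The trailing & runs each client in the background so the loop
    # moves straight on to the next submission.
    "$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" \
      "/user/joe/input/input-$i/" "/user/vadmin/output/output-$i/" &
  done
  # Block until every background client has exited.
  wait
}
```

Running the same commands without the trailing & would instead submit each job only after the previous one has finished.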
>
> I guess every job can have its own JVM options, and I hope that every
> submitted job runs in a separate JVM, no?
Yes, each Task (Map or Reduce, under the Job) runs in a separate JVM
(although JVMs can be reused across tasks of the same job by setting
mapred.job.reuse.jvm.num.tasks).
--
Harsh J
www.harshj.com
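Per-job settings such as task JVM options, and the JVM reuse mentioned above, can be passed with each submission. A minimal sketch, assuming the main class runs through ToolRunner so that Hadoop's GenericOptionsParser consumes the -D options placed right after the class name. It uses the 0.20-era property names mapred.child.java.opts (options for each task's child JVM) and mapred.job.reuse.jvm.num.tasks (how many tasks of the same job one JVM may run; -1 means no limit, while the default of 1 starts a fresh JVM per task). The helper run_job and the -Xmx512m value are illustrative only.

```shell
#!/bin/sh
# Sketch: submit one job with its own task-JVM options.
# Assumes MAIN_CLASS is run via ToolRunner, so the generic -D options
# right after the class name are consumed by GenericOptionsParser.
# run_job and the option values are illustrative only.
run_job() {
  i=$1          # run number, used to build the input/output paths
  java_opts=$2  # JVM options for this job's task JVMs
  "$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" \
    -D mapred.child.java.opts="$java_opts" \
    -D mapred.job.reuse.jvm.num.tasks=-1 \
    "/user/joe/input/input-$i/" "/user/vadmin/output/output-$i/"
}
```

For example, run_job 1 -Xmx512m. From Java, the same keys can be set on each JobConf with set(...) before it is submitted.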
Re: running hadoop jobs from within a program
Posted by web service <wb...@gmail.com>.
Thanks, but is submitting three different jobs, say using
JobClient client = new JobClient(jobconf1);
client.submitJob(jobconf1);
client.submitJob(jobconf2);
client.submitJob(jobconf3);
any different from running:
"$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" /user/joe/input/input-1/ /user/vadmin/output/output-1/
"$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" /user/joe/input/input-2/ /user/vadmin/output/output-2/
"$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" /user/joe/input/input-3/ /user/vadmin/output/output-3/
I guess every job can have its own JVM options, and I hope that every
submitted job runs in a separate JVM, no?
On Fri, Nov 12, 2010 at 12:55 AM, daniel sikar <ds...@gmail.com> wrote:
Re: running hadoop jobs from within a program
Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Mac,
You should a look at Oozie, it will allow you to do what you describe.
You can either build Oozie from https://github.com/yahoo/oozie or
download CDH3b3 distribution from http://www.cloudera.com/downloads/
(Oozie is preconfigured to work with CHD3b3 Hadoop).
Hope this helps.
Alejandro
On Fri, Nov 12, 2010 at 12:55 AM, daniel sikar <ds...@gmail.com> wrote:
Re: running hadoop jobs from within a program
Posted by daniel sikar <ds...@gmail.com>.
I suggest you write a loop in your bash script, grepping for finished,
then take it from there.
Also, you can submit the same job as many times as you like.
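Rather than grepping for "finished", such a loop can look at the exit status of hadoop jar, which is normally the exit status of the driver's main method. This sketch assumes the driver exits non-zero when its job fails; for instance, JobClient.runJob throws an IOException when the job fails, and an uncaught exception ends the client JVM with a non-zero status. The helper run_and_check is hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a job finished from the exit status of "hadoop jar"
# instead of grepping its console output. Assumes the driver in MAIN_CLASS
# exits non-zero when its job fails; run_and_check is a hypothetical helper.
run_and_check() {
  i=$1
  if "$HADOOP_BIN" jar "$JAR_LOC" "$MAIN_CLASS" \
       "/user/joe/input/input-$i/" "/user/vadmin/output/output-$i/"; then
    echo "job $i finished"
    return 0
  fi
  echo "job $i failed" >&2
  return 1
}
```

Inside a loop, for i in 1 2 3; do run_and_check "$i" || break; done stops at the first failed job.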
On 12 November 2010 02:17, web service <wb...@gmail.com> wrote:
Re: running hadoop jobs from within a program
Posted by web service <wb...@gmail.com>.
Would submitting, say, 3 jobs from a JobClient be different from invoking the
command below 3 times?
On Thu, Nov 11, 2010 at 7:17 PM, web service <wb...@gmail.com> wrote:
> [code]
> tmp="$HADOOP_BIN jar $JAR_LOC $MAIN_CLASS /user/joe/input/input-$i/ /user/vadmin/output/output-$i/"
> $tmp
> [/code]