Posted to common-user@hadoop.apache.org by Marko Dinic <ma...@nissatech.com> on 2015/04/27 14:35:56 UTC

How to call Hadoop job from a web service in a non-blocking fashion?

Hello,

I have a sequence of jobs that depend on each other, output of one job 
is input for the next one. Also, there is a loop in one part of the 
sequence, containing two jobs executing in a row.

Until now I was able to run this sequence by simply creating Job objects and 
using waitForCompletion(true). In that way, I was forwarding the output of 
one job as input to the next one.

The problem is, waitForCompletion(true) will block the web service I'm 
trying to use, so I need a way to run this sequence of dependent jobs 
without getting stuck waiting for the result of the whole sequence. So, I 
want the following model: the user uploads some files, starts the job and 
gets the response that the job has been started. After the sequence has 
finished, the user should be notified in some way.

I wouldn't like to use Oozie, since the jobs are more low-level 
(it's actually an algorithm similar to those implemented in Mahout), and 
I don't know whether I can use JobControl, since there is a loop, or how 
to do it.

Any help would be highly appreciated.

Regards,
Marko

Re: How to call Hadoop job from a web service in a non-blocking fashion?

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Marko,

Job#waitForCompletion is implemented as a polling loop around
Job#isComplete and Job#isSuccessful.  Both of those calls are non-blocking.

http://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/mapreduce/Job.html#isComplete()

http://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/mapreduce/Job.html#isSuccessful()


(Technically they do block on I/O due to RPC calls to check job status,
but the idea is that they don't block waiting for the entire job to
complete the way waitForCompletion does.)

Do you think you could build the solution that you need around using these
2 methods?  I believe Oozie implements it in a similar way.  Even though
you aren't using Oozie, you might consider looking at the Oozie codebase
for inspiration.  I think the relevant class in Oozie would be
JavaActionExecutor.  Disclaimer: I really don't know Oozie very well.  :-)
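
A minimal sketch of that polling idea, under stated assumptions. So that it
compiles without Hadoop on the classpath, the PollableJob interface below is
a hypothetical stand-in for the three org.apache.hadoop.mapreduce.Job methods
involved (submit, isComplete, isSuccessful); JobChainRunner, runAndPoll and
pollMillis are likewise illustrative names, not part of any Hadoop API. The
chain runs on a background thread, so the web service thread returns right
after calling start().

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class JobChainRunner {

    /** Hypothetical stand-in for the Job methods used in this sketch. */
    interface PollableJob {
        void submit() throws Exception;          // like Job#submit(): returns after submission
        boolean isComplete() throws Exception;   // like Job#isComplete()
        boolean isSuccessful() throws Exception; // like Job#isSuccessful()
    }

    // One worker thread: the jobs of a chain run strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final long pollMillis;

    JobChainRunner(long pollMillis) {
        this.pollMillis = pollMillis;
    }

    /** Submits one job and polls until it finishes; blocks only the calling thread. */
    static boolean runAndPoll(PollableJob job, long pollMillis) throws Exception {
        job.submit();
        while (!job.isComplete()) {
            Thread.sleep(pollMillis);
        }
        return job.isSuccessful();
    }

    /**
     * Starts the chain on the worker thread and returns immediately.
     * Each step is a Supplier so that a job can be built only after the
     * previous one has finished (its input is the previous job's output).
     * The future completes with false as soon as one job fails.
     */
    CompletableFuture<Boolean> start(List<Supplier<PollableJob>> steps) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                for (Supplier<PollableJob> step : steps) {
                    if (!runAndPoll(step.get(), pollMillis)) {
                        return false;
                    }
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

In the web service, the handler would call start(...), reply "job started"
at once, and attach the notification to the returned future, for example
with thenAccept. A loop in the sequence could be written as an ordinary Java
loop inside the task that runs on the worker thread, calling runAndPoll for
each of the two jobs per iteration until the stopping condition holds.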

--Chris Nauroth




On 4/27/15, 5:35 AM, "Marko Dinic" <ma...@nissatech.com> wrote:

>Hello,
>
>I have a sequence of jobs that depend on each other, output of one job
>is input for the next one. Also, there is a loop in one part of the
>sequence, containing two jobs executing in a row.
>
>Until now I was able to run this sequence by simply creating Job objects and
>using waitForCompletion(true). In that way, I was forwarding the output of
>one job as input to the next one.
>
>The problem is, waitForCompletion(true) will block the web service I'm
>trying to use, so I need a way to run this sequence of dependent jobs
>without getting stuck waiting for the result of the whole sequence. So, I
>want the following model: the user uploads some files, starts the job and
>gets the response that the job has been started. After the sequence has
>finished, the user should be notified in some way.
>
>I wouldn't like to use Oozie, since the jobs are more low-level
>(it's actually an algorithm similar to those implemented in Mahout), and
>I don't know whether I can use JobControl, since there is a loop, or how
>to do it.
>
>Any help would be highly appreciated.
>
>Regards,
>Marko

