Posted to common-user@hadoop.apache.org by jiang licht <li...@yahoo.com> on 2010/02/17 06:31:45 UTC

Hadoop automatic job status check and notification?

New to Hadoop (now using 0.20.1), I want to do the following:
 
Automatic status check and notification for Hadoop jobs, so that, for example, when a job finishes, a script can be triggered to pull the job results back to local machines and release or shut down the expensive Hadoop cluster.
 
So, what is the best way to do this?
 
Thanks!
--
Michael


      

Re: Hadoop automatic job status check and notification?

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Feb 17, 2010 at 1:03 PM, jiang licht <li...@yahoo.com> wrote:
> Amogh, this really helps me a lot! Thanks!
>
> So, in summary, I think there are the following options for job notification or, more generally, job management. I also suspect Oozie / Cascading is the better choice when we need to handle these things externally. I haven't explored all of these options in depth, so I may well have misunderstandings; please correct me :)
>
> - Prepare an external script that polls job status via "hadoop job [-list | -status | etc.]" at regular intervals and takes action accordingly. (pros: simple; cons: needs to poll status, not event-driven)
>
> - Within a Hadoop job written in Java, call the appropriate job-control functions to send out a job status message when desired. (pros: straightforward; cons: only for jobs written in Java)
>
> - Use Oozie / Cascading to organize the flow of Hadoop jobs and other housekeeping jobs (e.g. pull back results, clean up, shut down clusters, re-execute jobs on failure, etc.). (pros: powerful, can handle job control outside of jobs written in Java/Pig; cons: learning curve?)
>
> - Embedded Pig. (pros: works for jobs in Pig scripts; cons: works only for jobs in Pig scripts)
>
> - What else?
>
> --
> Michael
>
Michael,

That is a pretty good summary.

Oozie and Cascading are much more advanced workflow schedulers.

For reference, I use the JobClient object
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/JobClient.html
to poll the jobtracker and gather the information for these graphs.

http://www.jointhegrid.com/hadoop-cacti-jtg-walk/running_job.jsp
http://www.jointhegrid.com/hadoop-cacti-jtg-walk/maps_v_reduces.jsp

This is fairly easy to do. After you get connected, you have methods
like getAllJobs() or getJob(JobID) and can further interrogate the
returned objects for the information you want. In my case I am
determining what state each job is in so I can draw a graph.
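
Roughly, a stripped-down poller along those lines looks like the following against the 0.20.1 mapred API. This is only a sketch, not the hadoop-cacti-jtg code; the class name and the one-minute interval are made up.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;

public class JobStatePoller {
  public static void main(String[] args) throws Exception {
    // Picks up mapred.job.tracker etc. from the client-side configuration.
    JobClient client = new JobClient(new JobConf());
    while (true) {
      int prep = 0, running = 0, succeeded = 0, failedOrKilled = 0;
      for (JobStatus status : client.getAllJobs()) {
        switch (status.getRunState()) {
          case JobStatus.PREP:      prep++;           break;
          case JobStatus.RUNNING:   running++;        break;
          case JobStatus.SUCCEEDED: succeeded++;      break;
          default:                  failedOrKilled++; break;
        }
      }
      System.out.printf("prep=%d running=%d succeeded=%d failed/killed=%d%n",
          prep, running, succeeded, failedOrKilled);
      Thread.sleep(60 * 1000);  // poll once a minute
    }
  }
}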

>> Automatic status check and notification for Hadoop jobs, so that, for example, when a job finishes, a script can be triggered to pull the job results back to local machines and release or shut down the expensive Hadoop cluster.

Based on this requirement, you could also just handle the return code
in the driver of your MapReduce program and take action there:
javax.mail, a message broker, etc.
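
For instance, a driver along these lines (a sketch only; the identity job stands in for a real one, and fetch-results.sh is a hypothetical follow-up script):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class NotifyingDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(NotifyingDriver.class);
    conf.setJobName("notify-example");
    // The old API defaults to IdentityMapper/IdentityReducer; a real job
    // would set its own mapper and reducer classes here.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // runJob() blocks until the job finishes and throws IOException if it
    // fails, so reaching the next line already implies success; the explicit
    // check just keeps the intent obvious.
    RunningJob job = JobClient.runJob(conf);
    if (job.isSuccessful()) {
      // Hand off to whatever comes next: pull results, mail someone via
      // javax.mail, publish to a message broker, tear the cluster down, etc.
      Runtime.getRuntime().exec(new String[] {"/bin/sh", "fetch-results.sh"});
    }
  }
}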

Re: Hadoop automatic job status check and notification?

Posted by jiang licht <li...@yahoo.com>.
Amogh, this really helps me a lot! Thanks!

So, in summary, I think there are the following options for job notification or, more generally, job management. I also suspect Oozie / Cascading is the better choice when we need to handle these things externally. I haven't explored all of these options in depth, so I may well have misunderstandings; please correct me :)

- Prepare an external script that polls job status via "hadoop job [-list | -status | etc.]" at regular intervals and takes action accordingly. (pros: simple; cons: needs to poll status, not event-driven; a rough sketch of such a watcher follows this list)

- Within a Hadoop job written in Java, call the appropriate job-control functions to send out a job status message when desired. (pros: straightforward; cons: only for jobs written in Java)

- Use Oozie / Cascading to organize the flow of Hadoop jobs and other housekeeping jobs (e.g. pull back results, clean up, shut down clusters, re-execute jobs on failure, etc.). (pros: powerful, can handle job control outside of jobs written in Java/Pig; cons: learning curve?)

- Embedded Pig. (pros: works for jobs in Pig scripts; cons: works only for jobs in Pig scripts)

- What else?
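
For the first option, a rough sketch of such an external watcher (the job id and script name are placeholders; by default "hadoop job -list" only prints jobs that have not completed yet, so the check is just a substring match on the job id):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class CliJobWatcher {
  public static void main(String[] args) throws Exception {
    String jobId = args[0];  // e.g. job_201002170601_0042 (made up)
    while (true) {
      // "hadoop job -list" only shows jobs that are yet to complete.
      Process p = new ProcessBuilder("hadoop", "job", "-list").start();
      StringBuilder out = new StringBuilder();
      BufferedReader r =
          new BufferedReader(new InputStreamReader(p.getInputStream()));
      for (String line; (line = r.readLine()) != null; ) {
        out.append(line).append('\n');
      }
      p.waitFor();
      if (!out.toString().contains(jobId)) {
        // The job is no longer running; fire the (hypothetical) follow-up.
        new ProcessBuilder("/bin/sh", "fetch-results.sh", jobId).start();
        break;
      }
      Thread.sleep(60 * 1000);  // check once a minute
    }
  }
}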

--
Michael

--- On Wed, 2/17/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:


Hi,
In our case we launched Pig from a Perl script and handled re-execution, clean-up, etc. from there. If you need to implement a workflow or DAG-like model, consider looking at Oozie / Cascading. If you are interested in diving a little deeper, you can try embedded Pig.

Amogh



Re: Hadoop automatic job status check and notification?

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
In our case we launched Pig from a Perl script and handled re-execution, clean-up, etc. from there. If you need to implement a workflow or DAG-like model, consider looking at Oozie / Cascading. If you are interested in diving a little deeper, you can try embedded Pig.
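
If it helps, embedded Pig looks roughly like this. It is a sketch only, with a made-up script, paths, and follow-up script; the point is that the surrounding Java code can react once the STORE has run or failed.

import org.apache.pig.PigServer;

public class EmbeddedPigRunner {
  public static void main(String[] args) {
    try {
      PigServer pig = new PigServer("mapreduce");  // or "local" for testing
      pig.registerQuery("raw = LOAD '/data/in' AS (user:chararray, bytes:long);");
      pig.registerQuery("grp = GROUP raw BY user;");
      pig.registerQuery("totals = FOREACH grp GENERATE group, SUM(raw.bytes);");
      pig.store("totals", "/data/out");            // launches the MapReduce job(s)
      // If we get this far the store went through; hand off to the follow-up.
      Runtime.getRuntime().exec(new String[] {"/bin/sh", "fetch-results.sh"});
    } catch (Exception e) {
      // A failed script or job lands here; re-execute, alert, or clean up.
      e.printStackTrace();
      System.exit(1);
    }
  }
}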

Amogh


On 2/17/10 1:53 PM, "jiang licht" <li...@yahoo.com> wrote:

Thanks Amogh.

So, I think the following will do the job:
public void setJobEndNotificationURI(String uri)

But what about Hadoop jobs written in Pig scripts? Since Pig takes control, is there a convenient way to do the same thing there?

Thanks!
--
Michael


Re: Hadoop automatic job status check and notification?

Posted by jiang licht <li...@yahoo.com>.
Thanks Amogh.

So, I think the following will do the job:
public void setJobEndNotificationURI(String uri)

But what about Hadoop jobs written in Pig scripts? Since Pig takes control, is there a convenient way to do the same thing there?
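
For reference, wiring that up in a Java driver is just one call on the JobConf. The URL below is a made-up endpoint, and the framework substitutes $jobId and $jobStatus when it fires the notification:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class NotifyByUrl {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(NotifyByUrl.class);
    // Sets the job.end.notification.url property; the JobTracker issues an
    // HTTP GET to this URL once the job reaches a terminal state.
    conf.setJobEndNotificationURI(
        "http://reports.example.com/hadoop/done?jobid=$jobId&status=$jobStatus");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);  // the notification fires whether or not we block here
  }
}

Since the method only sets the job.end.notification.url property, it seems plausible that Pig jobs could pick it up too if that property makes it into the job configuration (e.g. via pig.properties), but I have not verified that.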
 
Thanks!
--
Michael

--- On Wed, 2/17/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:


Hi,
When you submit a job to the cluster, you can control whether the call blocks or returns immediately by using JobClient's runJob or submitJob method, respectively. Either way you can find out whether the job succeeded or failed, so you can design your follow-up scripts accordingly.


Amogh



Re: Hadoop automatic job status check and notification?

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
When you submit a job to the cluster, you can control whether the call blocks or returns immediately by using JobClient's runJob or submitJob method, respectively. Either way you can find out whether the job succeeded or failed, so you can design your follow-up scripts accordingly.
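
A sketch of the non-blocking variant (class name and paths are made up): submitJob() hands back a RunningJob immediately, which the driver can poll or wait on, and the exit code can then drive any follow-up scripts.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitAndWatch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitAndWatch.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient client = new JobClient(conf);
    RunningJob job = client.submitJob(conf);  // returns right away

    while (!job.isComplete()) {               // do other work here, or just poll
      Thread.sleep(10 * 1000);
    }
    // Exit code tells the calling script whether to pull results or alert.
    System.exit(job.isSuccessful() ? 0 : 1);
  }
}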


Amogh


On 2/17/10 11:01 AM, "jiang licht" <li...@yahoo.com> wrote:

New to Hadoop (now using 0.20.1), I want to do the following:

Automatic status check and notification for Hadoop jobs, so that, for example, when a job finishes, a script can be triggered to pull the job results back to local machines and release or shut down the expensive Hadoop cluster.

So, what is the best way to do this?

Thanks!
--
Michael