Posted to common-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2008/10/01 13:15:44 UTC

[jira] Updated: (HADOOP-4053) Schedulers need to know when a job has completed

     [ https://issues.apache.org/jira/browse/HADOOP-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4053:
-------------------------------

    Attachment: HADOOP-4053-v3.1.patch

Attaching a patch that implements the {{JobChangeEvent}} concept. Here is how it is implemented.

_Assumptions :_
Everything that has the potential to change a job's state is captured and bundled under {{JobStatus}}. Hence taking a snapshot of the job's status before and after the event should be sufficient to determine the state change.
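
To illustrate the assumption, here is a minimal sketch of a before/after snapshot comparison. It is not code from the patch: {{StatusSnapshot}}, {{SnapshotDiff}} and the {{Change}} enum are hypothetical stand-ins, and the real {{JobStatus}} carries more fields than the three modelled here.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-in for Hadoop's JobStatus; not the real class.
final class StatusSnapshot {
    final int runState;     // illustrative values only, e.g. 1 = running
    final String priority;  // e.g. "NORMAL", "HIGH"
    final long startTime;   // milliseconds since the epoch

    StatusSnapshot(int runState, String priority, long startTime) {
        this.runState = runState;
        this.priority = priority;
        this.startTime = startTime;
    }
}

final class SnapshotDiff {
    // The kinds of change discussed in this comment; the enum is an assumption.
    enum Change { RUN_STATE, PRIORITY, START_TIME }

    // Compare the snapshot taken before an event with the one taken after it
    // and report which of the tracked fields differ.
    static Set<Change> diff(StatusSnapshot before, StatusSnapshot after) {
        Set<Change> changes = EnumSet.noneOf(Change.class);
        if (before.runState != after.runState) {
            changes.add(Change.RUN_STATE);
        }
        if (!before.priority.equals(after.priority)) {
            changes.add(Change.PRIORITY);
        }
        if (before.startTime != after.startTime) {
            changes.add(Change.START_TIME);
        }
        return changes;
    }
}
```

An empty result means none of the tracked fields changed, so no event would need to be raised.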

_Working :_
1) {{JobInProgressListener.jobUpdated()}} now takes {{JobChangeEvent}} as a parameter.

2) {{JobChangeEvent}} is an abstract class that exposes just one method, {{getJobInProgress()}}.

3) For the task at hand, i.e. handling _priority-change_, _start-time-change_ and _job-runstate-change_, I have extended {{JobChangeEvent}} to {{JobStatusChangeEvent}}.

4) {{JobStatusChangeEvent}} hosts a set of _sub-events_ that can lead to a job-status change. These are fields from {{JobStatus}} that have the potential to change for a given job, such as _priority_, _start-time_ and _run-state_. While composing an event, one can specify which _sub-events_ constitute the state change. Note that the order in which the _sub-events_ are specified is also preserved.

5) For the capacity scheduler, the appropriate action is performed based on the _sub-events_ constituting the state transition. For now the actions are
    - promote a job from the waiting queue to the running queue
    - remove a job upon job completion
    - re-position the job in the queue, as the parameters that decide where the job is positioned have changed

6) If {{JobStatusChangeEvent}} fails to capture all the events, then {{JobChangeEvent}} can be extended to cater for that case.

7) Other listener implementations remain unchanged, as they only require the {{JobInProgress}}, which is obtained from {{JobChangeEvent}}.
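
The points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patch: only the names {{JobChangeEvent}}, {{JobStatusChangeEvent}}, {{getJobInProgress()}} and {{jobUpdated()}} come from the description, while {{Job}}, {{SubEvent}}, {{RunState}}, {{QueueListenerSketch}} and all constructor and accessor signatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Hadoop's JobInProgress; only an id is modelled.
final class Job {
    final String id;
    Job(String id) { this.id = id; }
}

// Base event with a single accessor, as described in point 2.
abstract class JobChangeEvent {
    private final Job job;
    JobChangeEvent(Job job) { this.job = job; }
    Job getJobInProgress() { return job; }
}

// Status-change event carrying an ordered list of sub-events (points 3 and 4).
final class JobStatusChangeEvent extends JobChangeEvent {
    // Both enums are assumptions made for this sketch.
    enum SubEvent { RUN_STATE_CHANGED, PRIORITY_CHANGED, START_TIME_CHANGED }
    enum RunState { WAITING, RUNNING, COMPLETED }

    private final RunState newRunState;
    private final List<SubEvent> subEvents;

    JobStatusChangeEvent(Job job, RunState newRunState, SubEvent... subEvents) {
        super(job);
        this.newRunState = newRunState;
        // A list (not a set) keeps the order in which sub-events were given.
        this.subEvents = Collections.unmodifiableList(
                new ArrayList<>(Arrays.asList(subEvents)));
    }

    RunState getNewRunState() { return newRunState; }
    List<SubEvent> getSubEvents() { return subEvents; }
}

// Hypothetical listener that mimics the three scheduler actions of point 5.
final class QueueListenerSketch {
    final List<String> waiting = new ArrayList<>();
    final List<String> running = new ArrayList<>();
    final List<String> repositioned = new ArrayList<>(); // records re-sort requests

    void jobAdded(Job job) { waiting.add(job.id); }

    void jobUpdated(JobChangeEvent event) {
        if (!(event instanceof JobStatusChangeEvent)) {
            return; // event types this listener does not understand are ignored
        }
        JobStatusChangeEvent e = (JobStatusChangeEvent) event;
        String id = e.getJobInProgress().id;
        for (JobStatusChangeEvent.SubEvent sub : e.getSubEvents()) {
            switch (sub) {
                case RUN_STATE_CHANGED:
                    if (e.getNewRunState() == JobStatusChangeEvent.RunState.RUNNING) {
                        // Promote from the waiting queue to the running queue.
                        if (waiting.remove(id)) {
                            running.add(id);
                        }
                    } else if (e.getNewRunState() == JobStatusChangeEvent.RunState.COMPLETED) {
                        // Remove the job from whichever queue holds it.
                        waiting.remove(id);
                        running.remove(id);
                    }
                    break;
                case PRIORITY_CHANGED:
                case START_TIME_CHANGED:
                    // A real scheduler would re-sort its queue here.
                    repositioned.add(id);
                    break;
            }
        }
    }
}
```

A listener that only needs the job itself can keep calling {{getJobInProgress()}} on the base type and never look at the sub-events, which is why point 7 holds.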

Tested the patch with the capacity scheduler and it works fine. The web UI does not show completed jobs in the job queue, which means that the job is removed upon completion. _test-patch_ and _ant test_ pass on my box. The rest of the listener implementations should not be affected.
This patch is meant for 0.19.

> Schedulers need to know when a job has completed
> ------------------------------------------------
>
>                 Key: HADOOP-4053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4053
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Vivek Ratan
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4053-v1.patch, HADOOP-4053-v2.patch, HADOOP-4053-v3.1.patch
>
>
> The JobInProgressListener interface is used by the framework to notify Schedulers of when jobs are added, removed, or updated. Right now, there is no way for the Scheduler to know that a job has completed. jobRemoved() is called when a job is retired, which can happen many hours after a job is actually completed. jobUpdated() is called when a job's priority is changed. We need to notify a listener when a job has completed (either successfully, or has failed or been killed). 
