Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2006/06/24 07:15:29 UTC

[jira] Created: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Need a job control utility to submit and monitor a group of jobs which have DAG dependency
------------------------------------------------------------------------------------------

         Key: HADOOP-322
         URL: http://issues.apache.org/jira/browse/HADOOP-322
     Project: Hadoop
        Type: New Feature

    Reporter: Runping Qi
 Assigned to: Runping Qi 



In my applications, some jobs depend on the outputs of other jobs, so the job dependencies form a DAG. A job is ready to run if and only if it has no dependencies or all the jobs it depends on have finished successfully. To help schedule and monitor a group of jobs like that, I am thinking of implementing a utility that:
      - accepts jobs with dependency specifications
      - monitors job status
      - submits jobs when they are ready

With such a utility, the application can construct its jobs, specify their dependency and then hand the jobs to the utility class. The utility takes care of the details of job submission.
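
The readiness rule above can be sketched over a whole dependency graph. This is only an illustration under assumed names: the class ReadyJobsSketch, the method readyJobs, and the use of plain job-name strings are hypothetical and are not the API of the eventual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: which jobs in a dependency DAG may be submitted now?
class ReadyJobsSketch {
    // deps maps each job name to the names of the jobs it depends on.
    // A job is ready if it has not been submitted yet and every job it
    // depends on has succeeded (trivially true when it has no dependencies).
    static List<String> readyJobs(Map<String, List<String>> deps,
                                  Set<String> succeeded,
                                  Set<String> submitted) {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            boolean notYetSubmitted = !submitted.contains(e.getKey());
            boolean dependenciesDone = succeeded.containsAll(e.getValue());
            if (notYetSubmitted && dependenciesDone) {
                ready.add(e.getKey());
            }
        }
        return ready;
    }
}
```

For instance, with a job "extract" that has no dependencies and a job "clean" that depends on "extract", only "extract" is returned at first; "clean" is returned once "extract" is in both the succeeded and submitted sets.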

I'll post my design sketch for comments/suggestions.
Eventually, I'll submit a patch for the utility.





-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12418148 ] 

Runping Qi commented on HADOOP-322:
-----------------------------------

OK,

I'll re-submit a new patch with a unit test class (may take a few days though).



[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

               Status: Patch Available  (was: Open)
        Fix Version/s: 0.6.0
    Affects Version/s: 0.5.0


The sleep in the unit test has been reduced from 60 seconds to 5 seconds.
Also, UTF8 has been replaced with Text.

The code is now in package org.apache.hadoop.mapred.jobcontrol.



        

[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment:     (was: job_control_patch.txt)



[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12418312 ] 

Doug Cutting commented on HADOOP-322:
-------------------------------------

When I remove the 'try' block and add 'throws Exception' where required, the unit test still passes, although all of the jobs fail:

% ant clean test -Dtestcase=TestJobControl

test:
    [mkdir] Created dir: /home/cutting/src/hadoop/build/test/data
    [mkdir] Created dir: /home/cutting/src/hadoop/build/test/logs
    [junit] Running org.apache.hadoop.jobs.TestJobControl
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 74.29 sec

BUILD SUCCESSFUL
Total time: 1 minute 22 seconds

Yet the log in build/test shows:

java.lang.OutOfMemoryError: Java heap space
Jobs are all done???
Jobs in waiting state: 0
Jobs in ready state: 0
Jobs in running state: 0
Jobs in success state: 0
Jobs in failed state: 4

Shouldn't the test fail if any of the jobs fail?




[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12418092 ] 

Doug Cutting commented on HADOOP-322:
-------------------------------------

This looks like a nice addition to Hadoop.  I'd like to see some unit tests added before it is committed.  Also, tabs are used for indentation, when spaces are preferred.



[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12418317 ] 

Runping Qi commented on HADOOP-322:
-----------------------------------


Doug, this is a tricky question. The test class is meant to test the behavior of the JobControl class, not the jobs it controls. In your case, I think JobControl behaved correctly if job 3 and job 4 were marked as failed without being submitted because one or more of the jobs they depend on failed.

I can add such logic to the test class to check the job state consistency.
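
A consistency check of this kind might look roughly like the following. It is a sketch only: the State enum, the wasSubmitted flag, and the isConsistent helper are assumed names for illustration and are not taken from the patch or from TestJobControl.

```java
import java.util.List;

// Hypothetical sketch of a job-state consistency check for a test.
class StateConsistencySketch {
    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED }

    // Returns true if a finished job's state is consistent with its dependencies:
    // - a SUCCESS job must have been submitted, with every dependency in SUCCESS;
    // - a FAILED job that was never submitted must have a FAILED dependency.
    // Other combinations are not judged by this sketch and are accepted.
    static boolean isConsistent(State state, boolean wasSubmitted,
                                List<State> dependencyStates) {
        if (state == State.SUCCESS) {
            for (State d : dependencyStates) {
                if (d != State.SUCCESS) {
                    return false;
                }
            }
            return wasSubmitted;
        }
        if (state == State.FAILED && !wasSubmitted) {
            return dependencyStates.contains(State.FAILED);
        }
        return true;
    }
}
```

A test could call such a check for every job after the control thread has stopped and fail on the first inconsistent job, which would test JobControl itself rather than the outcome of the jobs it runs.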




[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12428200 ] 
            
Doug Cutting commented on HADOOP-322:
-------------------------------------

This looks good.

One minor nit is that the sleep in the unit test should probably be smaller.  If I change the sleep to 1000ms, then the entire test only takes 37 seconds, more than 20 seconds shorter than the 60-second sleep you have.  I run unit tests a lot and am sensitive to them running slowly.
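
One way to keep such a test fast is to poll at a short interval with an overall timeout instead of sleeping for a fixed long period. The helper below is a minimal sketch; PollSketch and waitUntil are hypothetical names and are not part of the patch or of Hadoop.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: short sleeps with an overall timeout.
class PollSketch {
    // Polls 'done' every pollMillis until it returns true or timeoutMillis
    // has elapsed. Returns true if the condition was met, false on timeout
    // or if the waiting thread was interrupted.
    static boolean waitUntil(BooleanSupplier done, long pollMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!done.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
        return true;
    }
}
```

With a 1000ms poll interval, the test returns within about a second of the jobs finishing, rather than waiting out the remainder of a 60-second sleep.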

Also, this should be in a mapred subpackage, since it is mapred-specific stuff.  I'd suggest org.apache.hadoop.mapred.jobcontrol.

Thanks for your patience & persistence!


        

[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment:     (was: job_control_patch.txt)



[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12420720 ] 

Runping Qi commented on HADOOP-322:
-----------------------------------


Doug,

What is the status of this patch?





[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12418295 ] 

Doug Cutting commented on HADOOP-322:
-------------------------------------

The unit test class should extend junit.framework.TestCase and implement at least one public method named testXXX().



[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Doug Cutting updated HADOOP-322:
--------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

I just committed this.  Thanks Runping!

I also added a package.html.


        

[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment: job_control_patch.txt


My patch is attached.

It has two classes: hadoop.jobs.Job and hadoop.jobs.JobControl.

The Job class encapsulates a MapReduce job and its dependencies. It monitors
the states of the jobs it depends on and updates its own state accordingly.
A job starts in the WAITING state. If it does not depend on any jobs, or
all of the jobs it depends on are in the SUCCESS state, then the job state becomes
READY. If any job it depends on fails, the job fails too.
When in the READY state, the job can be submitted to Hadoop for execution, and
its state changes to RUNNING. From the RUNNING state, the job moves to the
SUCCESS or FAILED state, depending on the outcome of the job execution.
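
The state transitions just described can be summarized in a short sketch. The names below (JobStateSketch, checkWaiting, checkRunning) are hypothetical and are not the methods of hadoop.jobs.Job; only the five state names come from the description.

```java
import java.util.List;

// Hypothetical sketch of the job state transitions described above.
class JobStateSketch {
    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED }

    // Next state of a WAITING job, given the states of the jobs it depends on.
    static State checkWaiting(List<State> dependencyStates) {
        boolean allSucceeded = true;
        for (State s : dependencyStates) {
            if (s == State.FAILED) {
                return State.FAILED;   // a failed dependency fails this job
            }
            if (s != State.SUCCESS) {
                allSucceeded = false;  // at least one dependency is unfinished
            }
        }
        // No dependencies at all also counts as "all succeeded".
        return allSucceeded ? State.READY : State.WAITING;
    }

    // Next state of a RUNNING job once its execution has been checked.
    static State checkRunning(boolean complete, boolean successful) {
        if (!complete) {
            return State.RUNNING;
        }
        return successful ? State.SUCCESS : State.FAILED;
    }
}
```

The READY to RUNNING transition is left out because it happens on submission, which depends on the Hadoop client rather than on any state computed here.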

The JobControl class encapsulates a set of MapReduce jobs and their dependencies. It tracks
the states of the jobs by placing them into different tables according to their
states. This class provides APIs for the client app to add jobs to the group and to get
the jobs in different states. When a job is added, an ID unique to the group is assigned to the job.
This class has a thread that submits jobs when they become ready, monitors the
states of the running jobs, and updates the states of jobs based on the state changes
of the jobs they depend on. The class provides APIs for suspending/resuming
the thread, and for stopping the thread.
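
The bookkeeping described above (one table per state, plus group-unique IDs) could be organized along these lines. This is a sketch under assumptions: JobTablesSketch, addJob, move and jobsIn are made-up names, jobs are represented only by their ID strings, and the synchronization shown is one possible choice rather than what the patch does.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-state job tables with group-unique IDs.
class JobTablesSketch {
    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED }

    private final Map<State, List<String>> tables = new EnumMap<>(State.class);
    private final String groupName;
    private long nextId = 1;

    JobTablesSketch(String groupName) {
        this.groupName = groupName;
        for (State s : State.values()) {
            tables.put(s, new ArrayList<String>());
        }
    }

    // Adds a job in the WAITING state and returns an ID unique within this group.
    synchronized String addJob() {
        String id = groupName + nextId++;
        tables.get(State.WAITING).add(id);
        return id;
    }

    // Moves a job between state tables; returns false if it was not in 'from'.
    synchronized boolean move(String id, State from, State to) {
        if (!tables.get(from).remove(id)) {
            return false;
        }
        tables.get(to).add(id);
        return true;
    }

    // Returns a copy so callers can iterate while the control thread updates.
    synchronized List<String> jobsIn(State s) {
        return new ArrayList<>(tables.get(s));
    }

    // All jobs are finished when nothing is waiting, ready, or running.
    synchronized boolean allFinished() {
        return tables.get(State.WAITING).isEmpty()
            && tables.get(State.READY).isEmpty()
            && tables.get(State.RUNNING).isEmpty();
    }
}
```

Returning copies from jobsIn keeps a client that prints table sizes from racing with the thread that moves jobs between tables.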

A typical usage scenario is as follows:

    create a set of Map/Reduce job confs
    create a Job object per Map/Reduce job conf with the proper dependencies
    create a JobControl object
    add the Job objects to the JobControl object
    create a control thread and run it:

        // Run the JobControl object in its own thread.
        Thread theController = new Thread(theControl);
        theController.start();
        // Report the job counts per state until every job has finished.
        while (!theControl.allFinished()) {
            System.out.println("Jobs in waiting state: " + theControl.getWaitingJobs().size());
            System.out.println("Jobs in ready state: " + theControl.getReadyJobs().size());
            System.out.println("Jobs in running state: " + theControl.getRunningJobs().size());
            System.out.println("Jobs in success state: " + theControl.getSuccessfulJobs().size());
            System.out.println("Jobs in failed state: " + theControl.getFailedJobs().size());
            System.out.println("\n");

            try {
                Thread.sleep(60000);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                break;
            }
        }
        theControl.stop();





[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment: job_control_patch.txt


Here is a new patch.

I've replaced tabs with spaces, and added a unit test class TestJobControl.




[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment: jobcontrol_patch.txt


        

[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment:     (was: job_control_patch.txt)

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi

>


[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment:     (was: job_control_patch.txt)

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-322
>                 URL: http://issues.apache.org/jira/browse/HADOOP-322
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>


[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment: job_control_patch.txt

A job state consistency check has been added to the TestJobControl class.

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: job_control_patch.txt
>


[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment:     (was: job_control_patch.txt)

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi

>


[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment: job_control_patch.txt


Fixed a bug caused by incorrect commenting.


> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: job_control_patch.txt
>


[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment:     (was: job_control_patch.txt)

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: job_control_patch.txt
>


[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12427983 ] 
            
Runping Qi commented on HADOOP-322:
-----------------------------------


Is there still a problem with committing this patch?
I think the patch adds good value to the Hadoop APIs and is pretty safe, since it does not affect the server side at all.



> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-322
>                 URL: http://issues.apache.org/jira/browse/HADOOP-322
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>         Attachments: job_control_patch.txt
>
>


[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment: job_control_patch.txt

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: job_control_patch.txt
>


[jira] Updated: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-322?page=all ]

Runping Qi updated HADOOP-322:
------------------------------

    Attachment: job_control_patch.txt


TestJobControl now extends junit.framework.TestCase and implements a public void testJobControl() method.
Some comments have also been added to explain what is going on.



> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: job_control_patch.txt
>


[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12418309 ] 

Doug Cutting commented on HADOOP-322:
-------------------------------------

I think the 'try' block should be removed from your test.  Otherwise it will never fail, no?  Instead it should be permitted to throw exceptions if it fails, so that unit tests will fail.
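
The concern about the 'try' block can be shown with a small stand-alone sketch. It deliberately avoids JUnit so that it runs with the JDK alone: the passes method is only a minimal stand-in for a test runner, and every name below (TryBlockSketch, runJobs, testSwallowing, testPropagating) is hypothetical rather than taken from the patch.

```java
/** Hypothetical sketch; the real TestJobControl code is not shown in this thread. */
final class TryBlockSketch {
    /** Stand-in for the real test body; in this sketch it always fails. */
    static void runJobs() throws Exception {
        throw new Exception("job failed");
    }

    /** Catches the failure, so the method returns normally and looks like a pass. */
    static void testSwallowing() {
        try {
            runJobs();
        } catch (Exception e) {
            System.err.println("swallowed: " + e.getMessage());
        }
    }

    /** Lets the failure escape, so a test runner can report it. */
    static void testPropagating() throws Exception {
        runJobs();
    }

    /** A test body that may throw. */
    interface Test {
        void run() throws Exception;
    }

    /** Minimal stand-in for a test runner: a test passes iff it returns normally. */
    static boolean passes(Test test) {
        try {
            test.run();
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(passes(TryBlockSketch::testSwallowing));  // true, although runJobs failed
        System.out.println(passes(TryBlockSketch::testPropagating)); // false
    }
}
```

With JUnit the same effect is usually obtained by declaring the test method with "throws Exception" and leaving out the catch block.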

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: job_control_patch.txt
>


[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "paul sutter (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12417636 ] 

paul sutter commented on HADOOP-322:
------------------------------------


Would Ant solve the problem if a few extensions were developed (such as an extension to check for completion metadata in a .completed file)?
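
As a rough illustration of the kind of check such an extension might perform, the sketch below treats a job's output directory as complete once it contains a ".completed" marker file. It is not an Ant task and not part of any patch: the class and method names are hypothetical, the marker is assumed to be an empty file, and the local filesystem (java.nio.file) stands in for whatever filesystem the jobs actually write to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a ".completed" marker check on the local filesystem. */
final class CompletedMarkerSketch {
    static final String MARKER = ".completed";

    /** A job's output counts as complete once the marker file exists in its directory. */
    static boolean isCompleted(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve(MARKER));
    }

    /** A downstream job may start only when every upstream output is complete. */
    static boolean inputsReady(List<Path> upstreamOutputDirs) {
        for (Path dir : upstreamOutputDirs) {
            if (!isCompleted(dir)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path upstream = Files.createTempDirectory("upstream-job");
        List<Path> inputs = Collections.singletonList(upstream);
        System.out.println(inputsReady(inputs)); // false: no marker yet

        Files.createFile(upstream.resolve(MARKER));
        System.out.println(inputsReady(inputs)); // true: marker present
    }
}
```

A marker file only records that a job finished; it says nothing about jobs that are still running or that failed, which a monitoring utility would have to track separately.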



> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>          Key: HADOOP-322
>          URL: http://issues.apache.org/jira/browse/HADOOP-322
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Runping Qi
>     Assignee: Runping Qi

>


[jira] Commented: (HADOOP-322) Need a job control utility to submit and monitor a group of jobs which have DAG dependency

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-322?page=comments#action_12427993 ] 
            
Doug Cutting commented on HADOOP-322:
-------------------------------------

Sorry.  I lost track of this.  The new "Patch Available" status should help to keep this from happening.  I'll review this first thing tomorrow morning.  Thanks for the reminder.

> Need a job control utility to submit and monitor a group of jobs which have DAG dependency
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-322
>                 URL: http://issues.apache.org/jira/browse/HADOOP-322
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>         Attachments: job_control_patch.txt
>
>
