Posted to common-dev@hadoop.apache.org by "Adrian Woodhead (JIRA)" <ji...@apache.org> on 2007/10/22 12:44:50 UTC

[jira] Created: (HADOOP-2086) ability to add dependencies to a job after construction

ability to add dependencies to a job after construction
-------------------------------------------------------

                 Key: HADOOP-2086
                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.14.2
         Environment: n/a
            Reporter: Adrian Woodhead


The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.
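A minimal sketch of the API shape requested here: a constructor that takes no depending jobs, plus an addDependingJob method for wiring the graph afterwards. `SketchJob` and `LateDependencyDemo` are hypothetical stand-ins, not the Hadoop jobcontrol `Job` class. The state checks discussed elsewhere in this thread are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the jobcontrol Job class; only the API shape is illustrated.
class SketchJob {
    private final String name;
    private final List<SketchJob> dependingJobs = new ArrayList<>();

    // Existing style: dependencies fixed at construction time.
    SketchJob(String name, List<SketchJob> dependingJobs) {
        this.name = name;
        if (dependingJobs != null) {
            this.dependingJobs.addAll(dependingJobs);
        }
    }

    // Proposed: construct without any depending jobs...
    SketchJob(String name) {
        this(name, null);
    }

    // ...and add them later (no state check in this simplified sketch).
    boolean addDependingJob(SketchJob dependingJob) {
        return dependingJobs.add(dependingJob);
    }

    String getName() {
        return name;
    }

    List<SketchJob> getDependingJobs() {
        return Collections.unmodifiableList(dependingJobs);
    }
}

class LateDependencyDemo {
    public static void main(String[] args) {
        SketchJob extract = new SketchJob("extract");
        SketchJob transform = new SketchJob("transform");
        SketchJob load = new SketchJob("load");
        // Wire up the dependency graph only after every job exists.
        transform.addDependingJob(extract);
        load.addDependingJob(transform);
        // Prints "load depends on transform"
        System.out.println(load.getName() + " depends on "
                + load.getDependingJobs().get(0).getName());
    }
}
```

Creating all jobs first and linking them afterwards is what makes this useful when the graph is assembled from configuration rather than written out in dependency order.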

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538778 ] 

Hadoop QA commented on HADOOP-2086:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368665/job-add-dependencies3.patch
against trunk revision r589879.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1028/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1028/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1028/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1028/console

This message is automatically generated.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch, job-add-dependencies3.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538447 ] 

Adrian Woodhead commented on HADOOP-2086:
-----------------------------------------

Hmm, findbugs is complaining that getState() isn't synchronized while setState() is. Should we make getState() synchronized too?
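FindBugs flags this pattern because a field that is written under a lock but read without one has no happens-before relationship with the reader, so another thread may observe a stale state. Here is a hedged sketch of the fix being asked about, assuming a simplified state enum (`StateHolder` and its `State` values are illustrative, not Hadoop's). Both accessors synchronize on the same monitor. Declaring the field `volatile` would be an alternative for a single-field read like this.

```java
// Hypothetical minimal class: every access to 'state' goes through
// the same monitor ('this'), which is what the FindBugs warning asks for.
class StateHolder {
    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED }

    private State state = State.WAITING; // guarded by 'this'

    synchronized void setState(State newState) {
        this.state = newState;
    }

    // Synchronizing the getter too gives readers a happens-before edge
    // with the last setState() call, so they never see a stale value.
    synchronized State getState() {
        return state;
    }
}

class StateVisibilityDemo {
    public static void main(String[] args) throws InterruptedException {
        StateHolder holder = new StateHolder();
        Thread controller = new Thread(() -> holder.setState(StateHolder.State.READY));
        controller.start();
        controller.join();
        System.out.println(holder.getState()); // READY
    }
}
```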

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Affects Version/s:     (was: 0.14.2)
                       0.14.3
               Status: Patch Available  (was: Open)

New patch available in which getState() is now synchronized, as discussed with Runping Qi. In the review of an earlier patch it was also recommended that I change setState() and submit() from public to package-private, but we found this broke our own code: we extend Job to create a NonMapReduceJob class, which can be made dependent on other Jobs and submitted to JobControl. Making the methods protected still allows Job to be extended and is an improvement over leaving them public.
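To illustrate why protected (rather than package-private) visibility matters here, a sketch assuming a simplified base class. `ControlledJob` and `NonMapReduceJobSketch` are hypothetical and only mirror the idea described above: a Job subclass that runs non-MapReduce work while still taking part in JobControl's dependency handling. In Java, `protected` also implies package access, so a controller in the same package can still call these methods.

```java
// Hypothetical stand-in for the jobcontrol Job class (not Hadoop's code).
abstract class ControlledJob {
    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED }

    private State state = State.WAITING;

    // Protected rather than public: subclasses and classes in the same
    // package (such as a controller) may change the state; other code may not.
    protected synchronized void setState(State newState) {
        this.state = newState;
    }

    public synchronized State getState() {
        return state;
    }

    // How the job is actually launched is left to subclasses in this sketch.
    protected abstract void submit();
}

// Hypothetical subclass in the spirit of the NonMapReduceJob described above.
class NonMapReduceJobSketch extends ControlledJob {
    private final Runnable work;

    NonMapReduceJobSketch(Runnable work) {
        this.work = work;
    }

    // Runs plain Java code instead of a MapReduce job, but reports
    // progress through the same state machine a controller would poll.
    @Override
    protected void submit() {
        setState(State.RUNNING);
        try {
            work.run();
            setState(State.SUCCESS);
        } catch (RuntimeException e) {
            setState(State.FAILED);
        }
    }
}

class NonMapReduceDemo {
    public static void main(String[] args) {
        NonMapReduceJobSketch job =
                new NonMapReduceJobSketch(() -> System.out.println("copying files"));
        job.submit();
        System.out.println(job.getState()); // SUCCESS
    }
}
```

`submit()` is deliberately not synchronized as a whole, so that other threads can still call `getState()` while the work is running.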

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch, job-add-dependencies3.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2086:
----------------------------------

    Status: Open  (was: Patch Available)

I'm marking this issue 'Open' while Runping and Adrian get to a common ground...

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-2086:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Adrian!

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: n/a
>            Reporter: Adrian Woodhead
>            Assignee: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch, job-add-dependencies3.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2086:
----------------------------------

    Status: Open  (was: Patch Available)

Adrian, could you please generate a patch, do an 'Attach File' and also 'Grant license to ASF for inclusion in ASF works'? Thanks!

More details here: http://wiki.apache.org/lucene-hadoop/HowToContribute.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2086:
----------------------------------

    Fix Version/s: 0.16.0

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Attachment: job-add-dependencies2.patch

Made all changes discussed since the last patch.
Code review please.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537909 ] 

Adrian Woodhead commented on HADOOP-2086:
-----------------------------------------

Sorry, another question after looking at the code more: I see submit() also changes the state of the job; could this be another source of a race condition?
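The concern is a check-then-act race: a caller sees WAITING, then the controller submits the job before the dependency is actually added. A minimal sketch, assuming a simplified `GuardedJob` model (hypothetical; its `submit()` jumps straight to RUNNING and just records how many dependencies existed at that moment). Because `addDependingJob()` and `submit()` lock the same object, no dependency can slip in after submission.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: every method that reads or writes 'state' (or the
// dependency list) synchronizes on the same object.
class GuardedJob {
    enum State { WAITING, READY, RUNNING }

    private State state = State.WAITING;
    private final List<GuardedJob> dependingJobs = new ArrayList<>();
    private int dependenciesAtSubmit = -1;

    // The state check and the add happen atomically under one lock.
    synchronized boolean addDependingJob(GuardedJob job) {
        if (state != State.WAITING) {
            return false; // too late: the controller has moved on
        }
        return dependingJobs.add(job);
    }

    synchronized void setState(State newState) {
        state = newState;
    }

    // Simplified stand-in for submit(): snapshot the dependency count
    // at the moment the job starts running.
    synchronized void submit() {
        dependenciesAtSubmit = dependingJobs.size();
        state = State.RUNNING;
    }

    synchronized int dependencyCount() { return dependingJobs.size(); }

    synchronized int dependenciesAtSubmit() { return dependenciesAtSubmit; }
}

class SubmitRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        GuardedJob job = new GuardedJob();
        List<Thread> adders = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    job.addDependingJob(new GuardedJob());
                }
            });
            adders.add(t);
            t.start();
        }
        Thread controller = new Thread(job::submit);
        controller.start();
        for (Thread t : adders) {
            t.join();
        }
        controller.join();
        // Always true: adds and submit() share one lock.
        System.out.println(job.dependencyCount() == job.dependenciesAtSubmit());
    }
}
```

Without the `synchronized` keywords the final comparison could fail, and the unsynchronized `ArrayList` could itself be corrupted by concurrent `add` calls.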



> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538103 ] 

Hadoop QA commented on HADOOP-2086:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368498/job-add-dependencies2.patch
against trunk revision r588699.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warning.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1011/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1011/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1011/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1011/console

This message is automatically generated.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Status: Patch Available  (was: Open)

Here is a patch which fixes the issue I outlined in the original submission. If you require me to make any changes before you accept it please let me know and I will be happy to make them.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537666 ] 

Adrian Woodhead commented on HADOOP-2086:
-----------------------------------------

OK, I can think of two ways of doing this:

1)
  public boolean addDependingJob(Job dependingJob) {
    if (checkState() == Job.WAITING) {
      // code to add depending job to list goes here
      return true;
    } else {
      return false;
    }
  }

2)
  public boolean addDependingJob(Job dependingJob) {
    int currentState = checkState();
    if (currentState == Job.READY || currentState == Job.RUNNING) {
      return false;
    }
    // code to add depending job to list goes here
    return true;
  }

My preference would be for 1) because I assume it only makes sense to add a depending job to a job that is in the WAITING state, unless I am missing something and there are cases when one would want to add depending jobs in other states?

Are there any possible synchronization issues with the state? For example, could the state change after we check it but before we add the depending job?
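One way to compare the two options is to enumerate every state and see where they disagree. This sketch assumes the state set is WAITING, READY, RUNNING, SUCCESS, FAILED and DEPENDENT_FAILED (the last three go beyond the states named in this comment and are an assumption). `JobState` and `AddPolicies` are hypothetical names. Option 2 is a blacklist, so it quietly allows adding dependencies to jobs that have already finished.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical state set, assumed to mirror the jobcontrol Job states.
enum JobState { WAITING, READY, RUNNING, SUCCESS, FAILED, DEPENDENT_FAILED }

class AddPolicies {
    // Option 1: whitelist. Only a job still waiting may gain dependencies.
    static boolean option1Allows(JobState s) {
        return s == JobState.WAITING;
    }

    // Option 2: blacklist. Reject READY and RUNNING, allow everything else.
    static boolean option2Allows(JobState s) {
        return s != JobState.READY && s != JobState.RUNNING;
    }

    // States in which the two options give different answers.
    static Set<JobState> disagreements() {
        Set<JobState> diff = EnumSet.noneOf(JobState.class);
        for (JobState s : JobState.values()) {
            if (option1Allows(s) != option2Allows(s)) {
                diff.add(s);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(disagreements()); // [SUCCESS, FAILED, DEPENDENT_FAILED]
    }
}
```

The disagreement set supports option 1: a whitelist fails closed if more states are added later.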


> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Issue Comment Edited: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537706 ] 

runping edited comment on HADOOP-2086 at 10/25/07 1:54 PM:
--------------------------------------------------------------

I like 1).

Yes, there is a possible race condition, since JobControl may also be checking the state of the job at the same time.
One solution is to make both methods synchronized:

{code}
public synchronized boolean addDependingJob(Job dependingJob) {
    if (this.state == Job.WAITING) {
        //code to add depending job to list goes here
        return true;
    } else {
        return false;
    }
}

synchronized int checkState() {
 ...
}
{code}

Maybe it is also a good idea to make checkState() package private since it should not be called externally.
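A fuller, still hypothetical, sketch of the proposal above, assuming a simplified `DepJob` class whose `checkState()` promotes a WAITING job to READY once all dependencies have succeeded, or to DEPENDENT_FAILED if one has failed. Since both methods hold the job's lock, a dependency cannot be added between the controller inspecting the list and changing the state. The polling logic is an illustration, not the committed Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the synchronized pattern proposed above.
class DepJob {
    enum State { WAITING, READY, RUNNING, SUCCESS, FAILED, DEPENDENT_FAILED }

    private State state = State.WAITING;
    private final List<DepJob> dependingJobs = new ArrayList<>();

    // Only a WAITING job accepts new dependencies; otherwise a no-op returning false.
    public synchronized boolean addDependingJob(DepJob dependingJob) {
        if (state == State.WAITING) {
            return dependingJobs.add(dependingJob);
        }
        return false;
    }

    // Package-private: meant to be called only by the controller.
    synchronized State checkState() {
        if (state != State.WAITING) {
            return state;
        }
        for (DepJob dep : dependingJobs) {
            State s = dep.getState();
            if (s == State.FAILED || s == State.DEPENDENT_FAILED) {
                state = State.DEPENDENT_FAILED;
                return state;
            }
            if (s != State.SUCCESS) {
                return state; // still waiting on this dependency
            }
        }
        state = State.READY; // all dependencies succeeded (or there were none)
        return state;
    }

    public synchronized State getState() {
        return state;
    }

    synchronized void setState(State newState) {
        state = newState;
    }
}
```

`checkState()` calls `getState()` on each dependency while holding its own lock. That nested locking follows the dependency edges, so the sketch assumes the graph is acyclic; a cycle could deadlock concurrent callers.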





> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538017 ] 

Runping Qi commented on HADOOP-2086:
------------------------------------


Right. WAITING is the only state in which a depending job can be added.
submit() and setState() alter the state variable,
so they should be synchronized. They should also be package-private, since only JobControl uses them.



> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Status: Open  (was: Patch Available)

An updated version of the patch is available to replace this one.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537653 ] 

Runping Qi commented on HADOOP-2086:
------------------------------------

+0, the code looks good.

However, the semantics of the new API addDependingJob(Job dependingJob) should be clarified further
for the case where the current Job object is in the READY state or has already started execution. In that case,
this API should effectively be a no-op and return false.




> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Status: Patch Available  (was: Open)

Patch to allow adding depending jobs to a job via a method rather than only in the constructor.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538534 ] 

Runping Qi commented on HADOOP-2086:
------------------------------------

There is no real harm in making getState() synchronized,
although either way is fine in my opinion.


> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.



[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537903 ] 

Adrian Woodhead commented on HADOOP-2086:
-----------------------------------------

OK, I agree with all your comments. One final question and I'll submit another patch - is the READY state also acceptable for adding a depending job? Or is only WAITING valid?

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Status: Open  (was: Patch Available)

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Attachment:     (was: job-add-dependencies.patch)

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Attachment: job-add-dependencies.patch

Please review the patch.
No unit tests were added, as there is no existing JobTest unit test and the added functionality is trivial (adding an item to a list).

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539581 ] 

Runping Qi commented on HADOOP-2086:
------------------------------------


+1

Looks good.


> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch, job-add-dependencies3.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Assigned: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das reassigned HADOOP-2086:
-----------------------------------

    Assignee: Adrian Woodhead

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: n/a
>            Reporter: Adrian Woodhead
>            Assignee: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch, job-add-dependencies3.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Attachment: job-add-dependencies3.patch

This patch makes getState() synchronized as discussed, and also changes some methods that had been made package-private to protected.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch, job-add-dependencies3.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539668 ] 

Hudson commented on HADOOP-2086:
--------------------------------

Integrated in Hadoop-Nightly #290 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/290/])

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.3
>         Environment: n/a
>            Reporter: Adrian Woodhead
>            Assignee: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch, job-add-dependencies3.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Status: Patch Available  (was: Open)

Second attempt at the patch, incorporating feedback from Runping Qi.

> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538051 ] 

Runping Qi commented on HADOOP-2086:
------------------------------------


+1.


> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch, job-add-dependencies2.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Commented: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537706 ] 

Runping Qi commented on HADOOP-2086:
------------------------------------


I like 1).

Yes, there is a possible race condition, since JobControl may be checking the state of the job at the same time.
One solution is to make both methods synchronized:

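{code}
public synchronized boolean addDependingJob(Job dependingJob) {
    if (this.state == Job.WAITING) {
        // only allowed to add depending jobs while this job is waiting
        if (this.dependingJobs == null) {
            this.dependingJobs = new ArrayList<Job>();
        }
        return this.dependingJobs.add(dependingJob);
    } else {
        return false;
    }
}

synchronized int checkState() {
    // ... existing state-checking logic unchanged
}
{code}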

It may also be a good idea to make checkState() package-private, since it should not be called externally.
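
A minimal sketch of the check-then-act race described above, assuming a simplified job that keeps its state and dependency list under its own monitor. `SyncedJob` and `RaceCheck` are hypothetical classes, not Hadoop's `Job`; the int state constants only mimic the style of the proposal. Because `addDependingJob` and `checkState` synchronize on the same object, a dependency cannot be added between the "are all dependencies done?" test and the WAITING to READY transition, so every run should end in one of exactly two consistent outcomes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposal above, not Hadoop's Job class.
class SyncedJob {
    static final int WAITING = 1;
    static final int READY = 2;
    static final int SUCCESS = 3;

    private int state = WAITING;
    private final List<SyncedJob> dependingJobs = new ArrayList<SyncedJob>();

    // Synchronized so other threads read the latest state, not a stale value.
    public synchronized int getState() {
        return state;
    }

    public synchronized boolean addDependingJob(SyncedJob dependingJob) {
        if (state != WAITING) {
            return false; // too late: the controller may already treat this job as runnable
        }
        return dependingJobs.add(dependingJob);
    }

    // Package-private: only the controller should drive state transitions.
    // Sharing the monitor with addDependingJob makes the dependency check and
    // the WAITING -> READY transition one atomic step.
    synchronized int checkState() {
        if (state == WAITING) {
            for (SyncedJob dep : dependingJobs) {
                if (dep.getState() != SUCCESS) {
                    return state; // still waiting on at least one dependency
                }
            }
            state = READY;
        }
        return state;
    }
}

class RaceCheck {
    // Races one addDependingJob call against one checkState call and reports
    // whether the final outcome is consistent.
    static boolean raceOnce() {
        final SyncedJob job = new SyncedJob();
        final SyncedJob pending = new SyncedJob(); // never reaches SUCCESS
        final boolean[] added = new boolean[1];
        Thread adder = new Thread(() -> added[0] = job.addDependingJob(pending));
        Thread checker = new Thread(() -> job.checkState());
        adder.start();
        checker.start();
        try {
            adder.join(); // join() makes added[0] visible to this thread
            checker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        int finalState = job.checkState();
        // Either the dependency got in first (job must stay WAITING), or
        // checkState won (job is READY and the late add was rejected).
        return added[0] ? finalState == SyncedJob.WAITING
                        : finalState == SyncedJob.READY;
    }
}
```

Without the `synchronized` keywords, the same harness could in principle observe a READY job that has just accepted an unfinished dependency; such races are timing-dependent and may not appear in any particular run.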





> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>             Fix For: 0.16.0
>
>         Attachments: job-add-dependencies.patch
>
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.


[jira] Updated: (HADOOP-2086) ability to add dependencies to a job after construction

Posted by "Adrian Woodhead (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Woodhead updated HADOOP-2086:
------------------------------------

    Status: Patch Available  (was: Open)

Index: /home/adrian/workspace/hadoop-0.14.2/src/java/org/apache/hadoop/mapred/jobcontrol/Job.java
===================================================================
--- /home/adrian/workspace/hadoop-0.14.2/src/java/org/apache/hadoop/mapred/jobcontrol/Job.java	(revision 584810)
+++ /home/adrian/workspace/hadoop-0.14.2/src/java/org/apache/hadoop/mapred/jobcontrol/Job.java	(working copy)
@@ -77,6 +77,15 @@
     this.message = "just initialized";
     this.jc = new JobClient(jobConf);
   }
+  
+  /**
+   * Construct a job that doesn't depend on any other jobs.
+   * @param jobConf a mapred job configuration representing a job to be executed.
+   * @throws IOException
+   */
+  public Job(JobConf jobConf) throws IOException {
+  	this(jobConf, null);
+  }
 	
   public String toString() {
     StringBuffer sb = new StringBuffer();
@@ -86,7 +95,7 @@
     sb.append("job mapred id:\t").append(this.mapredJobID).append("\n");
     sb.append("job message:\t").append(this.message).append("\n");
 		
-    if (this.dependingJobs == null) {
+    if (this.dependingJobs == null || this.dependingJobs.size() == 0) {
       sb.append("job has no depending job:\t").append("\n");
     } else {
       sb.append("job has ").append(this.dependingJobs.size()).append(" dependeng jobs:\n");
@@ -255,6 +264,18 @@
       }
     }
   }
+  
+  /**
+   * Add a job to this job's dependency list.
+   * @param dependingJob Job that this job depends on.
+   * @return <tt>true</tt> if this collection changed as a result of the call.
+   */
+  public boolean addDependingJob(Job dependingJob) {
+  	if (this.dependingJobs == null) {
+  		this.dependingJobs = new ArrayList();
+  	}
+  	return this.dependingJobs.add(dependingJob);
+  }
 	
   /**
    * Check and update the state of this job. The state changes  
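
The patch's new constructor delegates with a `null` dependency list, which is why `addDependingJob` creates the list lazily and why `toString()` now treats null and empty alike. A minimal runnable sketch of that pattern, assuming a hypothetical `PatchedJob` class in which the `JobConf`/`JobClient` fields are replaced by a plain name so it runs without Hadoop:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patched Job class; a name replaces JobConf.
class PatchedJob {
    private final String name;
    private List<PatchedJob> dependingJobs; // may stay null, as with the original constructor

    PatchedJob(String name, List<PatchedJob> dependingJobs) {
        this.name = name;
        this.dependingJobs = dependingJobs;
    }

    // Mirrors the new constructor: delegate with a null dependency list.
    PatchedJob(String name) {
        this(name, null);
    }

    // Creates the list on first use, since the delegating constructor left it null.
    boolean addDependingJob(PatchedJob dependingJob) {
        if (dependingJobs == null) {
            dependingJobs = new ArrayList<PatchedJob>();
        }
        return dependingJobs.add(dependingJob);
    }

    @Override
    public String toString() {
        // Null and empty both mean "no dependencies", as in the second hunk.
        if (dependingJobs == null || dependingJobs.isEmpty()) {
            return name + ": no depending jobs";
        }
        return name + ": " + dependingJobs.size() + " depending job(s)";
    }
}
```

For example, `new PatchedJob("load")` can be created first and given `addDependingJob(extract)` later, once the `extract` job exists.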


> ability to add dependencies to a job after construction
> -------------------------------------------------------
>
>                 Key: HADOOP-2086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2086
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.2
>         Environment: n/a
>            Reporter: Adrian Woodhead
>
> The current Job API only allows for dependent jobs to be passed in at object construction time. It would be nice if there was an additional constructor which did not take any depending jobs and then an "addDependingJob" method which could be used to add depending jobs to a job at a later point.
