Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2013/08/02 01:39:49 UTC

[jira] [Updated] (OOZIE-1483) Support for Job Recoverability

     [ https://issues.apache.org/jira/browse/OOZIE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-1483:
---------------------------------

    Description: 
To support the JobTracker recovering jobs on restart, we need to configure the launcher job to be restarted by the JT, but not any of the launched jobs ({{mapred.job.restart.recover}}).  This way, the launcher job will simply start over when the JT recovers it; if we allow the JT to recover the actual jobs too, then they will interfere with the restarted launcher.  
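
As a rough illustration of the split described above, here is a minimal sketch in Java. It uses a plain Map as a stand-in for a Hadoop job configuration; the class and helper names are hypothetical and not part of Oozie, and it assumes the property takes "true"/"false" values. Only the property name comes from this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class RecoverySettingsSketch {
    // Property name taken from the issue description.
    static final String RESTART_RECOVER = "mapred.job.restart.recover";

    // Hypothetical helper: the launcher job should be restarted by the JT,
    // so recovery is switched on (assumes a boolean-valued property).
    static Map<String, String> launcherConf(Map<String, String> base) {
        Map<String, String> conf = new HashMap<>(base); // copy, leave base untouched
        conf.put(RESTART_RECOVER, "true");
        return conf;
    }

    // Hypothetical helper: jobs started by the launcher must not be recovered,
    // since the restarted launcher starts over anyway.
    static Map<String, String> launchedJobConf(Map<String, String> base) {
        Map<String, String> conf = new HashMap<>(base);
        conf.put(RESTART_RECOVER, "false");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("mapred.job.name", "example"); // placeholder setting for the demo
        System.out.println("launcher: " + launcherConf(base).get(RESTART_RECOVER));
        System.out.println("launched: " + launchedJobConf(base).get(RESTART_RECOVER));
    }
}
```

In real code the same two values would presumably be set on the launcher's and the launched jobs' actual Hadoop configuration objects rather than on a Map.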

This should be fairly trivial except for the MapReduce action, because of the optimization where the launcher finishes instead of waiting for the actual job and Oozie does an "id swap".  Trying to add support for the JT to recover the MR action doesn't seem feasible, as we'd run into a lot of trickiness and some race conditions due to the id swap.  

Instead, I think we should remove the MR optimization because it will allow us to support recoverability for the MR action as well.  This also has the benefit of simplifying the code, because we'd be getting rid of all of the id swap stuff, and of making the MR action consistent with the other actions.  The only downside is that the MR action will take an extra Map slot, just like the other actions.  

  was:
To support for the JobTracker to recover jobs on restart, we need to configure the launcher job to be restarted by the JT, but not any of the launched jobs ({{mapred.job.restart.recover}}.  This way, the launcher job will simply start over when the JT recovers it; if we allow the JT to recover the actual jobs, then they will interfere.  

This should be fairly trivial except for the MapReduce action because of the optimization where the launcher finishes instead of waiting for the actual job and Oozie does an "id swap".  Trying to add support for JT to recover the MR action doesn't seem feasible as we'd run into a lot of trickiness and some race conditions due to the id swap.  

Instead, I think we should remove the MR optimization because it will allow us to to support the recoverability for the MR action as well.  This also has the benefit of simplifying the code because we'd be getting rid of all of the id swap stuff and also making the MR action consistent with the other actions.  The only downside is that the MR action will take an extra Map slot just like the other actions.  

    
> Support for Job Recoverability
> ------------------------------
>
>                 Key: OOZIE-1483
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1483
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>
> To support the JobTracker recovering jobs on restart, we need to configure the launcher job to be restarted by the JT, but not any of the launched jobs ({{mapred.job.restart.recover}}).  This way, the launcher job will simply start over when the JT recovers it; if we allow the JT to recover the actual jobs too, then they will interfere with the restarted launcher.  
> This should be fairly trivial except for the MapReduce action, because of the optimization where the launcher finishes instead of waiting for the actual job and Oozie does an "id swap".  Trying to add support for the JT to recover the MR action doesn't seem feasible, as we'd run into a lot of trickiness and some race conditions due to the id swap.  
> Instead, I think we should remove the MR optimization because it will allow us to support recoverability for the MR action as well.  This also has the benefit of simplifying the code, because we'd be getting rid of all of the id swap stuff, and of making the MR action consistent with the other actions.  The only downside is that the MR action will take an extra Map slot, just like the other actions.  
