Posted to common-dev@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2007/03/15 08:14:09 UTC

[jira] Created: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Recovering running/scheduled jobs after JobTracker failure
----------------------------------------------------------

                 Key: HADOOP-1121
                 URL: https://issues.apache.org/jira/browse/HADOOP-1121
             Project: Hadoop
          Issue Type: New Feature
          Components: mapred
         Environment: all
            Reporter: Alejandro Abdelnur


Currently all running/scheduled jobs are kept in memory in the JobTracker. If the JobTracker goes down, all the running/scheduled jobs have to be resubmitted.

Proposal:

(1) On job submission the JobTracker would save the job configuration (job.xml) in a jobs DFS directory using the jobID as name.
(2) On job completion (success, failure, kill) it would delete the job configuration from the jobs DFS directory.
(3) On JobTracker failure the jobs DFS directory will contain all jobs running/scheduled at failure time.
(4) On startup the JobTracker would check the jobs DFS directory for job config files. If there are none, no failure happened on the last stop and there is nothing to do. If there are job config files in the jobs DFS directory, continue with the following recovery steps.

(A) rename all job config files to $JOB_CONFIG_FILE.recover.
(B) for each $JOB_CONFIG_FILE.recover: delete the output directory if it exists, schedule the job using the original job ID, then delete the $JOB_CONFIG_FILE.recover (a new $JOB_CONFIG_FILE will be created on scheduling, per step #1).
(C) when B is completed, start accepting new job submissions (see the sketch below).
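
For illustration, here is a minimal sketch of that recovery loop, assuming the era's FileSystem and JobConf APIs; loadJobConf() and scheduleJob() are hypothetical placeholders, not existing JobTracker methods:

{code}
// Sketch of recovery steps (A)-(C); loadJobConf() and scheduleJob() are
// hypothetical, and listing/delete signatures vary across Hadoop versions.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

void recoverJobs(JobConf conf, Path jobsDir) throws IOException {
  FileSystem fs = FileSystem.get(conf);
  for (Path p : fs.listPaths(jobsDir)) {     // (A) mark configs for recovery
    fs.rename(p, p.suffix(".recover"));
  }
  for (Path p : fs.listPaths(jobsDir)) {     // (B) reschedule each job
    JobConf job = loadJobConf(p);            // hypothetical helper
    Path out = job.getOutputPath();
    if (out != null && fs.exists(out)) {
      fs.delete(out);                        // clear partial output
    }
    scheduleJob(job);                        // reuse the original job ID
    fs.delete(p);                            // drop the *.recover copy
  }
  // (C) only now start accepting new job submissions
}
{code}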

Other details:

A configuration flag would enable/disable the above behavior; if switched off (the default), none of the above happens.
A startup flag could switch off job recovery for systems with recovery set to ON.
Changes to job ID generation should be put in place to avoid collisions with job IDs from previous failed runs; for example, appending a JT startup timestamp to the job IDs would do.

Further improvements on top of this one:

This mechanism would allow having a standby JobTracker node that is started on main JobTracker failure. Going a step further, backup JobTrackers could run in warm mode, with heartbeats and ping calls among them activating a warm standby JobTracker as the new main JobTracker. Together with an enhancement to the JobClient (keeping a list of backup JobTracker URLs), this would enable client fallback to backup JobTrackers.

State about partially run jobs could be kept (tasks completed/in-progress/pending). This would make it possible to recover jobs halfway through instead of restarting them.




[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495128 ] 

Owen O'Malley commented on HADOOP-1121:
---------------------------------------

I'm not happy with the design of the patch.

1. Recoverability should be a property of the job, period. The config variable to disable it at startup in the job tracker would be very error prone.

2. No output directories should be deleted by the framework. Doing otherwise is inconsistent with the rest of the framework.

3. As Doug pointed out, the framework should use the InputFormat and OutputFormat to check the preconditions of the job. This implies that they will load the user's job.jar and should probably be done in a separate process.

4. Using the original job id is _not_ a good idea, unless you block collisions in the job names. It would help a lot if we had objects for ids instead of strings. As a short-term solution, I'd suggest setting the next job id to be one higher than the last recovered job's id.
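
For example, a sketch of that short-term suggestion, assuming job IDs end in a numeric counter (the parsing is illustrative only):

{code}
// Sketch: derive the next job id from recovered ones; recoveredJobIds
// is a hypothetical collection, and the ID format is assumed.
int maxRecovered = 0;
for (String id : recoveredJobIds) {
  int n = Integer.parseInt(id.substring(id.lastIndexOf('_') + 1));
  maxRecovered = Math.max(maxRecovered, n);
}
int nextJobId = maxRecovered + 1;   // new IDs start above all recovered ones
{code}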




[jira] Updated: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1121:
---------------------------------------

    Fix Version/s: 0.13.0
           Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508285 ] 

Doug Cutting commented on HADOOP-1121:
--------------------------------------

I'm okay with making OutputFormat's API better support re-running crashed jobs.  That's a prerequisite for this issue.  We might add OutputFormat methods like:

{code}
/** Called to initialize output for this job. */
void initialize(JobConf job) throws IOException;
/** Called to finalize output for this job. */
void commit(JobConf job) throws IOException;
{code}

In the base implementation for FileSystem output, initialize() might then create a temporary directory for the job, removing any that already exists, and commit() could rename the temporary output directory to the final name.  The existing checkOutputSpecs() would continue to throw an exception if the final output already exists.  Could something like this work?
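
As a rough sketch (not an existing API), those base implementations for filesystem output might look like this; the ".tmp" naming convention is an assumption:

{code}
// Sketch of possible base implementations for filesystem output;
// the temp-path scheme and these bodies are assumptions.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

/** Temp directory the job writes into before commit (illustrative name). */
private Path tempDir(JobConf job) {
  return job.getOutputPath().suffix(".tmp");
}

/** Called to initialize output for this job. */
public void initialize(JobConf job) throws IOException {
  FileSystem fs = FileSystem.get(job);
  Path tmp = tempDir(job);
  if (fs.exists(tmp)) {
    fs.delete(tmp);        // leftover from a crashed run; safe to remove
  }
  fs.mkdirs(tmp);
}

/** Called to finalize output for this job. */
public void commit(JobConf job) throws IOException {
  FileSystem fs = FileSystem.get(job);
  fs.rename(tempDir(job), job.getOutputPath());  // promote to final name
}
{code}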



[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495094 ] 

Owen O'Malley commented on HADOOP-1121:
---------------------------------------

This looks like a feature, not a bug fix. I haven't gone through it, but I don't see a justification for it to go into 0.13.



[jira] Resolved: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur resolved HADOOP-1121.
----------------------------------------

    Resolution: Won't Fix

Hadoop-1876 enables a job controller (using the RunningJob API) to know whether a job has completed, regardless of the JT being restarted. This enables the job controller to resubmit jobs that have definitely not completed.

With Hadoop-1876 it is possible to achieve the functionality of Hadoop-1121 from the job controller client.
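
For illustration, a minimal sketch of such a client-side check via the RunningJob API; tracking which job IDs were submitted is assumed to be the controller's own bookkeeping:

{code}
// Sketch of a job-controller check; whether getJob() returns null for an
// unknown job is an assumption about this era's behavior.
import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.RunningJob;

boolean needsResubmit(JobClient client, String jobId) throws IOException {
  RunningJob job = client.getJob(jobId);  // null if the JT no longer knows it
  if (job == null) {
    return true;                          // lost across the restart: resubmit
  }
  return job.isComplete() && !job.isSuccessful();  // failed: resubmit
}
{code}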



[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494950 ] 

Hadoop QA commented on HADOOP-1121:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12357074/patch1121.txt applied and successfully tested against trunk revision r536583.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/130/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/130/console



[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Ruchir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481159 ] 

Ruchir commented on HADOOP-1121:
--------------------------------

Hadoop (the JobTracker) creates a directory in DFS when a job is submitted. This directory contains the configuration file (job.xml) and the job jar file. On job completion (success or failure), it deletes that temporary directory from DFS. So in order to restart jobs that had not completed when the JobTracker failed, we just need to find all directories in mapred.system.dir and submit those jobs again.
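
A minimal sketch of that scan, assuming the era's FileSystem listing call; resubmitJob() is a hypothetical helper:

{code}
// Sketch of scanning mapred.system.dir for interrupted jobs; the listing
// call's name varies across Hadoop versions and resubmitJob() is assumed.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

void resubmitPending(JobConf conf) throws IOException {
  FileSystem fs = FileSystem.get(conf);
  Path systemDir = new Path(conf.get("mapred.system.dir"));
  for (Path jobDir : fs.listPaths(systemDir)) {  // one directory per live job
    JobConf job = new JobConf(new Path(jobDir, "job.xml"));
    resubmitJob(job);                            // hypothetical helper
  }
}
{code}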



[jira] Updated: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-1121:
---------------------------------------

    Attachment: patch1121.txt

Attached is a patch (mostly done by Ruchir).

Patch details:

It leverages the current staging of jobs by the JT.

* What is different at job scheduling time:

If the 'job.autorecovery.enabled' property is set to TRUE (default is FALSE) in the job conf, the job will be set for autorecovery. 

Jobs set for autorecovery have an all-time unique job ID. This is done by including the timestamp (to the second) of the JT startup time in the ID.

Jobs set for autorecovery store the original job ID in a jobid.txt file next to the job.xml file.

* What is different at JT startup time:

If the 'jobtracker.startupautorecovery.enabled' property is not set or is set to FALSE (the default), at JT startup the /tmp/USER/mapred/system directory is cleaned up as usual.

If the property is set to TRUE, at JT startup all job directories containing a 'jobid.txt' file are preserved and the rest are deleted. For each remaining job directory, the existence of the input directory is verified, the output directory is cleaned up if it exists, and the corresponding job.xml is rescheduled with the original job ID (stored in the jobid.txt file).

* What is different at JT job clean up time:

Nothing. The JT deletes the job directory from the system directory once the job finishes; this cleans up autorecovery jobs as well.
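
For reference, a sketch of the startup scan just described; readJobId() and reschedule() are hypothetical helpers summarizing the behavior above, not names from the patch:

{code}
// Sketch of the patch's startup scan; readJobId() and reschedule() are
// hypothetical helpers, and the listing call varies across versions.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

void startupScan(FileSystem fs, Path systemDir) throws IOException {
  for (Path jobDir : fs.listPaths(systemDir)) {
    Path idFile = new Path(jobDir, "jobid.txt");
    if (!fs.exists(idFile)) {
      fs.delete(jobDir);                       // not marked for autorecovery
      continue;
    }
    String originalId = readJobId(fs, idFile); // hypothetical helper
    // verify input exists, clean up output if present, then resubmit
    reschedule(new Path(jobDir, "job.xml"), originalId);
  }
}
{code}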






[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508160 ] 

Alejandro Abdelnur commented on HADOOP-1121:
--------------------------------------------

Reviving this issue (hoping it will make it for 0.14).

I'd like to hear opinions on my last comments.

On #3 (OutputFormat, existing dirs and deleting), I had to do some homework on my side.

Building on Doug's idea in his first comment, the temporary output directory approach would do. Something like:

* OutputFormat changes:
  * Change the getRecordWriter() contract - it should write into a non-existing temp directory and keep track of it.
  * Add a method done() - it should move/rename the temp directory to the output directory.

* JobTracker changes:
  * On startup it should clean up the temp directories area.
  * On job completion it should call done() on the OutputFormat instance.

If this seems alright I can create a separate issue for this.
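
A minimal sketch of the proposed done() step (the temp-path naming and the method's placement on OutputFormat are assumptions):

{code}
// Sketch of the proposed done(); the ".tmp" path scheme is illustrative.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

/** Move the job's temp output to its final location. */
public void done(JobConf job) throws IOException {
  FileSystem fs = FileSystem.get(job);
  Path tmp = job.getOutputPath().suffix(".tmp");  // illustrative temp path
  fs.rename(tmp, job.getOutputPath());            // promote to final name
}
{code}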




[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482189 ] 

Doug Cutting commented on HADOOP-1121:
--------------------------------------

> delete the output directory if it exists [ ... ]

This is OutputFormat-specific.  The kernel should not assume that there's an output directory.

Also, it might be difficult for OutputFormatBase to tell whether an existing output directory is from a failed run that's being restarted or was pre-existing and should stop the job from starting.  A fix might be to have OutputFormatBase always write things to a temporary directory, then rename this on completion.  The temporary directory could always be safely deleted on restart.  Perhaps we should add a new OutputFormat method to perform such cleanups?



[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495450 ] 

Alejandro Abdelnur commented on HADOOP-1121:
--------------------------------------------

* On Owen's #1 comment, startup flag

Yes, recoverability is a job property at scheduling time, but the default is FALSE.

There is another JT property to disable recovering all jobs at startup; its default is currently FALSE but should be TRUE. It should be there just for admin purposes, for when the admin wants to do a fresh start regardless (like forcing a rollback or commit of logged changes in a database).

* On Owen's #2 comment, output directory deletion

I may be missing something, but at task (a single map or reduce) failure, when the JT restarts the failed task, it cleans up whatever output the failed task wrote. Right?

* On Doug's and Owen's #3 comment, Inputformat and OutputFormat

I'll have to look into this; I'm not sure what you mean.

* On Owen's #4 comment, on original job id

Jobs scheduled with autorecovery on have a job ID of the form 'job_TIMESTAMP_####', for example 'job_20070511075754_0002', where TIMESTAMP is the JT startup time, to the second.

This uniqueness serves two purposes:

1. There are no job ID collisions.
2. Systems tracking jobs can find the status of a job ID recovered after a failure.
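
For illustration, a sketch of generating such IDs (the formatting code is illustrative, not the patch's actual code):

{code}
// Sketch of the ID scheme described above.
import java.text.SimpleDateFormat;
import java.util.Date;

// Fixed once, at JT startup time, e.g. "20070511075754".
String jtStartStamp = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
int jobCounter = 0;

String nextJobId() {
  return "job_" + jtStartStamp + "_" + String.format("%04d", ++jobCounter);
}
{code}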





[jira] Updated: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1121:
---------------------------------

    Fix Version/s:     (was: 0.13.0)
                   0.14.0
           Status: Open  (was: Patch Available)

Again, the MapReduce kernel should not directly access input or output files.  These should only be accessed through InputFormat and OutputFormat methods.  So the code you've added to the JobInProgress constructor that examines "mapred.input.path" and "mapred.output.path" should instead call InputFormat.validateInput() and OutputFormat.checkOutputSpecs().
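
For illustration, that precondition check might look like this sketch, using the existing accessors rather than the raw path properties:

{code}
// Sketch of the suggested precondition check; checkOutputSpecs takes an
// ignored FileSystem argument in this era's API.
import java.io.IOException;
import org.apache.hadoop.mapred.JobConf;

void checkJobPreconditions(JobConf job) throws IOException {
  job.getInputFormat().validateInput(job);
  job.getOutputFormat().checkOutputSpecs(null, job);  // first arg is ignored
}
{code}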

I've not yet examined the rest of the patch closely.



[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494939 ] 

Alejandro Abdelnur commented on HADOOP-1121:
--------------------------------------------

* On Doug's comment on deleting output directory if exists:

At job submission time, a check for the non-existence of the output directory is performed. If it exists, the job submission fails.

So, if a previously scheduled but not completed job has an output directory, it means the job never completed its execution.

Thus deleting the output directory brings the job back to a clean submission state (before running).

* On Owen's comments:

On #1, there is no way (at the moment), based on the information persisted by the JT, to know whether a job had started execution or was still pending at the time the JT died (or was killed). That's why the proposal is to automatically resubmit jobs only if the autorecovery flag is set to TRUE.

On #2, I'd say: if the input exists the job is relaunched, else it is discarded. About the output, refer to my first comment above (replying to Doug).

On the "In particular, the framework should not be deleting ..." comment, the autorecovery is not a default behavior, but a explicitly configured one, thus when submitting a job with autorecovery on TRUE you are agreeing on output delete if necessary and re-launching if necessary.
 



[jira] Commented: (HADOOP-1121) Recovering running/scheduled jobs after JobTracker failure

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482302 ] 

Owen O'Malley commented on HADOOP-1121:
---------------------------------------

I would think that a better approach would be:
  1. any unstarted jobs are resubmitted automatically
  2. any started, but unfinished jobs should have their input and output formats checked for validity:
     a. if they are valid, the job is relaunched
     b. if either complains, the job is put in a "holding" queue to wait for human direction.

In particular, the framework should not be deleting output directories automatically or re-launching jobs that have no chance of succeeding.
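
For illustration, a sketch of that triage, where the collections, types, and helper methods are hypothetical rather than JobTracker APIs:

{code}
// Sketch of the triage described above; RecoveredJob, hasStarted(),
// resubmit(), validateFormats(), relaunch(), and holdingQueue are assumed.
for (RecoveredJob job : recoveredJobs) {
  if (!job.hasStarted()) {
    resubmit(job);              // 1. unstarted: resubmit automatically
  } else {
    try {
      validateFormats(job);     // 2. check input/output format preconditions
      relaunch(job);            //    a. valid: relaunch
    } catch (IOException e) {
      holdingQueue.add(job);    //    b. either complains: wait for a human
    }
  }
}
{code}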
