Posted to mapreduce-issues@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2012/12/05 17:16:58 UTC

[jira] [Created] (MAPREDUCE-4850) Job recovery may fail if staging directory has been deleted

Tom White created MAPREDUCE-4850:
------------------------------------

             Summary: Job recovery may fail if staging directory has been deleted
                 Key: MAPREDUCE-4850
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4850
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv1
    Affects Versions: 1.1.1
            Reporter: Tom White
            Assignee: Tom White


The job staging directory is deleted in the job cleanup task, which runs before the job-info file is deleted from the system directory (by the JobInProgress garbageCollect() method). If the JobTracker (JT) shuts down between these two operations, then when it restarts and tries to recover the job, recovery fails because the job.xml and splits are no longer available.
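
The window described above can be illustrated with a small toy model. This is only a sketch, not Hadoop code: the class, its methods, and the file name job.split are hypothetical, and it assumes that the JT attempts recovery for a job exactly when that job's job-info file is still present in the system directory.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of the two cleanup steps. Hypothetical names; not Hadoop code.
final class CleanupOrderModel {
    // Files that recovery needs, modelled as plain string markers.
    final Set<String> stagingDir = new HashSet<>(Arrays.asList("job.xml", "job.split"));
    final Set<String> systemDir = new HashSet<>(Arrays.asList("job-info"));

    // Step done by the job cleanup task.
    void deleteStagingDir() {
        stagingDir.clear();
    }

    // Step attributed to garbageCollect() in the description.
    void deleteJobInfo() {
        systemDir.remove("job-info");
    }

    // Assumption: recovery is attempted only while job-info still exists.
    boolean recoveryAttempted() {
        return systemDir.contains("job-info");
    }

    // Recovery can succeed only if job.xml and the splits are still there.
    boolean recoveryCanSucceed() {
        return stagingDir.contains("job.xml") && stagingDir.contains("job.split");
    }

    // Original order, with a simulated JT shutdown between the two steps.
    static CleanupOrderModel crashBetweenStepsOriginalOrder() {
        CleanupOrderModel m = new CleanupOrderModel();
        m.deleteStagingDir();
        // Simulated shutdown here: deleteJobInfo() never runs.
        return m;
    }
}
```

In this model, the state returned by crashBetweenStepsOriginalOrder() has recoveryAttempted() true and recoveryCanSucceed() false, which corresponds to the failure reported here.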

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4850) Job recovery may fail if staging directory has been deleted

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4850:
---------------------------------

    Attachment: MAPREDUCE-4850.patch

A patch that deletes the staging directory after the system directory.

In manual testing with this patch applied, I could not reproduce the recovery failure in the scenario from the description. It would be nice to add a unit test, but I'm still trying to figure out how to write one for this.
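
As a rough illustration of why the reordering closes the window, the toy model below runs the two deletions in the patched order and stops after 0, 1, or 2 steps to simulate a JT shutdown at each point. It is a sketch under assumptions and not the contents of the attached patch or a real Hadoop unit test: all names are hypothetical, job.split stands in for the split files, and recovery is assumed to be attempted only while the job-info file exists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of the patched cleanup order. Hypothetical names; not Hadoop code.
final class ReorderedCleanupModel {
    final Set<String> stagingDir = new HashSet<>(Arrays.asList("job.xml", "job.split"));
    final Set<String> systemDir = new HashSet<>(Arrays.asList("job-info"));

    // Runs cleanup in the patched order and stops after stepsCompleted
    // steps (0, 1, or 2) to simulate a JT shutdown at that point.
    static ReorderedCleanupModel runPatchedOrder(int stepsCompleted) {
        ReorderedCleanupModel m = new ReorderedCleanupModel();
        if (stepsCompleted >= 1) {
            m.systemDir.remove("job-info"); // system directory first
        }
        if (stepsCompleted >= 2) {
            m.stagingDir.clear();           // staging directory second
        }
        return m;
    }

    // Assumption: recovery is attempted only while job-info still exists.
    boolean recoveryAttempted() {
        return systemDir.contains("job-info");
    }

    // Recovery can succeed only if job.xml and the splits are still there.
    boolean recoveryCanSucceed() {
        return stagingDir.contains("job.xml") && stagingDir.contains("job.split");
    }

    // Invariant: the JT never attempts a recovery that cannot succeed.
    boolean consistent() {
        return !recoveryAttempted() || recoveryCanSucceed();
    }
}
```

In this model consistent() holds at every stopping point. A shutdown after the first step leaves the staging files behind with no job-info file pointing at them, so the cost of the new order is a possible leftover staging directory rather than a failed recovery.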

                
