Posted to mapreduce-issues@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2012/12/05 17:30:59 UTC

[jira] [Updated] (MAPREDUCE-4850) Job recovery may fail if staging directory has been deleted

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4850:
---------------------------------

    Attachment: MAPREDUCE-4850.patch

A patch that deletes the staging directory after the job-info file has been deleted from the system directory, rather than before.

In manual testing with this patch, I couldn't reproduce the recovery failure in the scenario from the description. It would be nice to add a unit test, but I'm still working out how to write one for this.

                
> Job recovery may fail if staging directory has been deleted
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-4850
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4850
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.1.1
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4850.patch
>
>
> The job staging directory is deleted by the job cleanup task, which runs before the job-info file is deleted from the system directory (by the JobInProgress garbageCollect() method). If the JobTracker (JT) shuts down between these two operations, then when it restarts and tries to recover the job, recovery fails because job.xml and the splits are no longer available.
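The ordering problem can be modelled in a few lines. This is a minimal Java sketch, not Hadoop code: the in-memory set standing in for the filesystem, the path names, and the `crashBetweenDeletes` and `recover` methods are all hypothetical. It assumes, as the description implies, that a restarted JT decides which jobs to recover from the job-info files in the system directory and then needs job.xml and the splits from the staging directory.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the cleanup-ordering problem (not Hadoop code).
 * The "filesystem" is an in-memory set of path names, and a JobTracker
 * crash is modelled by stopping after the first of the two deletions.
 */
public class CleanupOrderingSketch {

    // Hypothetical path names standing in for the real files.
    static final String JOB_INFO = "system/job_1/job-info";
    static final String STAGING = "staging/job_1"; // holds job.xml and splits

    /** Outcome of restarting the JT after the crash. */
    enum Recovery { NOT_ATTEMPTED, SUCCEEDED, FAILED }

    /**
     * Performs only the first cleanup deletion (the JT "dies" before the
     * second one), then simulates recovery on restart.
     */
    static Recovery crashBetweenDeletes(boolean stagingFirst) {
        Set<String> fs = new HashSet<>();
        fs.add(JOB_INFO);
        fs.add(STAGING);

        // First deletion step; the second never runs because of the crash.
        fs.remove(stagingFirst ? STAGING : JOB_INFO);

        return recover(fs);
    }

    /** Assumed recovery rule: driven by job-info, needs the staging files. */
    static Recovery recover(Set<String> fs) {
        if (!fs.contains(JOB_INFO)) {
            return Recovery.NOT_ATTEMPTED; // no job-info, so nothing to recover
        }
        return fs.contains(STAGING) ? Recovery.SUCCEEDED : Recovery.FAILED;
    }

    public static void main(String[] args) {
        System.out.println("original order (staging first): " + crashBetweenDeletes(true));
        System.out.println("patched order (job-info first): " + crashBetweenDeletes(false));
    }
}
```

Running main prints FAILED for the original order and NOT_ATTEMPTED for the patched order. The trade-off in this model is that, under the patched order, a crash between the two steps leaves the staging directory behind instead of breaking recovery.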
