Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org> on 2012/04/06 23:58:16 UTC

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Attachment: MAPREDUCE-4099.patch

This patch adds a new FINISHING state to app/appattempt.  After the AM unregisters, the RM now waits up to the AM liveness expiry interval for the AM to exit cleanly.  During this period the app is in the FINISHING state.  Once the AM exits or its liveness interval expires, the app moves to the FINISHED state, where any remaining containers are cleaned up as usual.  This gives the AM time to perform final cleanup tasks such as removing the staging directory.
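
As a rough illustration of the state handling described above (not code from the patch), the transitions could be sketched as follows.  The class, enum, and method names here are hypothetical, and the real RM state machines handle many more events and states than this; the sketch only covers the unregister, exit, and expiry paths.

```java
// Hypothetical sketch only: names are illustrative and are NOT taken from
// MAPREDUCE-4099.patch or from the ResourceManager source.
final class FinishingStateSketch {
    enum State { RUNNING, FINISHING, FINISHED }

    // Stands in for the AM liveness expiry interval, in milliseconds.
    private final long expiryIntervalMs;
    private State state = State.RUNNING;
    private long unregisteredAtMs;

    FinishingStateSketch(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    State state() {
        return state;
    }

    // The AM has told the RM it is done. Instead of killing the AM
    // container right away, the app enters FINISHING and the clock starts.
    void amUnregistered(long nowMs) {
        if (state == State.RUNNING) {
            state = State.FINISHING;
            unregisteredAtMs = nowMs;
        }
    }

    // The AM container exited on its own after unregistering.
    // (An exit without a prior unregister is out of scope for this sketch.)
    void amContainerExited() {
        if (state == State.FINISHING) {
            state = State.FINISHED;
        }
    }

    // Periodic liveness check: give up waiting once the interval has elapsed.
    void checkExpiry(long nowMs) {
        if (state == State.FINISHING
                && nowMs - unregisteredAtMs >= expiryIntervalMs) {
            state = State.FINISHED;
        }
    }

    public static void main(String[] args) {
        FinishingStateSketch app = new FinishingStateSketch(600_000L);
        app.amUnregistered(0L);
        System.out.println("after unregister: " + app.state());
        app.checkExpiry(1_000L);
        System.out.println("after early check: " + app.state());
        app.amContainerExited();
        System.out.println("after AM exit: " + app.state());
    }
}
```

In this sketch, container cleanup would be triggered only on entry to FINISHED, so an AM that is still removing its staging directory is left alone while the app is FINISHING.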

Patch is based on trunk.  Looks like we'll need a new patch for the 0.23 branch.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch
>
>
> When the ApplicationMaster shuts down it's supposed to remove the staging directory, assuming properties weren't set to override this behavior. During shutdown the AM tells the ResourceManager that it has finished before it cleans up the staging directory.  However, upon hearing that the AM has finished, the RM turns right around and kills the AM container.  If the AM is too slow, it will be killed before the staging directory is removed.
> We're seeing the AM lose this race fairly consistently on our clusters, and the lack of staging directory cleanup quickly leads to filesystem quota issues for some users.
