Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (Created) (JIRA)" <ji...@apache.org> on 2012/04/03 16:02:24 UTC

[jira] [Created] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

ApplicationMaster may fail to remove staging directory
------------------------------------------------------

                 Key: MAPREDUCE-4099
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 0.23.2
            Reporter: Jason Lowe
            Priority: Critical


When the ApplicationMaster shuts down, it's supposed to remove the staging directory, assuming properties weren't set to override this behavior. During shutdown, the AM tells the ResourceManager that it has finished before it cleans up the staging directory.  However, upon hearing the AM has finished, the RM turns right around and kills the AM container.  If the AM is too slow, it will be killed before the staging directory is removed.

We're seeing the AM lose this race fairly consistently on our clusters, and the lack of staging directory cleanup quickly leads to filesystem quota issues for some users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Attachment: MAPREDUCE-4099-addendum.patch

Great catch, Sid!  Apologies for missing the race condition; I forgot that the history server flush is performed by stop().  The 5 second sleep in the AM before it calls stop() hid this issue during my manual testing.

The addendum patch moves the staging cleanup into a service that is registered after the RM container allocator service but before the job history event handler.  This will allow the job history to be flushed and moved to done intermediate before the staging directory is removed, and the staging directory removal will still occur before unregistering with the RM.
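
The ordering described above can be sketched as follows. This is a minimal illustration, not the code from the patch: it assumes the AM's services are stopped in the reverse of their registration order, which is what the paragraph above implies, and the class and service names (ShutdownOrderSketch, stagingDirCleanup, and so on) are hypothetical labels rather than real MRAppMaster identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: these names are not the real MRAppMaster classes.
class ShutdownOrderSketch {

    // Returns the order in which services stop, assuming the container stops
    // its children in the reverse of their registration order.
    static List<String> stopOrder(List<String> registrationOrder) {
        List<String> stopped = new ArrayList<>();
        List<Runnable> stopHooks = new ArrayList<>();
        for (String name : registrationOrder) {
            // Each hook stands in for a service's stop() method.
            stopHooks.add(() -> stopped.add(name));
        }
        for (int i = stopHooks.size() - 1; i >= 0; i--) {
            stopHooks.get(i).run();
        }
        return stopped;
    }

    public static void main(String[] args) {
        List<String> order = stopOrder(Arrays.asList(
            "rmContainerAllocator",    // unregisters from the RM when stopped
            "stagingDirCleanup",       // removes the staging directory when stopped
            "jobHistoryEventHandler"   // flushes and moves job history when stopped
        ));
        // Prints [jobHistoryEventHandler, stagingDirCleanup, rmContainerAllocator]
        System.out.println(order);
    }
}
```

Under that assumption, registering the cleanup service between the other two is enough to get history flush, then staging removal, then RM unregistration, without any explicit sequencing code.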

The patch also moves the test case to a more appropriate location.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch
>
>
> When the ApplicationMaster shuts down it's supposed to remove the staging directory, assuming properties weren't set to override this behavior. During shutdown the AM tells the ResourceManager that it has finished before it cleans up the staging directory.  However upon hearing the AM has finished, the RM turns right around and kills the AM container.  If the AM is too slow, the AM will be killed before the staging directory is removed.
> We're seeing the AM lose this race fairly consistently on our clusters, and the lack of staging directory cleanup quickly leads to filesystem quota issues for some users.


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251567#comment-13251567 ] 

Hadoop QA commented on MAPREDUCE-4099:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522240/MAPREDUCE-4099-addendum.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified test files.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2197//console

This message is automatically generated.
                

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252398#comment-13252398 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1012 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1012/])
    MAPREDUCE-4099 amendment. ApplicationMaster will remove staging directory after the history service is stopped. (Contributed by Jason Lowe) (Revision 1324866)

     Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324866
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java

                

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Attachment: MAPREDUCE-4099.patch

Patch that simply moves the cleanupStagingDir() call to before the AM stops services (and therefore notifies the RM).
                

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Open  (was: Patch Available)
    

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Patch Available  (was: Open)

Didn't see any actual failures in the Jenkins build, so maybe it was a timeout?  Tried running the jobclient tests on trunk with and without this patch, and I didn't see any noticeable time difference.  Kicking Jenkins again.
                

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251064#comment-13251064 ] 

Siddharth Seth commented on MAPREDUCE-4099:
-------------------------------------------

Bobby, Jason - The cleanup shouldn't be called before service stop - may end up removing history files before they're moved over. It needs to be after history.stop and before RMCommunicator.stop.

The current patch will work in most cases - because of the 5 second sleep, but can go wrong.

Also, the original patch may be quite useful (separate jira) - it makes writing an AM a little easier.
                

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Patch Available  (was: Reopened)
    

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251746#comment-13251746 ] 

Siddharth Seth commented on MAPREDUCE-4099:
-------------------------------------------

+1. The update looks good. Thanks Jason.
                

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251084#comment-13251084 ] 

Robert Joseph Evans commented on MAPREDUCE-4099:
------------------------------------------------

I am very sorry about that.  I guess I missed it.  Do you want me to revert the change and wait for a new patch?
                

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Attachment: MAPREDUCE-4099.patch

Apparently I need to submit a new patch to convince Jenkins to run again.
                

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245751#comment-13245751 ] 

Jason Lowe commented on MAPREDUCE-4099:
---------------------------------------

Initially I thought a quick fix would be to change MRAppMaster so the call to cleanupStagingDir() occurs before stopping all the services (and therefore before the RM tries to kill the AM).  However, this introduces another problem: if something goes wrong with the AM (killed, crashed, hung) after it has removed the staging directory but before it has notified the RM, then the RM will think the AM did not complete successfully.  It will either report the job as failed (after the AM already told the client it was successful) or, worse, launch another AM attempt that fails because the staging directory has been removed.

Seems like we need another RMApp state to track the fact that we've succeeded but are still cleaning up.  For example, when the AM unregisters, we go from RUNNING into a new FINISHING state.  The RM then gives the AM so many seconds to exit on its own.  If the AM container doesn't exit on its own within the time limit then the RM kills the container, but in either case we move to the FINISHED state (i.e.: once we're FINISHING, we're going to get to FINISHED one way or another).

I'm not thrilled with the idea of adding yet another state to app/attempts, but other alternatives seem to open doors to the AM failing at just the wrong time and we end up failing the job after the AM has already told the client the job was successful.
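
A rough sketch of the proposed transitions, for illustration only. RUNNING, FINISHING and FINISHED come from the proposal above; the event names and the FinishingStateSketch class are hypothetical and do not correspond to actual ResourceManager code. Events the proposal does not discuss (for example, the AM container exiting while still RUNNING) are deliberately left unmodeled and keep the current state.

```java
// Illustrative model only: not the ResourceManager's real state machine.
class FinishingStateSketch {

    enum AppState { RUNNING, FINISHING, FINISHED }

    // Hypothetical event names chosen for this sketch.
    enum Event { AM_UNREGISTERED, AM_CONTAINER_EXITED, GRACE_PERIOD_EXPIRED }

    // Computes the next state for the transitions described in the proposal.
    static AppState next(AppState state, Event event) {
        switch (state) {
            case RUNNING:
                // Unregistering marks success but leaves the AM time to clean up.
                return event == Event.AM_UNREGISTERED ? AppState.FINISHING : state;
            case FINISHING:
                // Once FINISHING, the app reaches FINISHED one way or another.
                if (event == Event.AM_CONTAINER_EXITED
                        || event == Event.GRACE_PERIOD_EXPIRED) {
                    return AppState.FINISHED;
                }
                return state;
            default:
                return state; // FINISHED is terminal.
        }
    }

    // The RM kills the container only if the AM outlives its grace period.
    static boolean mustKillContainer(AppState state, Event event) {
        return state == AppState.FINISHING && event == Event.GRACE_PERIOD_EXPIRED;
    }

    public static void main(String[] args) {
        AppState s = next(AppState.RUNNING, Event.AM_UNREGISTERED);
        System.out.println(s); // FINISHING
        System.out.println(mustKillContainer(s, Event.GRACE_PERIOD_EXPIRED)); // true
        System.out.println(next(s, Event.GRACE_PERIOD_EXPIRED)); // FINISHED
    }
}
```

The point of the extra state is that the success outcome is recorded at unregistration, so a slow or killed cleanup can no longer turn a job the client was told succeeded into a failed one.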
                

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Siddharth Seth (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-4099:
--------------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2 and branch-0.23
                

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250093#comment-13250093 ] 

Hadoop QA commented on MAPREDUCE-4099:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521985/MAPREDUCE-4099.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified test files.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
                  org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry
                  org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
                  org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2177//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2177//console

This message is automatically generated.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Attachment: MAPREDUCE-4099.patch

This patch adds a new FINISHING state to the app and app attempt.  After the AM unregisters, the RM now waits up to the AM liveness expiry interval for the AM to exit cleanly.  During this period the app is in the FINISHING state.  Once the AM exits or the interval expires, the app moves to the FINISHED state, where any containers are cleaned up as usual.  This gives the AM time to perform final cleanup tasks such as removing the staging directory.
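
The transitions described above can be sketched as a tiny state machine. This is only an illustration, not the patch's code: the names AppState, AppEvent, AM_UNREGISTERED, AM_CONTAINER_EXITED and AM_LIVENESS_EXPIRED are hypothetical, and the real ResourceManager state machines have many more states and events.

```java
public class FinishingStateSketch {
    // Hypothetical, simplified application states.
    enum AppState { RUNNING, FINISHING, FINISHED }

    // Hypothetical events the RM might see for the AM.
    enum AppEvent { AM_UNREGISTERED, AM_CONTAINER_EXITED, AM_LIVENESS_EXPIRED }

    /** Returns the next state, or the current state if the event does not apply. */
    static AppState transition(AppState current, AppEvent event) {
        switch (current) {
            case RUNNING:
                // Unregistering no longer finishes the app immediately;
                // it only starts the grace period.
                return event == AppEvent.AM_UNREGISTERED ? AppState.FINISHING : current;
            case FINISHING:
                // The grace period ends when the AM exits on its own
                // or when the liveness interval expires.
                if (event == AppEvent.AM_CONTAINER_EXITED
                        || event == AppEvent.AM_LIVENESS_EXPIRED) {
                    return AppState.FINISHED;
                }
                return current;
            default:
                // FINISHED is terminal in this sketch.
                return current;
        }
    }

    public static void main(String[] args) {
        AppState state = AppState.RUNNING;
        state = transition(state, AppEvent.AM_UNREGISTERED);
        System.out.println(state); // FINISHING
        state = transition(state, AppEvent.AM_CONTAINER_EXITED);
        System.out.println(state); // FINISHED
    }
}
```

In this sketch the container cleanup that used to follow the unregister would hang off the FINISHING to FINISHED transition instead, which is what leaves the AM room to delete the staging directory.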

Patch is based on trunk.  Looks like we'll need a new patch for the 0.23 branch.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250151#comment-13250151 ] 

Jason Lowe commented on MAPREDUCE-4099:
---------------------------------------

All of the reported test failures appear to be unrelated to the patch.  They all fail because a ResourceManager process can't start due to a socket bind problem -- a runaway RM process on the build machine, perhaps?  I ran the RM unit tests locally with this patch and they all pass.

I also manually tested the patch on a single-node cluster running sleep and wordcount jobs.  In addition, I attached a debugger to the ApplicationMaster so that it lingered artificially in the FINISHING state, to verify that killing or expiring an application in the FINISHING state behaves properly.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252445#comment-13252445 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1047/])
    MAPREDUCE-4099 amendment. ApplicationMaster will remove staging directory after the history service is stopped. (Contributed by Jason Lowe) (Revision 1324866)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324866
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Open  (was: Patch Available)
    
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251652#comment-13251652 ] 

Jason Lowe commented on MAPREDUCE-4099:
---------------------------------------

The TestClientRMService failure appears to be unrelated to this patch.  It tests an area of the code unrelated to the changes, and the test passes for me when I run it locally.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Patch Available  (was: Open)
    
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250387#comment-13250387 ] 

Siddharth Seth commented on MAPREDUCE-4099:
-------------------------------------------

> Initially I thought a quick fix would be to change MRAppMaster so the call to cleanupStagingDir() occurs before stopping all the services (and therefore before the RM tries to kill the AM). However this introduces another problem: if something goes wrong with the AM (killed, crashed, hung) between the time it has removed the staging directory and before it has notified the RM then the RM will think the AM did not complete successfully and it will either report the job as failed (after the AM already told the client it was successful) or worse, the RM will launch another AM attempt and fail because the staging directory has been removed.

A similar situation would still exist. During service shutdown, all exceptions are logged and otherwise ignored, and an attempt is made to shut down all subsequent services. After the services stop, the AM would proceed to delete the staging directory.
A really bad case is an error talking to the RM during the unregister. The client has already been told that the job is successful, the RM would have no idea, and the AM would eventually delete the staging directory and exit.

Deleting the staging directory just before the RMCommunicator is stopped would be a much simpler change. It would however have the same problem in case of a failed unregister. Subsequent services do not matter.

The HistoryEventHandler had a similar race, where the AM was being shut down before the history handler was stopped. That was fixed by ensuring the history service was registered after the container allocator, so that it shuts down first.

Handling the situation where the client thinks the job is successful and the RM has no idea about the job seems like a separate Jira. It would need some way for the RM to pick up the state of the job if and when it restarts the next AM.
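
The shutdown behavior discussed above, where a service registered later is stopped first and exceptions during stop are logged and ignored, can be sketched roughly as follows. This does not use Hadoop's service classes: CompositeSketch, Service, stopAll and the service names are hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ReverseStopSketch {
    // Hypothetical minimal service interface.
    interface Service {
        String name();
        void stop() throws Exception;
    }

    // Hypothetical container that stops its services in reverse registration order.
    static class CompositeSketch {
        private final List<Service> services = new ArrayList<>();

        void addService(Service s) {
            services.add(s);
        }

        /** Attempts to stop every service, last registered first; returns the attempt order. */
        List<String> stopAll() {
            List<String> order = new ArrayList<>();
            for (int i = services.size() - 1; i >= 0; i--) {
                Service s = services.get(i);
                order.add(s.name());
                try {
                    s.stop();
                } catch (Exception e) {
                    // A failure is logged and ignored so the remaining services still stop.
                    System.err.println("Ignoring failure stopping " + s.name()
                            + ": " + e.getMessage());
                }
            }
            return order;
        }
    }

    // Builds a service that optionally fails when stopped.
    static Service service(String name, boolean failOnStop) {
        return new Service() {
            public String name() {
                return name;
            }

            public void stop() throws Exception {
                if (failOnStop) {
                    throw new Exception("stop failed");
                }
            }
        };
    }

    public static void main(String[] args) {
        CompositeSketch am = new CompositeSketch();
        am.addService(service("containerAllocator", false));
        am.addService(service("historyEventHandler", true)); // registered later, stops first
        System.out.println(am.stopAll()); // [historyEventHandler, containerAllocator]
    }
}
```

Because a failed stop is swallowed, any step that runs after the services stop, such as deleting the staging directory, still executes even when the unregister with the RM failed. That is the bad case described in the comment.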
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251755#comment-13251755 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #2053 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2053/])
    MAPREDUCE-4099 amendment. ApplicationMaster will remove staging directory after the history service is stopped. (Contributed by Jason Lowe) (Revision 1324866)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324866
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Open  (was: Patch Available)

Agreed, we can still get into a bad situation if the AM cannot communicate with the RM after informing the client.  It is a much simpler change to move up the staging directory cleanup to before we contact the RM for shutdown, so that's what I'll do.  I still think it's a bit odd to be proactively killing a well-behaved AM, but perhaps that's another JIRA.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250844#comment-13250844 ] 

Hadoop QA commented on MAPREDUCE-4099:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522123/MAPREDUCE-4099.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified test files.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2185//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2185//console

This message is automatically generated.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4099:
-------------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0
                   0.23.3
           Status: Resolved  (was: Patch Available)

Thanks Jason.  I put this into trunk, branch-2, and branch-0.23.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch

        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251793#comment-13251793 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2066 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2066/])
    MAPREDUCE-4099 amendment. ApplicationMaster will remove staging directory after the history service is stopped. (Contributed by Jason Lowe) (Revision 1324866)

     Result = ABORTED
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324866
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Patch Available  (was: Open)
    
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249032#comment-13249032 ] 

Hadoop QA commented on MAPREDUCE-4099:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521769/MAPREDUCE-4099.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified test files.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed the unit tests build

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2168//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2168//console

This message is automatically generated.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch


        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Open  (was: Patch Available)
    
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250955#comment-13250955 ] 

Robert Joseph Evans commented on MAPREDUCE-4099:
------------------------------------------------

I like the patch.  +1 

I just have one small comment.  We are not logging the IOException that caused the staging dir to not be deleted.  I can add that in when I check it in, though.
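
A minimal Java sketch of the logging suggested above, for illustration only. It is not the committed change: the `StagingDeleter` interface, the method names, and the use of `java.util.logging` are all assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; names here do not come from the patch.
class StagingCleanupLogging {
    private static final Logger LOG =
        Logger.getLogger(StagingCleanupLogging.class.getName());

    // Stand-in for whatever actually deletes the staging directory.
    interface StagingDeleter {
        void delete(Path stagingDir) throws IOException;
    }

    // Returns true if the delete succeeded; otherwise logs the cause and returns false.
    static boolean cleanupStagingDir(Path stagingDir, StagingDeleter deleter) {
        try {
            deleter.delete(stagingDir);
            return true;
        } catch (IOException e) {
            // Pass the exception itself so its message and stack trace reach the log.
            LOG.log(Level.WARNING, "Failed to delete staging dir " + stagingDir, e);
            return false;
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get("staging-example"); // made-up path, never touched on disk
        boolean ok = cleanupStagingDir(dir, p -> {
            throw new IOException("simulated failure");
        });
        System.out.println("cleanup succeeded: " + ok);
    }
}
```

Passing the exception object to the logger, instead of only a message string, is what preserves the underlying cause for whoever later investigates a leftover staging directory.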
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251510#comment-13251510 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #224 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/224/])
    svn merge -c 1311926 from trunk.  FIXES: MAPREDUCE-4099. ApplicationMaster may fail to remove staging directory (Jason Lowe via bobby) (Revision 1311930)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1311930
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250975#comment-13250975 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2054 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2054/])
    MAPREDUCE-4099. ApplicationMaster may fail to remove staging directory (Jason Lowe via bobby) (Revision 1311926)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1311926
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Status: Patch Available  (was: Open)
    
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251639#comment-13251639 ] 

Hadoop QA commented on MAPREDUCE-4099:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522250/MAPREDUCE-4099-addendum.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified test files.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2198//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2198//console

This message is automatically generated.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252385#comment-13252385 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #225 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/225/])
    merge MAPREDUCE-4099 amendment from trunk. ApplicationMaster will remove staging directory after the history service is stopped. (Contributed by Jason Lowe) (Revision 1324869)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324869
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

            Assignee: Jason Lowe
    Target Version/s: 0.23.3
              Status: Patch Available  (was: Open)
    
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4099.patch


        

[jira] [Reopened] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Siddharth Seth (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth reopened MAPREDUCE-4099:
---------------------------------------


Things will continue to work because of the sleep - so leaving it in should be ok. Re-opening this jira.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251754#comment-13251754 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2127 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2127/])
    MAPREDUCE-4099 amendment. ApplicationMaster will remove staging directory after the history service is stopped. (Contributed by Jason Lowe) (Revision 1324866)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324866
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


        

[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250962#comment-13250962 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2115 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2115/])
    MAPREDUCE-4099. ApplicationMaster may fail to remove staging directory (Jason Lowe via bobby) (Revision 1311926)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1311926
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch
>
>
> When the ApplicationMaster shuts down it's supposed to remove the staging directory, assuming properties weren't set to override this behavior. During shutdown the AM tells the ResourceManager that it has finished before it cleans up the staging directory.  However upon hearing the AM has finished, the RM turns right around and kills the AM container.  If the AM is too slow, the AM will be killed before the staging directory is removed.
> We're seeing the AM lose this race fairly consistently on our clusters, and the lack of staging directory cleanup quickly leads to filesystem quota issues for some users.
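
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Hadoop code and not the committed patch, and every name in it (ToyResourceManager, ToyAppMaster, shutdown, cleanup_first) is hypothetical. It assumes the worst case from the description, namely that the RM kills the AM container as soon as it hears the AM has finished, so no shutdown step after that notification gets to run.

```python
class ContainerKilled(Exception):
    """Raised when the simulated RM has killed the AM container."""


class ToyResourceManager:
    """Hypothetical stand-in for the RM; it only records the 'finished' message."""

    def __init__(self):
        self.finished = False

    def notify_finished(self):
        # Models the RM hearing that the AM has finished.
        self.finished = True


class ToyAppMaster:
    """Hypothetical stand-in for the AM shutdown sequence."""

    def __init__(self, rm):
        self.rm = rm
        self.staging_dir_exists = True

    def _remove_staging_dir(self):
        self.staging_dir_exists = False

    def _run_step(self, action):
        # Worst-case assumption: once the RM has been told the AM finished,
        # the container is killed before any further step can run.
        if self.rm.finished:
            raise ContainerKilled()
        action()

    def shutdown(self, cleanup_first):
        """Run the shutdown steps; return True if the staging dir was left behind."""
        steps = [self._remove_staging_dir, self.rm.notify_finished]
        if not cleanup_first:
            # Ordering from the bug report: notify the RM first, clean up second.
            steps.reverse()
        try:
            for step in steps:
                self._run_step(step)
        except ContainerKilled:
            pass  # The AM never gets to finish its remaining steps.
        return self.staging_dir_exists


leaked_when_notifying_first = ToyAppMaster(ToyResourceManager()).shutdown(cleanup_first=False)
leaked_when_cleaning_first = ToyAppMaster(ToyResourceManager()).shutdown(cleanup_first=True)
```

In this model the staging directory is left behind only when the RM is notified before cleanup runs. Removing the directory before announcing completion is one way to close the window; whether the committed change does exactly that is not stated in this thread.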


[jira] [Updated] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Jason Lowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4099:
----------------------------------

    Attachment: MAPREDUCE-4099-addendum.patch

I accidentally based the addendum patch on my previous changes, which were missing the tweak Bobby made when it was committed.
                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251601#comment-13251601 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1046 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1046/])
    MAPREDUCE-4099. ApplicationMaster may fail to remove staging directory (Jason Lowe via bobby) (Revision 1311926)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1311926
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java

                
> ApplicationMaster may fail to remove staging directory
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4099
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4099
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.0
>
>         Attachments: MAPREDUCE-4099-addendum.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch, MAPREDUCE-4099.patch


[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250963#comment-13250963 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #2041 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2041/])
    MAPREDUCE-4099. ApplicationMaster may fail to remove staging directory (Jason Lowe via bobby) (Revision 1311926)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1311926
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java

                


[jira] [Commented] (MAPREDUCE-4099) ApplicationMaster may fail to remove staging directory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251550#comment-13251550 ] 

Hudson commented on MAPREDUCE-4099:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1011 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1011/])
    MAPREDUCE-4099. ApplicationMaster may fail to remove staging directory (Jason Lowe via bobby) (Revision 1311926)

     Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1311926
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java

                
