Posted to mapreduce-issues@hadoop.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2013/03/20 19:29:15 UTC

[jira] [Commented] (MAPREDUCE-5086) MR app master deletes staging dir when sent a reboot command from the RM

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607962#comment-13607962 ] 

Bikas Saha commented on MAPREDUCE-5086:
---------------------------------------

Let's add a log when we change the value of isLastAMRetry. Also, the "Notify JHEH" log should go into notifyIsLastAMRetry(), since it describes the specific actions happening inside that method.
Also, it will be useful to add a comment before the shutdownJob() methods explaining that we are passing in the job object so that it can be overridden for tests.
{code}
-      //We are finishing cleanly so this is the last retry
-      isLastAMRetry = true;
+      //if isLastAMRetry comes as true, should never set it to false
+      if ( !isLastAMRetry){
+        if ( jobImpl.getInternalState() != JobStateInternal.REBOOT) {
+          //We are finishing cleanly so this is the last retry
+          isLastAMRetry = true;
+        }
+      }
+     // Notify the JHEH and RMCommunicator whether this is lastAMRetry
+      LOG.info("Notify JHEH and RMCommunicator isAMLastRetry: " + isLastAMRetry);
+      notifyIsLastAMRetry(isLastAMRetry);
       // Stop all services
{code}
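
To make the two suggestions above concrete, here is a minimal, self-contained sketch of how the logging could be arranged. It is not the patch and does not use Hadoop classes: `LastRetrySketch`, its nested `Job` and `JobState` types, and `computeIsLastAMRetry()` are hypothetical stand-ins, and `java.util.logging` replaces the project's logger. Only the rule itself (never reset a true flag, and do not treat REBOOT as a clean finish) is taken from the diff.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the app master; not the actual MRAppMaster code.
class LastRetrySketch {
    private static final Logger LOG =
        Logger.getLogger(LastRetrySketch.class.getName());

    // Stand-in for JobStateInternal, reduced to the states needed here.
    enum JobState { SUCCEEDED, FAILED, KILLED, REBOOT }

    // Stand-in for the job object.
    static class Job {
        private final JobState state;
        Job(JobState state) { this.state = state; }
        JobState getInternalState() { return state; }
    }

    private volatile boolean isLastAMRetry;

    LastRetrySketch(boolean initialIsLastAMRetry) {
        this.isLastAMRetry = initialIsLastAMRetry;
    }

    // A true flag is never reset; a false flag becomes true unless the job
    // is in REBOOT, because a reboot is not a clean finish.
    static boolean computeIsLastAMRetry(boolean current, JobState state) {
        return current || state != JobState.REBOOT;
    }

    // The job is passed in as a parameter so that tests can supply their own
    // job object instead of relying on the one held by the app master.
    void shutdownJob(Job job) {
        boolean updated =
            computeIsLastAMRetry(isLastAMRetry, job.getInternalState());
        if (updated != isLastAMRetry) {
            // Log only when the value actually changes.
            LOG.info("Changing isLastAMRetry from " + isLastAMRetry
                + " to " + updated);
            isLastAMRetry = updated;
        }
        notifyIsLastAMRetry(isLastAMRetry);
    }

    // The notification log lives here because it describes what this
    // method does.
    void notifyIsLastAMRetry(boolean isLast) {
        LOG.info("Notify JHEH and RMCommunicator isLastAMRetry: " + isLast);
        // Real code would forward the flag to those services here.
    }

    boolean getIsLastAMRetry() { return isLastAMRetry; }
}
```

With this arrangement a test can call `shutdownJob(new LastRetrySketch.Job(LastRetrySketch.JobState.REBOOT))` and check that the flag stays false, which is the case the issue is about.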

There are random spurious newlines/spaces in the patch that need to be removed.
{code}
   protected UserGroupInformation currentUser; // Will be setup during init
-
+  
   private volatile boolean isLastAMRetry = false;
................
-              INTERNAL_ERROR_TRANSITION)
+              INTERNAL_ERROR_TRANSITION) 
{code}

You mean "job" and not "jb", right?
{code}
+  private static class InternalTerminationTransition implements
       SingleArcTransition<JobImpl, JobEvent> {
+    JobStateInternal terminationState = null;
+    String jbHistoryString = null;
{code}

Is this required in all the tests? 
{code}
+     jobid.setAppId(appId);
+     ContainerAllocator mockAlloc = mock(ContainerAllocator.class);
{code}
                
> MR app master deletes staging dir when sent a reboot command from the RM
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5086
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5086
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: jian he
>            Assignee: jian he
>         Attachments: YARN-472.1.patch, YARN-472.2.patch
>
>
> If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail.
