Posted to mapreduce-issues@hadoop.apache.org by "Bilwa S T (Jira)" <ji...@apache.org> on 2020/12/15 09:52:00 UTC

[jira] [Created] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

Bilwa S T created MAPREDUCE-7314:
------------------------------------

             Summary: Job will hang if NM is restarted while its running
                 Key: MAPREDUCE-7314
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
             Project: Hadoop Map/Reduce
          Issue Type: Sub-task
            Reporter: Bilwa S T
            Assignee: Bilwa S T


This happens for three reasons:
 # Containers with priority PRIORITY_FAST_FAIL_MAP should also be considered for reuse.
 # When CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't kill the current attempt assigned to the container, because the task attempt is not updated in the ContainerLauncherImpl#Container class.
 # A container gets assigned to a task attempt even after the container has stopped running, i.e. after its container-completed event has been processed. This is because we add the reuse container map to the allocated list: makeRemoteRequest gets the container in the allocation response, while the RM has sent the same container in the finished container list. To avoid this, we need to make sure the allocated list doesn't contain any containers that have finished.
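
The check described in the third point can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: the class FinishedContainerFilter, the method usableContainers, and the use of plain strings as container IDs are all hypothetical and do not correspond to classes in the MapReduce AM. The idea is to merge the newly allocated and reusable containers, then drop every container that the same response reports as finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not the MapReduce AM classes.
final class FinishedContainerFilter {

    private FinishedContainerFilter() {}

    /**
     * Returns the container IDs that may be assigned to task attempts:
     * newly allocated plus reusable ones, minus every ID reported as finished.
     */
    static List<String> usableContainers(List<String> newlyAllocated,
                                         List<String> reusable,
                                         List<String> finished) {
        // LinkedHashSet keeps insertion order and removes duplicate IDs.
        Set<String> candidates = new LinkedHashSet<>(newlyAllocated);
        candidates.addAll(reusable);
        // A finished container must never be handed to a task attempt.
        candidates.removeAll(finished);
        return new ArrayList<>(candidates);
    }

    public static void main(String[] args) {
        List<String> usable = usableContainers(
            Arrays.asList("container_01"),
            Arrays.asList("container_02", "container_03"),
            Arrays.asList("container_02"));
        System.out.println(usable); // prints [container_01, container_03]
    }
}
```

In this sketch container_02 is both reusable and finished, so it is excluded before any assignment can happen.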


