Posted to mapreduce-issues@hadoop.apache.org by "Bilwa S T (Jira)" <ji...@apache.org> on 2020/12/15 09:53:00 UTC

[jira] [Updated] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T updated MAPREDUCE-7314:
---------------------------------
    Description: 
This is due to three different reasons:
 # PRIORITY_FAST_FAIL_MAP priority containers should also be considered for reuse.
 # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't kill the current attempt that is assigned to the container. That is because the task attempt is not updated in the ContainerLauncherImpl#Container class.
 # A container gets assigned to a task attempt even when the container has stopped running, i.e. after the container completed event has been processed. This is because we add the reuse container map to the allocated list. makeRemoteRequest gets the container in the allocation response, whereas the RM has sent the same container in the finished container list. To avoid this, we need to make sure the allocated list doesn't contain any containers that are finished.
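
The check proposed in the third point can be sketched roughly as below. This is only an illustration under assumptions, not the actual patch: the class FinishedContainerFilter, the method dropFinished and the string container IDs are all hypothetical stand-ins, and the real allocator works with YARN container objects rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the allocator's bookkeeping; this is not Hadoop code.
class FinishedContainerFilter {

    // Returns the allocated container IDs that are not also reported as finished.
    static List<String> dropFinished(List<String> allocated, Set<String> finished) {
        List<String> usable = new ArrayList<>();
        for (String containerId : allocated) {
            // Skip any container the RM has already reported as completed.
            if (!finished.contains(containerId)) {
                usable.add(containerId);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // Allocated list, including a container carried over from the reuse map.
        List<String> allocated = Arrays.asList("container_01", "container_02", "container_03");
        // The same response lists container_02 as finished.
        Set<String> finished = new HashSet<>(Arrays.asList("container_02"));
        // Only containers that are still running remain assignable.
        System.out.println(dropFinished(allocated, finished));
    }
}
```

With the sample data above, only container_01 and container_03 remain eligible for assignment to a task attempt.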

Test credits : [~Rajshree]


> Job will hang if NM is restarted while its running
> --------------------------------------------------
>
>                 Key: MAPREDUCE-7314
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Bilwa S T
>            Assignee: Bilwa S T
>            Priority: Major


