Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/04/21 16:56:15 UTC
[jira] [Comment Edited] (MAPREDUCE-5848) MapReduce counts forcibly preempted containers as FAILED

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975608#comment-13975608 ] 

Jason Lowe edited comment on MAPREDUCE-5848 at 4/21/14 2:54 PM:
----------------------------------------------------------------

bq. On the positive side, the AM should know the container was on the short-list to be killed from previous preemption messages it received, so maybe it could count a failure of a container "doomed" by preemption as a kill? Or simply postpone the decision on FAIL/KILL. Not sure...

Yes, the AM should definitely know, and I think the change in the patch is good, just not sufficient.

As for postponing the decision, we may have to do just that. To resolve the general case, where SIGTERM can cause failures in the task that should be ignored in light of the kill, the AM may need to wait until it receives the container status from the RM to distinguish the cases. I haven't thought through all of the ramifications of doing that, and I suspect there could be long delays in some corner cases (e.g., a node fails just as the task fails, and it takes the RM a while to expire the node before it can send the container status).
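
To make the "postpone the decision" idea concrete, here is a rough sketch of how the classification could look. It is not the code from the patch and it does not use the real AM or YARN classes: PreemptionAwareClassifier, Outcome, ExitReason, onPreemptionMessage and classify are all hypothetical names invented for illustration, and container IDs are plain strings. The assumption is that a container named in an earlier preemption message, or one the RM later reports as preempted, counts as KILLED, and that any other failure stays undecided until the RM's container status arrives.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch only; none of these names come from the Hadoop code base. */
class PreemptionAwareClassifier {

    /** How the AM would account for a task attempt that reported a failure. */
    enum Outcome { FAILED, KILLED, UNDECIDED }

    /** Simplified stand-in for the container status the RM eventually reports. */
    enum ExitReason { PREEMPTED, OTHER }

    // Containers named in earlier preemption messages (the "doomed" short-list).
    private final Set<String> doomed = new HashSet<>();

    /** Record that the RM has warned this container may be preempted. */
    void onPreemptionMessage(String containerId) {
        doomed.add(containerId);
    }

    /**
     * Classify a reported task failure.
     *
     * @param containerId container that ran the failed attempt
     * @param rmStatus    container status from the RM, empty if not yet received
     */
    Outcome classify(String containerId, Optional<ExitReason> rmStatus) {
        boolean preemptedPerRm =
                rmStatus.isPresent() && rmStatus.get() == ExitReason.PREEMPTED;

        // A doomed or RM-confirmed preempted container does not count as a failure.
        if (doomed.contains(containerId) || preemptedPerRm) {
            return Outcome.KILLED;
        }
        // No word from the RM yet: postpone the FAIL/KILL decision.
        if (!rmStatus.isPresent()) {
            return Outcome.UNDECIDED;
        }
        // The RM reported a non-preemption exit, so this is a genuine failure.
        return Outcome.FAILED;
    }
}
```

The UNDECIDED branch is where the delay concern shows up: in this sketch an attempt stays there until the RM status arrives, however long that takes.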


> MapReduce counts forcibly preempted containers as FAILED
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-5848
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5848
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Carlo Curino
>            Assignee: Subramaniam Krishnan
>         Attachments: YARN-1958.patch
>
>
> The MapReduce AM is considering a forcibly preempted container as FAILED, while I think it should be considered as KILLED (i.e., not count against the maximum number of failures). 


