Posted to mapreduce-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/22 22:52:39 UTC

[jira] [Resolved] (MAPREDUCE-562) A single slow (but not dead) map TaskTracker impedes MapReduce progress

     [ https://issues.apache.org/jira/browse/MAPREDUCE-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-562.
----------------------------------------

    Resolution: Incomplete

This is still an interesting issue, but at this point I feel the need to close it.  The main reason is that this problem needs to be generalized for YARN and made much less MR-specific.


> A single slow (but not dead) map TaskTracker impedes MapReduce progress
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-562
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-562
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>
> We see cases where a large number of mapper nodes are running many tasks (e.g., a thousand). The reducers pull 980 of the map task intermediate files down, but are unable to retrieve the final intermediate shards from the last node. The TaskTracker on that node returns data to reducers either slowly or not at all, but its heartbeat messages still reach the JobTracker, so the JobTracker doesn't mark the tasks as failed. Manually stopping the offending TaskTracker migrates the tasks to other nodes, where the shuffle finishes very quickly. Left on its own, the job can take hours to unjam itself.
> We need a mechanism for reducers to provide feedback to the JobTracker that one of the mapper nodes should be regarded as lost.
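
The feedback mechanism requested above could look roughly like the following. This is a minimal sketch and not Hadoop's actual API: the class name, method names, and the `min_reporters` threshold are all hypothetical, chosen only to illustrate reducers reporting fetch failures to a central tracker that then decides a mapper node should be regarded as lost even though its heartbeats still arrive.

```python
from collections import defaultdict


class FetchFailureTracker:
    """Hypothetical JobTracker-side bookkeeping; not Hadoop's actual API."""

    def __init__(self, min_reporters=3):
        # Assumed threshold: how many distinct reducers must complain
        # about a node before it is regarded as lost.
        self.min_reporters = min_reporters
        self._reporters = defaultdict(set)  # node -> ids of reducers that reported it
        self.lost_nodes = set()

    def report_fetch_failure(self, reducer_id, node):
        """Record one reducer's complaint; return True if the node is now lost."""
        self._reporters[node].add(reducer_id)
        if len(self._reporters[node]) >= self.min_reporters:
            self.lost_nodes.add(node)
        return node in self.lost_nodes

    def is_lost(self, node):
        """Report whether enough distinct reducers have flagged this node."""
        return node in self.lost_nodes
```

Counting distinct reducers, rather than raw failure reports, is one possible design choice: it keeps a single reducer with its own network problem from getting a healthy node declared lost. Once a node is in `lost_nodes`, the scheduler would presumably re-run that node's map tasks elsewhere, which is what manually stopping the TaskTracker achieves today.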



--
This message was sent by Atlassian JIRA
(v6.2#6252)