Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/02/08 00:25:13 UTC

[jira] [Resolved] (MESOS-245) Hadoop framework sometimes won't rerun failed map tasks

     [ https://issues.apache.org/jira/browse/MESOS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler resolved MESOS-245.
-----------------------------------

    Resolution: Won't Fix

We recently rewrote the Hadoop scheduler and executor from scratch, so this should no longer be an issue. Can you confirm?
                
> Hadoop framework sometimes won't rerun failed map tasks
> -------------------------------------------------------
>
>                 Key: MESOS-245
>                 URL: https://issues.apache.org/jira/browse/MESOS-245
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework
>            Reporter: Charles Reiss
>            Assignee: Charles Reiss
>
> Two things can occasionally cause the Mesos framework for Hadoop to fail to run map tasks:
> - It looks for runnable map tasks by examining lists that are not updated when a map task fails or is killed. When the only runnable map tasks are ones that failed or were killed, it never attempts to launch a new map task. (If any other map task is runnable, it calls a normal Hadoop function to obtain the task, and that function accounts for the task needing a rerun.)
> - If all available resources are used by reduce tasks and map outputs needed by those reduces become unusable, Hadoop cannot rerun the map task(s) because it never receives a suitable offer. A workaround is to configure reduce-slots-per-machine limits so that the framework never saturates all the resources with reduce tasks. A better fix would be for the framework to detect the deadlock and kill a reduce task to resolve it.
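
For illustration only, the two conditions above can be sketched roughly as follows. This is not the Hadoop or Mesos API: MapTask, ReduceTask, needs_run, pick_reduce_to_kill, the state strings, and the "kill the reduce with the least progress" policy are all hypothetical names and assumptions chosen for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MapTask:
    task_id: str
    state: str  # hypothetical states: "pending", "running", "succeeded", "failed", "killed"
    output_lost: bool = False  # True if a succeeded map's output became unusable


@dataclass
class ReduceTask:
    task_id: str
    progress: float  # fraction complete, 0.0 to 1.0


def needs_run(task: MapTask) -> bool:
    """A map needs a (re)run if it never ran, failed, was killed, or lost its output."""
    if task.state in ("pending", "failed", "killed"):
        return True
    return task.state == "succeeded" and task.output_lost


def pick_reduce_to_kill(
    maps: List[MapTask], reduces: List[ReduceTask], free_slots: int
) -> Optional[ReduceTask]:
    """Return a reduce to kill when maps must run but reduces hold every slot."""
    if free_slots > 0:
        return None  # a map can still be launched, so there is no deadlock
    if not any(needs_run(m) for m in maps):
        return None  # nothing is waiting to run
    if any(m.state == "running" for m in maps):
        return None  # a running map will eventually free its slot
    if not reduces:
        return None
    # Assumed policy: sacrifice the reduce that has made the least progress.
    return min(reduces, key=lambda r: r.progress)
```

Counting failed and killed maps in needs_run corresponds to the first condition; pick_reduce_to_kill corresponds to the deadlock in the second. A real scheduler would also have to consider resource sizes and offers, which this sketch ignores.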

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira