Posted to mapreduce-issues@hadoop.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/12/15 07:47:03 UTC

[jira] Assigned: (MAPREDUCE-2205) FairScheduler should not re-schedule jobs that have just been preempted

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma reassigned MAPREDUCE-2205:
--------------------------------------------

    Assignee: Scott Chen

Scott - all yours. 

One additional observation: the fact that we keep rotating across different jobs is what gets us into situations that require preemption in the first place. We should do our best not to schedule tasks from jobs that are above their fair/min share while there are jobs below those thresholds in the queue.
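A minimal sketch of that idea (all names here are illustrative, not the actual FairScheduler code): when handing out slots from a heartbeat, skip any job already at or above its computed share as long as some job is still below its share.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of share-aware slot assignment; Job, fairShare and
// assign() are illustrative names, not the real FairScheduler API.
public class ShareAwareAssign {
    static class Job {
        final String name;
        final int running;    // currently running tasks
        final int fairShare;  // computed fair/min share
        Job(String name, int running, int fairShare) {
            this.name = name;
            this.running = running;
            this.fairShare = fairShare;
        }
        boolean belowShare() { return running < fairShare; }
    }

    // Assign up to 'slots' tasks in sorted-job order, but while any job is
    // below its share, refuse to give a slot to a job at/above its share.
    static List<String> assign(List<Job> sortedJobs, int slots) {
        List<String> assigned = new ArrayList<>();
        boolean anyBelow = sortedJobs.stream().anyMatch(Job::belowShare);
        for (Job j : sortedJobs) {
            if (slots == 0) break;
            if (anyBelow && !j.belowShare()) continue; // skip over-share jobs
            assigned.add(j.name);
            slots--;
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(
            new Job("underfed", 1, 4),  // below its fair share
            new Job("greedy", 6, 4));   // above its fair share
        System.out.println(assign(jobs, 2)); // prints [underfed]
    }
}
```

With this rule, the over-share job gets nothing from the heartbeat even when spare slots remain, so the slots freed by a preemption can only flow to under-share jobs.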

> FairScheduler should not re-schedule jobs that have just been preempted
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2205
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2205
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Joydeep Sen Sarma
>            Assignee: Scott Chen
>
> We have hit a problem with the preemption implementation in the FairScheduler where the following happens:
> # job X runs short of its fair share or min share and requests/causes N tasks to be preempted
> # when the freed slots are then scheduled, tasks from some other job are actually scheduled onto them
> # after the preemption interval has passed, job X finds it is still under its share and requests preemption again; go to step 1
> This has caused widespread preemption of tasks, taking the cluster from high utilization to low utilization in a few minutes.
> After analyzing the logs, one of the biggest contributing factors seems to be how jobs are scheduled when a heartbeat advertising multiple free slots arrives: the scheduler currently iterates over all the jobs/pools (in sorted order) until all the slots are exhausted, which lets lower-priority jobs (including ones that may have just been preempted) get scheduled as well.
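The failure mode described in the report can be sketched roughly as follows (illustrative Java, not the actual FairScheduler source): on a multi-slot heartbeat, slots are handed out round-robin across all sorted jobs until exhausted, regardless of share, so a job whose tasks were just preempted can immediately win slots back.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the problematic multi-slot heartbeat loop
// (hypothetical names, not the real scheduler code): every job in sorted
// order is eligible for a slot, even one that is over its share.
public class GreedyHeartbeat {
    static List<String> assignAll(List<String> sortedJobs, int slots) {
        List<String> assigned = new ArrayList<>();
        int i = 0;
        while (slots-- > 0) {
            // rotate across jobs with no check against fair/min share
            assigned.add(sortedJobs.get(i++ % sortedJobs.size()));
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Suppose job X just caused 2 of job Y's tasks to be preempted,
        // hoping to claim the freed slots for itself.
        List<String> got = assignAll(List.of("X", "Y"), 2);
        System.out.println(got); // prints [X, Y]: half the slots go back to Y
    }
}
```

Because Y regains a slot, X remains under its share, and after the preemption interval it triggers another round of preemption, reproducing the loop in steps 1-3 above.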

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.