Posted to mapreduce-issues@hadoop.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/12/14 00:14:03 UTC

[jira] Updated: (MAPREDUCE-2205) FairScheduler should not re-schedule jobs that have just been preempted

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated MAPREDUCE-2205:
-----------------------------------------

    Description: 
We have hit a problem with the preemption implementation in the FairScheduler where the following happens:

# job X runs short of fair share or min share and requests/causes N tasks to be preempted
# when slots are then scheduled - tasks from some other job are actually scheduled
# after preemption_interval has passed, job X finds it's still underscheduled and requests preemption. goto 1.

This has caused widespread preemption of tasks, taking the cluster from high utilization to low utilization in a few minutes.
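To make step 3 concrete, here is a minimal sketch of the kind of periodic check that drives it (hypothetical names, not the actual FairScheduler code): the decision to preempt again only looks at how far the job is below its share right now, with no memory of the slots already freed on its behalf in the previous round.

{code:java}
// Hypothetical sketch of the periodic preemption check, not the real
// FairScheduler code: JobInfo and its fields are illustrative only.
class PreemptionCheckSketch {
  static class JobInfo {
    int runningTasks;      // tasks the job currently has running
    int shareTarget;       // min share (or fair share) the job is entitled to
    long belowShareSince;  // time at which the job dropped below its share
  }

  /** How many tasks to preempt on behalf of this job on this check. */
  int tasksToPreemptFor(JobInfo job, long now, long preemptionInterval) {
    int deficit = job.shareTarget - job.runningTasks;
    if (deficit <= 0) {
      return 0;  // job has its share, nothing to do
    }
    if (now - job.belowShareSince < preemptionInterval) {
      return 0;  // not starved for long enough yet
    }
    // Note: only the *current* deficit is measured. There is no memory of
    // the N tasks already preempted for this job on the previous interval,
    // so if those freed slots went to other jobs, the same deficit shows up
    // again and another round of preemption is requested.
    return deficit;
  }
}
{code}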

After doing some analysis of the logs, one of the biggest contributing factors seems to be how jobs are scheduled when a heartbeat advertising multiple free slots comes in: currently the scheduler goes over all the jobs/pools in sorted order until all the slots are exhausted. This lets lower-priority jobs get scheduled as well, including jobs that may have just been preempted.
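For illustration, a rough sketch of that multi-slot assignment pattern (made-up names, not the actual assignTasks code): when a heartbeat advertises several free slots, the loop keeps walking down the sorted job list until the slots are used up.

{code:java}
// Hypothetical sketch of handing out multiple free slots from one heartbeat.
// Schedulable and its methods are illustrative stand-ins, not the real
// FairScheduler types.
import java.util.List;

class MultiSlotAssignSketch {
  interface Schedulable {
    /** Whether this job can launch a task on the given tracker right now. */
    boolean canLaunchTaskOn(String trackerName);
    void launchTask(String trackerName);
  }

  /** Assign up to freeSlots tasks from one heartbeat, walking jobs in sorted order. */
  void assignSlots(List<Schedulable> sortedJobs, String trackerName, int freeSlots) {
    int remaining = freeSlots;
    // sortedJobs is ordered by scheduling priority, most deserving first.
    for (Schedulable job : sortedJobs) {
      while (remaining > 0 && job.canLaunchTaskOn(trackerName)) {
        job.launchTask(trackerName);
        remaining--;
      }
      if (remaining == 0) {
        break;
      }
    }
    // Once the jobs at the head of the list cannot use a slot on this tracker
    // (or have no pending work), the loop falls through to lower-priority
    // jobs, so a job whose tasks were just killed to free these slots can
    // immediately win them back.
  }
}
{code}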

  was:
We have hit a problem with the preemption implementation in the FairScheduler where the following happens:

# job X runs short of fair share or min share and requests/causes N tasks to be preempted
# when slots are then scheduled - tasks from some other job are actually scheduled
# after preemption_interval has passed, job X finds it's still underscheduled and requests preemption. goto 1.

This has caused widespread preemption of tasks, taking the cluster from high utilization to low utilization in a few minutes.

Some of the problems are specific to our internal version of Hadoop (still 0.20, and it doesn't have the hierarchical FairScheduler), but I think the issue here is generic (I just took a look at the trunk assignTasks and tasksToPreempt routines). The basic problem seems to be that the logic of assignTasks+FairShareComparator is not consistent with the logic in tasksToPreempt(). The latter can choose to preempt tasks on behalf of jobs that may not be first up for scheduling based on the FairShareComparator. Understanding whether these two separate pieces of logic are consistent, and keeping them that way, is difficult.

It seems that a much safer preemption implementation is to walk the jobs in the order they would be scheduled on the next heartbeat, and to preempt only on behalf of jobs that are at the head of this sorted queue. In MAPREDUCE-2048 we have already introduced a pre-sorted list of jobs ordered by current scheduling priority; it seems much easier to preempt only for jobs at the head of this sorted list.
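As a rough illustration of that originally proposed approach (hypothetical names; only the idea of a pre-sorted job list from MAPREDUCE-2048 is assumed), preemption would be computed only for the starved prefix of the sorted list, so the jobs we kill tasks for are exactly the jobs that would be scheduled first on the next heartbeat:

{code:java}
// Hypothetical sketch of "preempt only for jobs at the head of the sorted
// list". JobInfo and the pre-sorted input are illustrative assumptions.
import java.util.ArrayList;
import java.util.List;

class HeadOfQueuePreemptionSketch {
  static class JobInfo {
    int runningTasks;
    int shareTarget;  // min share or fair share, whichever applies
  }

  /**
   * Walk jobs in the order they would be scheduled on the next heartbeat and
   * collect only the starved prefix; preemption is computed for these jobs
   * and for no others.
   */
  List<JobInfo> jobsToPreemptFor(List<JobInfo> jobsInSchedulingOrder) {
    List<JobInfo> starvedHead = new ArrayList<JobInfo>();
    for (JobInfo job : jobsInSchedulingOrder) {
      if (job.runningTasks >= job.shareTarget) {
        break;  // first job that is not starved; everything after it waits
      }
      starvedHead.add(job);
    }
    return starvedHead;
  }
}
{code}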

        Summary: FairScheduler should not re-schedule jobs that have just been preempted  (was: FairScheduler should only preempt tasks for pools/jobs that are up next for scheduling)

Rephrasing: the ordering of the jobs in FairShareComparator seems consistent with the logic that figures out what to preempt (contrary to my initial intuition).

> FairScheduler should not re-schedule jobs that have just been preempted
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2205
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2205
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Joydeep Sen Sarma
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.