Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2008/11/10 15:59:44 UTC

[jira] Commented: (HADOOP-4471) Capacity Scheduler should maintain the right ordering of jobs in its running queue

    [ https://issues.apache.org/jira/browse/HADOOP-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646274#action_12646274 ] 

Hemanth Yamijala commented on HADOOP-4471:
------------------------------------------

I am documenting a few more discussions that Vivek, Owen and I had.

It is worth noting another problem with keeping running jobs sorted by priority: temporary disk space usage.

For example, consider a low-priority job that has started running. The maps run for this job use disk space to store their intermediate outputs. If a higher-priority job is then submitted and starts running, the space used by the low-priority job stays held until the low-priority job completes, which may be delayed while the higher-priority job is served first.

This situation is not new, and exists even with the default scheduler. However, because the capacity scheduler runs multiple jobs concurrently (from multiple queues, or from different users), the problem is slightly more serious in this case.

That said, it is still not clear what the right way to fix this problem is. At the same time, leaving running jobs unsorted makes it extremely difficult for users to run high-priority jobs in preference to lower-priority ones when the need arises. Hence, while the problems with sorting running jobs are acknowledged, we may still want to do this and address the issues in related JIRAs like HADOOP-4557.

> Capacity Scheduler should maintain the right ordering of jobs in its running queue
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-4471
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4471
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Vivek Ratan
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Blocker
>             Fix For: 0.19.1
>
>         Attachments: HADOOP-4471-v1.patch
>
>
> Currently, the Capacity Scheduler maintains a simple linked list of jobs which are running. This implies that running jobs are sorted by when they started running (i.e., when they were added to the queue). The Scheduler should maintain the same ordering among running jobs that it does for waiting jobs. Jobs should be sorted by priority (if the queue supports priorities) and by their submit time. 
> This sorting would be more fair in deciding which running jobs get access to a free TT. It also does not penalize jobs that have a longer setup task, which affects when they enter the run queue. 
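
To make the proposed ordering concrete, here is a minimal sketch in Java of sorting running jobs by priority (when the queue supports priorities) and then by submit time. It is not the code from the attached patch and does not use the scheduler's real classes: RunningJobOrder, Job, Priority and the queueSupportsPriorities flag are all hypothetical names invented for illustration, and the tie-break on job id is an added assumption so that two jobs submitted at the same instant stay distinct in the sorted set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class RunningJobOrder {

    // Hypothetical priority levels, declared highest first so that the
    // enum's natural order puts higher-priority jobs ahead of lower ones.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    // Hypothetical stand-in for a job; not the scheduler's real job class.
    static final class Job {
        final String id;
        final Priority priority;
        final long submitTime;

        Job(String id, Priority priority, long submitTime) {
            this.id = id;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    // Earlier submit time first; job id breaks ties (an assumption made here
    // so that a sorted set never treats two different jobs as equal).
    static final Comparator<Job> BY_SUBMIT_TIME = (a, b) -> {
        int c = Long.compare(a.submitTime, b.submitTime);
        return c != 0 ? c : a.id.compareTo(b.id);
    };

    // Priority first when the queue supports it, then submit time.
    static Comparator<Job> comparator(boolean queueSupportsPriorities) {
        if (!queueSupportsPriorities) {
            return BY_SUBMIT_TIME;
        }
        return (a, b) -> {
            int c = a.priority.compareTo(b.priority);
            return c != 0 ? c : BY_SUBMIT_TIME.compare(a, b);
        };
    }

    // Returns job ids in the order in which the jobs would be considered
    // for a free slot, regardless of the order in which they started running.
    static List<String> order(List<Job> runningJobs, boolean queueSupportsPriorities) {
        TreeSet<Job> sorted = new TreeSet<>(comparator(queueSupportsPriorities));
        sorted.addAll(runningJobs);
        List<String> ids = new ArrayList<>();
        for (Job j : sorted) {
            ids.add(j.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Listed in the order the jobs entered the run queue.
        List<Job> running = Arrays.asList(
            new Job("job_1", Priority.LOW, 100L),
            new Job("job_2", Priority.HIGH, 200L),
            new Job("job_3", Priority.LOW, 150L));

        System.out.println(order(running, true));   // priorities honoured
        System.out.println(order(running, false));  // submit time only
    }
}
```

With priorities enabled, the high-priority job_2 moves ahead of the two low-priority jobs even though it was submitted last; with priorities disabled, the order falls back to submit time alone. A sorted set is used here instead of a linked list only to show that the order no longer depends on when a job entered the run queue.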
