Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2009/06/01 17:46:07 UTC

[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715131#action_12715131 ] 

Hemanth Yamijala commented on HADOOP-5884:
------------------------------------------

Some comments:

- TaskSchedulingInfo.toString() - displaying the actual values caused problems earlier: they were inexact and could mismatch between the cluster info and the state we kept. That's why we shifted to percentages. It may be a good idea to retain that model. The same argument applies to running tasks and numSlotsOccupiedByThisUser.
- "Occupied slots" seems too techie. Call it 'Used capacity' ? Likewise instead of '% of total slots occupied by all users', call it '% of used capacity' ?
- TaskSchedulingMgr.isUserOverLimit() - we add 1 if we're using more than the queue capacity. It could be more than 1, depending on the task we are assigning (if it's part of a high-RAM job). See the first sketch after this list.
- MapSchedulingMgr constructor: typo: 'schedulr' should be 'scheduler'. Similarly for Reduce...
- Minor nit: use String.format() instead of the complicated StringBuffer.append() style of code, which makes it really hard to see what's happening. See the second sketch after this list.
- updateQSIObjects: the log statement prints numMapSlotsForThisJob instead of numMapsRunningForThisJob.
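
To make the isUserOverLimit() point concrete, here is a minimal sketch of the current-capacity computation, under the assumption that the scheduler knows how many slots one task of the candidate job occupies. The names (queueCapacity, slotsOccupied, slotsPerTask) are illustrative, not the actual scheduler code:

{code}
/**
 * Sketch only: when the queue runs at or over capacity, the head room we
 * grant should be the slot requirement of the task being assigned (which
 * is greater than 1 for a task of a high-RAM job), not a hard-coded 1.
 */
static int currentCapacity(int queueCapacity, int slotsOccupied, int slotsPerTask) {
  if (slotsOccupied < queueCapacity) {
    return queueCapacity;
  }
  // At or over capacity: grow by the candidate task's slot footprint.
  return slotsOccupied + slotsPerTask;
}
{code}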
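
And a sketch of the format() suggestion, using the 'Used capacity' wording proposed above (the field names usedSlots and usedPercent are examples, not the actual ones):

{code}
static String usedCapacityLine(int usedSlots, float usedPercent) {
  // Instead of a chain of StringBuffer.append() calls like
  //   sb.append("Used capacity: ").append(usedSlots).append(" (")
  //     .append(usedPercent).append("% of capacity)");
  // a single format call shows the resulting string at a glance:
  return String.format("Used capacity: %d (%.1f%% of capacity)",
      usedSlots, usedPercent);
}
{code}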

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.
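
As an illustration of the proportional accounting described above, a task's charge against the queue could be derived from its memory requirement relative to the default memory per slot. This is a sketch under assumed names (taskMemoryMB, memoryPerSlotMB), not the code in the attached patch:

{code}
static int slotsOccupiedByTask(long taskMemoryMB, long memoryPerSlotMB) {
  // e.g. a 4096 MB task on a cluster with 2048 MB slots counts as
  // ceil(4096 / 2048) = 2 slots of the queue's capacity.
  return (int) Math.ceil((double) taskMemoryMB / memoryPerSlotMB);
}
{code}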

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.