You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2009/05/21 18:31:45 UTC

[jira] Created: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Capacity scheduler should account high memory jobs as using more capacity of the queue
--------------------------------------------------------------------------------------

                 Key: HADOOP-5884
                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/capacity-sched
            Reporter: Hemanth Yamijala


Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718614#action_12718614 ] 

Hudson commented on HADOOP-5884:
--------------------------------

Integrated in Hadoop-trunk #863 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/])
    

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>             Fix For: 0.20.1
>
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt, HADOOP-5884-20090605-branch-20.txt, HADOOP-5884.patch
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V updated HADOOP-5884:
------------------------------

    Attachment: HADOOP-5884-20090602.1.txt

Updated patch incorporating all the above review comments except one:
 - Removed running tasks information from the UI. As of now, we are trying to avoid absolute numbers because of possible inconsistency between scheduler's information and cluster status. And, specifying running tasks as a percentage of total cluster capacity doesn't make sense now with each task possibly occupying multiple slots. The correct fix is to print absolute numbers after removing any inconsisteny possible. Hence pushing this to another follow-up jira issue.

@Arun
bq. Can we also add the number of slots to the UI?
I didn't get this. Do you mean number of slots per job being displayed in job-scheduling information? We are already displaying the number of slots used by a queue as percentage.

If you meant the first, I already considered this, but let it go for another jira. The job scheduling information is being displayed on the jobtracker ui first page and it looked ugly when it spanned multiple lines. I think it would be good if we can remove job scheduling information from the first page. But as that might trigger discussion, I've decided to leave it for now.

bq.Long term - we really should fix TestCapacityScheduler to not check strings and use relevant apis (even package-private ones).
Agree, even I could realize the pain while modifying testcases, but decide to postpone it for another jira as it is slightly tricky.


> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715275#action_12715275 ] 

Arun C Murthy commented on HADOOP-5884:
---------------------------------------

Can we also add the number of slots to the UI?

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-5884:
-------------------------------------

    Release Note: 
Fixes Capacity scheduler to account more capacity of a queue for a high memory job. Done by considering these jobs to
take more slots proportionally with respect to a slot's default memory size.


> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>             Fix For: 0.20.1
>
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt, HADOOP-5884-20090605-branch-20.txt, HADOOP-5884.patch
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715281#action_12715281 ] 

Arun C Murthy commented on HADOOP-5884:
---------------------------------------

Long term - we really should fix TestCapacityScheduler to not check strings and use relevant apis (even package-private ones).

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V reassigned HADOOP-5884:
---------------------------------

    Assignee: Vinod K V

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V updated HADOOP-5884:
------------------------------

    Attachment: HADOOP-5884-20090603.txt

Latest patch with Arun's comments incorporated, bringing back the absolute counts for running tasks, occupied capacities. Also added job scheduling information to the jobdetails.jsp page.

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715384#action_12715384 ] 

Hemanth Yamijala commented on HADOOP-5884:
------------------------------------------

Some comments on test cases:

- testClusterBlockingForLackOfMemory needs updates to validate the number of slots
- I think we may need two additional tests:
-- We should have a test to check the change in sorting of queues done based on slots than running tasks. We could do this by submitting 2 jobs to 2 queues, 1 is normal and the other is high RAM. We can check that for every assignTasks call, if 1 task of the high RAM job is scheduled, two tasks of the normal job are (assuming 2 slots for a task of the high RAM Jobs). 
-- We should have a check on user limits that a high RAM job hits it's user limits twice as fast as a normal job, again assuming 2 slots for a task of the high RAM job.

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V updated HADOOP-5884:
------------------------------

    Attachment: HADOOP-5884-20090529.1.txt

The proposal is to track capacities and user-limits by the number of slots occupied by the tasks of a job instead of the number of running tasks.

Attaching patch implementing this. This patch has to be applied over the latest patch for HADOOP-5932. This patch does the following:
 - Modifies all the calculations of capacities and user-limits to be based on the number of slots occupied by running tasks of a job.
 - Retains number of running tasks for displaying on the UI
 - Adds test-cases to verify the number of slots accounted for high memory jobs by modifying the corresponding tests.
 - Adds test-cases to verify the newly added "occupied slots" in the scheduling information
 - Adds missing @override tags, removes stale imports and stale occurrences of gc (guarenteed capacity)


> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V updated HADOOP-5884:
------------------------------

    Attachment: HADOOP-5884-20090605-branch-20.txt

Patch for branch-0.20

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt, HADOOP-5884-20090605-branch-20.txt, HADOOP-5884.patch
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod K V updated HADOOP-5884:
------------------------------

    Status: Patch Available  (was: Open)

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-5884:
-------------------------------------

    Attachment: HADOOP-5884.patch

A slightly modified patch. Basically just makes the comments and debug statements in test cases match code. The list of changes made are the following:

- Added a comment on the getOrderedQueues method.
- In testUserLimitsForHighMemoryJobs - set max reduce slots set to 2G instead of 1, as we are submitting jobs with 2G reduces.
- Same test, JobConf was being overwritten. I changed that.
- Also, Debug statement not matching the submitted high RAM job. (were saying 0MB reduces) Changed that.
- Also corrected debug statements in testQueueOrdering

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt, HADOOP-5884.patch
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-5884:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.1
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch 0.20. Thanks, Vinod !

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>             Fix For: 0.20.1
>
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt, HADOOP-5884-20090605-branch-20.txt, HADOOP-5884.patch
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716285#action_12716285 ] 

Hemanth Yamijala commented on HADOOP-5884:
------------------------------------------

Vinod, can you please upload a patch for Hadoop 0.20, so I can commit it to the 0.20 branch as well ? I will commit both together.

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt, HADOOP-5884.patch
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715131#action_12715131 ] 

Hemanth Yamijala commented on HADOOP-5884:
------------------------------------------

Some comments:

- TaskSchedulingInfo.toString() - displaying the actual value had some problem in terms of exactness and mismatch between cluster info and the state we kept. That's why we shifted to percentages. May be a good idea to retain the model. Same argument can be made for running tasks and numSlotsOccupiedByThisUser
- "Occupied slots" seems too techie. Call it 'Used capacity' ? Likewise instead of '% of total slots occupied by all users', call it '% of used capacity' ?
- TaskSchedulingMgr.isUserOverLimit() - we add 1 if we're using more than the queue capacity. It could be more than 1, depending on the task we are assigning (if it's part of high RAM job)
- MapSchedulingMgr constructor: typo: schedulr - should be scheduler. Similar for Reduce...
- Minor NIT: Use format instead of the complicated StringBuffer.append()... kind of code. Makes it really hard to find what's happening.
- updateQSIObjects. The log statement is printing numMapSlotsForThisJob instead of numMapsRunningForThisJob.

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715857#action_12715857 ] 

Arun C Murthy commented on HADOOP-5884:
---------------------------------------

I'm just proposing we add #slots (along with already available #running_tasks) to both per-queue info and per-job info (jobdetails.jsp) so that it's clear to users that the queue isn't being under-served (since #running_tasks might be lesser than #slots_taken).

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5884) Capacity scheduler should account high memory jobs as using more capacity of the queue

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716284#action_12716284 ] 

Hemanth Yamijala commented on HADOOP-5884:
------------------------------------------

Results of test-patch:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]

All capacity scheduler tests pass, except TestQueueCapacities which is being tracked elsewhere.

> Capacity scheduler should account high memory jobs as using more capacity of the queue
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5884
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>         Attachments: HADOOP-5884-20090529.1.txt, HADOOP-5884-20090602.1.txt, HADOOP-5884-20090603.txt, HADOOP-5884.patch
>
>
> Currently, when a high memory job is scheduled by the capacity scheduler, each task scheduled counts only once in the capacity of the queue, though it may actually be preventing other jobs from using spare slots on that node because of its higher memory requirements. In order to be fair, the capacity scheduler should proportionally (with respect to default memory) account high memory jobs as using a larger capacity of the queue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.