You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Karam Singh (JIRA)" <ji...@apache.org> on 2008/09/19 06:52:44 UTC

[jira] Created: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
-----------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-4211
                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/capacity-sched
    Affects Versions: 0.19.0
         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.


            Reporter: Karam Singh


Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4211:
-------------------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s: 0.19.0
         Assignee: Hemanth Yamijala

> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Posted by "Karam Singh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640077#action_12640077 ] 

Karam Singh commented on HADOOP-4211:
-------------------------------------

Checked and verified that after applying patch from HADOOP-4053 the queue resources are divided properly among users, when jobs are submitted one after other. 

To verify third, did their following -:
Submitted 4 jobs one after the other (user-limit =25%) with different users. Slots are divided among job equally. 
After first 4 jobs finished submitted 2 new jobs with different users one after the other, slots were divided among both jobs (50% each).

 Repeated similar type of test about 10 times, and the behavior was same



> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633385#action_12633385 ] 

Owen O'Malley commented on HADOOP-4211:
---------------------------------------

There isn't preemption between users in the same queue. Did you observe J1 getting more map slots after J2 was submitted, but before J2 reached 50%? And similarly for the second case?

> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637146#action_12637146 ] 

Vivek Ratan commented on HADOOP-4211:
-------------------------------------

Yes, part of this behavior is explained by HADOOP-4053. If none of the first three jobs is removed, and two additional jobs are submitted by two different users, then we have 5 users in the system, and each user gets 25% of the resources. When a slot is free, it is given to Job 4 (since Jobs 1, 2, and 3 don't have any tasks to run as they have completed). Slots are given to job4 till that job/user consumes n/4 slots. Then they're given to job5, up until n/4 slots are consumed by job5 too. With the fix for HADOOP-4053, the limits for job4 and job5 will be n/2, which is right. If either job4 or job5 does not have enough tasks to run at limit, additional slots are given to jobs that do have a need, even though they may be running at limit. 

I agree with you - you should re-evaluate this behavior once HADOOP-4053 is fixed. A lot depends on when jobs are marked complete and removed from the scheduler, as that determines the current user limit. 

> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala resolved HADOOP-4211.
--------------------------------------

    Resolution: Duplicate

Based on Karam's testing, which confirms the diagnosis, I am marking this bug a duplicate of HADOOP-4053.

> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634403#action_12634403 ] 

Hemanth Yamijala commented on HADOOP-4211:
------------------------------------------

The scenario in which this problem occurs is slightly different from what is described. The actual scenario is as follows:
- Suppose we have n slots in the system, and 25% is the minimum user limit.
- When we submit 3 jobs as 3 different users one after the other, in steady state, each user gets n/3 slots.
- Let these 3 jobs complete.
- Now, submit 2 more jobs as 2 different users.
- The expectation is that the users get n/2 slots in steady state. However, the first user gets 2n/3 slots and the other user gets n/3 slots.

The reason for this behavior is directly related to HADOOP-4053. Currently, there is no notification to the schedulers that a job has completed. 

In the {{CapacityTaskScheduler}}, the limit is computed as follows:
{code}
limit = Math.max((int)(Math.ceil((double)currentCapacity/
          (double)qsi.numJobsByUser.size())), 
          (int)(Math.ceil((double)(qsi.ulMin*currentCapacity)/100.0)));
{code}

A user is added to the map {{numJobsByUser}} when a job is added. The intent was that the user is removed from this map upon job completion. However, since this event is not yet raised, the number of users is not correctly updated. As a result, the limit is still computed as n/3, instead of n/2. And currently, if all users have hit the limit, then the first user with running jobs is given any remaining slots, explaining the behavior observed.

In summary, if HADOOP-4053 is fixed, this issue will automatically get fixed. In fact, I applied the patch currently available on HADOOP-4053 and verified the behavior is correct now. That is, the limit is recomputed correctly.

I discussed this with Karam, and we agree that the observations are correct. I'll mark HADOOP-4053 a blocker for this bug. When that gets committed, Karam can try out again and close this bug.

> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4211) Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.

Posted by "Karam Singh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632510#action_12632510 ] 

Karam Singh commented on HADOOP-4211:
-------------------------------------

Assumption: User are submitting the jobs one after another and not at the same time.

Here is test scenario:
User1 submits job J1 with 612 maps and 408 reduces.
When J1 starts running it uses all resources i.e 204 maps and 204 reducers, user2 submits job J2 with 204 maps and 204 reduces.
J2 will be queued until  running maps of  J1 starts getting completed.

The Observation is -:
When both J1 and J2 are running -:
J1 is running 153 maps equal to 75% of resources
J2 is running 51 maps equal to 25% of resources.

Similarly User3 submit J3 then:
J1 runs 102 maps equal to 50% of resources
J2 runs 51 maps equal to 25% of resources
J3 runs 51 maps equal to 25% of resources

According to documentation -: When there two users running jobs then resources will be shared 50% by both users.
And Similarly When there are three users running jobs then resources will be shared 33% by users.

Note -: If all users submit jobs at the same time, the resources are shared equally 50% (each for 2 user jobs) and 33% (each for 3 user jobs). 



> Capacity Scheduler does not divide queue resources properly among users, when jobs are submitted one after other.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4211
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Mapred Cluster capacity with 204 Maps and 204 Reduces. User limit =25% and only one queue.
>            Reporter: Karam Singh
>
> Capacity Scheduler does not divide queue resources  properly among users, when job are submitted one after other. E.g. user limit =25. Say User1's job is running. Then user2 submits a job. Then user1's job uses 75% and user2's job 25%=user limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.