Posted to mapreduce-issues@hadoop.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2012/04/23 19:38:34 UTC

[jira] [Created] (MAPREDUCE-4191) capacity scheduler: job unexpectedly exceeds queue capacity limit by one task

Thomas Graves created MAPREDUCE-4191:
----------------------------------------

             Summary: capacity scheduler: job unexpectedly exceeds queue capacity limit by one task
                 Key: MAPREDUCE-4191
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4191
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2, scheduler
    Affects Versions: 0.23.3
            Reporter: Thomas Graves


While testing the queue capacity limits, it appears that a job can exceed the
queue capacity limit by one task even when the user limit factor is 1. It's not
clear to me why this happens.

Here are the steps to reproduce:

1) set yarn.app.mapreduce.am.resource.mb to 2048 (default value)
2) set yarn.scheduler.capacity.root.default.user-limit-factor to 1.0 (default)
3) set yarn.scheduler.capacity.root.default.capacity to 90 (%)
4) For a cluster with a capacity of 56G, 90% is 50.4G, which rounds up to 51G.
5) submit a job with a large number of tasks, each task using 1G of memory.
6) the web UI shows that the used resource is 52G, which is 92.9% of the cluster
capacity (instead of the expected 90%) and 103.2% of the queue capacity
(instead of the expected 100%).
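The arithmetic in the steps above can be checked with a short sketch. (The breakdown of the 52G as a 2G AM plus fifty 1G tasks is my assumption; the report only states the total.)

```python
# Worked example of the capacity math from the steps above.
cluster_gb = 56
queue_capacity_pct = 90

# Absolute queue capacity: 90% of 56G = 50.4G, which the report rounds up to 51G.
queue_capacity_gb = cluster_gb * queue_capacity_pct / 100.0

# Observed usage in the web UI (assumed: one 2G AM plus fifty 1G map tasks).
used_gb = 52

pct_of_cluster = used_gb / cluster_gb * 100        # ~92.9%, expected 90%
pct_of_queue = used_gb / queue_capacity_gb * 100   # ~103.2%, expected 100%

print(round(pct_of_cluster, 1), round(pct_of_queue, 1))  # 92.9 103.2
```

This reproduces the percentages from the web UI exactly, so the extra 1G task (or the uncounted AM, per the discussion below in the thread) fully accounts for the overshoot.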




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (MAPREDUCE-4191) capacity scheduler: job unexpectedly exceeds queue capacity limit by one task

Posted by "Thomas Graves (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves reassigned MAPREDUCE-4191:
----------------------------------------

    Assignee: Thomas Graves
    

[jira] [Commented] (MAPREDUCE-4191) capacity scheduler: job unexpectedly exceeds queue capacity limit by one task

Posted by "Thomas Graves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262799#comment-13262799 ] 

Thomas Graves commented on MAPREDUCE-4191:
------------------------------------------

I'm still following this through to fully understand, but there is a comment in the LeafQueue code that tries to explain this:

   // Note: We aren't considering the current request since there is a fixed
   // overhead of the AM, but it's a > check, not a >= check, so... 

I don't totally follow that comment. I guess the idea is that if you have one job in the queue taking the entire capacity, it allows the job to behave more like it did in mrv1 and tries not to penalize you for the AM overhead. The AM, however, runs the setup and cleanup tasks, whereas in mrv1 a slot would need to be allocated for those. And while the AM may be a fixed overhead, that overhead is configurable: I could create an AM with 24G of memory or use the default of 1.5G. On the flip side, I could have an AM that uses 1.5G but a map task that gets scheduled with 24G, which puts the queue way over its capacity. That could affect the queue's current usage greatly and seems to break the capacity guarantee.
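A minimal sketch of how a strict > check lets one request slip past the limit. (This is a hypothetical simplification for illustration, not the actual LeafQueue code.)

```python
def can_assign(used_gb, capacity_gb):
    """Hypothetical simplification of the capacity check: the current
    request is NOT counted, and the comparison is a strict >, so a
    queue at or under its limit is always granted one more container."""
    return not (used_gb > capacity_gb)

# Queue limit 50.4G, 50G already in use: a new container is admitted
# regardless of its size, so even a 24G container would slip through.
assert can_assign(50.0, 50.4)

# Only once usage already exceeds the limit does assignment stop.
assert not can_assign(52.0, 50.4)
```

With a >= comparison, assignment at exactly the limit would already be refused; the trade-off is that a queue whose capacity is smaller than a single container could then never run anything.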

In the case where you have, say, 2 jobs in the queue, you have 2 app masters, one of which is "counted" against your queue and the other is not.

I do see it as beneficial for queues with very small capacities, though, since without this they could be stuck without enough resources to run a task.

Arun or anyone else familiar with the capacity scheduler, if you could provide an explanation, that would be great.
                