Posted to mapreduce-issues@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2012/06/08 07:15:23 UTC

[jira] [Created] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Arun C Murthy created MAPREDUCE-4327:
----------------------------------------

             Summary: Enhance CS to schedule accounting for both memory and cpu cores
                 Key: MAPREDUCE-4327
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: mrv2, resourcemanager, scheduler
    Affects Versions: 2.0.0-alpha
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy


With YARN being a general-purpose system, it would be useful for several applications (MPI et al.) to specify not just memory but also CPU (cores) in their resource requirements. Thus, it would be useful for the CapacityScheduler to account for both.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408570#comment-13408570 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

Sounds good Arun! Since I've now taken a similar trot through the codebase, I'm sure we'll be able to spot all the edge cases together.

Andrew
                

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4327:
-------------------------------------

    Attachment: MAPREDUCE-4327.patch

Initial (incomplete) sketch of DRF for YARN CS... hopefully this provides enough context for folks to get a good feel.

Essentially, I've added the ability for applications to ask for cores along with memory, and a configurable resource comparator for the CS to implement DRF-like multi-resource scheduling.

Thoughts?
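To illustrate what a DRF-like resource comparator does, here is a minimal Java sketch. It is not the code in the attached patch: the names DominantShareSketch, Res, dominantShare and byDominantShare are hypothetical, and a resource is assumed to be just memory (MB) plus whole cores. Usages are ordered by their dominant share, i.e. the larger of their memory share and CPU share of the cluster, so the application furthest below its share would be considered first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a DRF-style comparator; not taken from the attached patch. */
public class DominantShareSketch {

    /** Hypothetical resource vector: memory in MB plus whole CPU cores. */
    static final class Res {
        final int memoryMb;
        final int cores;

        Res(int memoryMb, int cores) {
            this.memoryMb = memoryMb;
            this.cores = cores;
        }
    }

    /** Largest fraction of any single cluster resource that 'used' represents. */
    static double dominantShare(Res used, Res cluster) {
        double memShare = (double) used.memoryMb / cluster.memoryMb;
        double cpuShare = (double) used.cores / cluster.cores;
        return Math.max(memShare, cpuShare);
    }

    /** Orders usages from smallest to largest dominant share. */
    static Comparator<Res> byDominantShare(Res cluster) {
        return Comparator.comparingDouble(r -> dominantShare(r, cluster));
    }

    public static void main(String[] args) {
        Res cluster = new Res(18 * 1024, 9);
        List<Res> usage = new ArrayList<>();
        usage.add(new Res(4 * 1024, 3)); // CPU-dominant: 3/9 = 0.333
        usage.add(new Res(8 * 1024, 1)); // memory-dominant: 8/18 = 0.444
        usage.add(new Res(2 * 1024, 1)); // 2/18 and 1/9 are both 0.111
        usage.sort(byDominantShare(cluster));
        for (Res r : usage) {
            System.out.printf("%d MB, %d cores -> %.3f%n",
                r.memoryMb, r.cores, dominantShare(r, cluster));
        }
    }
}
```

Sorting by dominant share rather than by memory alone is what lets a CPU-heavy request be ranked against a memory-heavy one on a common scale.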
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415424#comment-13415424 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

Sorry, my GitHub branch is 'drs'.
                

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4327:
-------------------------------------

    Attachment: MAPREDUCE-4327.patch

Thanks for the reviews, Bobby. I've incorporated all except the CPU one - not sure if fractional cores are the right way to go right now...


Andrew - if you have time, could you pls take a look too? Thanks.
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292933#comment-13292933 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

Hi Arun,

I'm excited to see this started -- I'm quite interested in the multi-resource scheduling problem. After reading through the patch, I have a few questions for you; hopefully this feedback will be helpful.

First off, I want to confirm that my understanding is correct: this patch is designed to allocate resources to jobs within the same capacity queue based on a DRF-inspired ordering of their resource needs. It is not designed to do weighted DRF for the complete cluster. If I'm mistaken, some of my feedback may not apply.

1) Are you planning to change the definition of a queue's capacity? Currently, it is defined as a fractional percentage of the parent queue's total memory. Alternatively, queues could be specified with a fractional percentage of each resource, e.g., I could have one queue with "75% CPU and 50% RAM" and a second with "25% CPU and 50% RAM".

2) Do you plan to change how spare capacity is allocated? My understanding is that it's currently shared proportionally, based on the queue capacities, an approach that seems intuitive for cluster operators. With a multi-resource setup, however, running DRF on the pool of spare resources would provide higher utilization. (I can provide an example of this if you'd like.)

3) Are you planning to support priorities or weights within the queues? IIRC, this was supported in the MR1 scheduler, and the DRF paper describes a weighted extension.

4) Lastly, with the increasing flexibility of the YARN scheduler, I think it makes sense to better support heterogeneous clusters. Currently, yarn.nodemanager.resource.memory-mb is a constant across the cluster, but with a scheduler capable of packing differently shaped resource containers onto each node, heterogeneous nodes would be a natural extension. (This is more of an observation than a question. :-)
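For items 2 and 3, a minimal sketch of DRF progressive filling with an optional per-user weight may help. It uses the example from the DRF paper (9 CPUs and 18 GB; user A's tasks need <1 CPU, 4 GB> and user B's need <3 CPUs, 1 GB>). The class and method names are hypothetical, the single per-user weight is a simplification of the paper's weighted extension, and unlike the paper's pseudocode, which stops once the next task does not fit, this variant skips that user and keeps filling for the others.

```java
import java.util.Arrays;

/** Hypothetical sketch of (weighted) DRF progressive filling. */
public class DrfFillingSketch {

    /**
     * @param capacity total cluster resources, e.g. {cpus, memGb}
     * @param demands  per-user, per-task demand vectors (each non-zero in some resource)
     * @param weights  positive per-user weights (all 1.0 for plain DRF)
     * @return number of tasks launched per user
     */
    static int[] allocate(double[] capacity, double[][] demands, double[] weights) {
        int users = demands.length;
        int r = capacity.length;
        double[] consumed = new double[r];
        double[][] used = new double[users][r];
        int[] tasks = new int[users];
        boolean[] blocked = new boolean[users];

        while (true) {
            // Pick the unblocked user with the lowest weighted dominant share
            // (ties go to the lower-indexed user).
            int next = -1;
            double best = Double.MAX_VALUE;
            for (int u = 0; u < users; u++) {
                if (blocked[u]) continue;
                double dom = 0;
                for (int j = 0; j < r; j++) {
                    dom = Math.max(dom, used[u][j] / capacity[j]);
                }
                dom /= weights[u];
                if (dom < best) {
                    best = dom;
                    next = u;
                }
            }
            if (next < 0) return tasks; // no user can launch another task

            // Launch one task for that user if it fits in what is left.
            boolean fits = true;
            for (int j = 0; j < r; j++) {
                if (consumed[j] + demands[next][j] > capacity[j]) {
                    fits = false;
                    break;
                }
            }
            if (!fits) {
                blocked[next] = true; // resources only shrink, so it never fits again
                continue;
            }
            for (int j = 0; j < r; j++) {
                consumed[j] += demands[next][j];
                used[next][j] += demands[next][j];
            }
            tasks[next]++;
        }
    }

    public static void main(String[] args) {
        double[] capacity = {9, 18};             // 9 CPUs, 18 GB
        double[][] demands = {{1, 4}, {3, 1}};   // user A, user B
        int[] tasks = allocate(capacity, demands, new double[] {1, 1});
        System.out.println(Arrays.toString(tasks));
    }
}
```

With both weights at 1.0 this prints [3, 2]: A holds 12 of 18 GB and B holds 6 of 9 CPUs, so both end with a dominant share of 2/3. Raising a user's weight lowers its effective dominant share and lets it be picked earlier.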


Looking forward to further discussions.

cheers,
Andrew


                

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ferguson updated MAPREDUCE-4327:
---------------------------------------

    Attachment: MAPREDUCE-4327-v5.patch

This extended and updated version now includes tests and support for CPU core information throughout the resource manager.

It also incorporates the feedback from Robert above.

Although this patch is very large, the bulk of the code is either 1) new and updated tests, or 2) updates to the RM and NM webapps, queue metrics, etc., which all need to be updated to display CPU cores as well.

While it would obviously be easier to read this patch if it were split into pieces, the new tests for CPU as a schedulable resource require the updated queue metrics and accounting, creating an interdependency. I am certainly open to suggestions from anyone who sees how to split this patch into chunks! :-)

I have tested this patch locally, and it appears to pass the YARN and MapReduce test suites.


Your comments and patience are appreciated.

thanks,
Andrew
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Junping Du (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427672#comment-13427672 ] 

Junping Du commented on MAPREDUCE-4327:
---------------------------------------

It looks like these findbugs warnings are the same ones I am trying to fix in MAPREDUCE-4452.
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418489#comment-13418489 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

Thanks for the reviews Andrew! I've incorporated them and pushed.
                

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ferguson updated MAPREDUCE-4327:
---------------------------------------

    Attachment: MAPREDUCE-4327-v2.patch

I've amended Arun's original patch to also pass the number of cores via the ContainerStartMonitoringEvent. With this version, the patch in MAPREDUCE-4334 can be used to enforce CPU weights.
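Assuming the enforcement in MAPREDUCE-4334 maps cores onto a Linux cgroups-style relative weight (cpu.shares, whose conventional default is 1024), the core count passed on the monitoring event could translate into weights roughly as below. This is a hypothetical sketch: the 1024-shares-per-core mapping and all names are assumptions, not taken from either patch.

```java
/**
 * Hypothetical sketch: converting the core count carried on a container's
 * monitoring event into a cgroups-style relative CPU weight (cpu.shares).
 */
public class CpuWeightSketch {

    /** Assumption: one requested core maps to the default cgroup weight of 1024. */
    static final int SHARES_PER_CORE = 1024;

    /** Weight for one container; non-positive requests are clamped to 1 core. */
    static int cpuSharesFor(int requestedCores) {
        return Math.max(1, requestedCores) * SHARES_PER_CORE;
    }

    /**
     * Idealized fraction of CPU time container i would get if every container
     * is CPU-bound and shares are honoured exactly.
     */
    static double contendedFraction(int[] coresPerContainer, int i) {
        long totalShares = 0;
        for (int cores : coresPerContainer) {
            totalShares += cpuSharesFor(cores);
        }
        return (double) cpuSharesFor(coresPerContainer[i]) / totalShares;
    }

    public static void main(String[] args) {
        int[] cores = {1, 2, 1};
        System.out.println(cpuSharesFor(cores[1]));      // 2048
        System.out.println(contendedFraction(cores, 1)); // 0.5
    }
}
```

Because shares are relative rather than hard caps, a 2-core container competing with two busy 1-core containers would get about half of the CPU time, but could use more when the others are idle.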
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293009#comment-13293009 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

err, regarding yarn.nodemanager.resource.memory-mb -- setting it differently on individual nodes already creates heterogeneous clusters, whoops.
                

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4327:
-------------------------------------

    Attachment: MAPREDUCE-4327.patch

Here is an updated patch which is complete.

It's already too large, and since it doesn't modify existing behaviour, I think it can go in to unblock other patches while I add more unit tests via an aux-jira. 
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421723#comment-13421723 ] 

Hadoop QA commented on MAPREDUCE-4327:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12537740/MAPREDUCE-4327.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 5 new or modified test files.

    -1 javac.  The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2655//console

This message is automatically generated.
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427020#comment-13427020 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

The findbugs warnings are all from FairScheduler, sigh, I thought we had fixed them - anyway, unrelated to this patch.
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427018#comment-13427018 ] 

Hadoop QA commented on MAPREDUCE-4327:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12538848/MAPREDUCE-4327.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 12 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    -1 findbugs.  The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2696//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2696//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2696//console

This message is automatically generated.
                

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407294#comment-13407294 ] 

Robert Joseph Evans commented on MAPREDUCE-4327:
------------------------------------------------

Andrew, sorry it has taken me so long to get to this. Thanks to both you and Arun for taking this up. It is something that is going to be really great when it is done.

I have a few comments on the code.

# ResourceComparator.java needs to have the Apache License Header in it.
# I don't really like having the resource comparator class configuration be specific to the scheduler. I would prefer to see it be available for both the fifo and the capacity scheduler.  This is very minor, but it enforces consistency between the schedulers.
# In a few places, like SchedulerNode, we still operate on each of the resources separately:
{code}
this.availableResource.setMemory(node.getTotalCapability().getMemory());
this.availableResource.setCores(node.getTotalCapability().getCores());
{code}
It would be really nice to abstract some of that away, as with the comparisons, so that if we add new resources in the future we do not need to change the code again. (This is also very minor.)
# To answer your question about computeSlotMillis, I would say no, but I am open to others opinions on this.  This counter is there to try an maintain backwards compatibility.  It was intended to indicate how much total resources the job used, i.e. how many milliseconds this job held a slot that no-one else could use.  Because there are no real slots any more, I would prefer to see this metric deprecated, and replaced with something that breaks it down by the resources involved.  But that is probably for a separate JIRA, because it is a potentially complex question.
# I would like better protections against someone passing in 0 or null for the number of CPU cores in YARN. For MR I see a new default for the number of cores being requested, but I don't see an equivalent in YARN itself. This matters because CPU cores are only now being added, and I can see other applications, like the distributed shell, not being updated, which could result in all kinds of issues. It would be great if we defaulted to 1 when no CPU request is given.
# Could we change the name of ResourceMemoryCpuComparator to something more like DefaultMultiResourceComparator?  I think ResourceMemoryCpuNetworkBandwithDiskStorageGPUComparator is a bit long, but it is the direction we are headed in. 
# Do we want to be able to schedule only part of a core (make the resource a float, not an int)? For a Map or Reduce task we typically are only going to want 1 CPU, but for some things, like the MR AM, 0.5 cores is overkill for what it does unless it is a very big job.
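
Item 3 above asks for per-resource code like the SchedulerNode snippet to be abstracted away. A minimal sketch of one way to do that, assuming a hypothetical `ResourceType` enum and `MultiResource` class (illustrative names, not YARN's `Resource` API): each operation loops over every dimension, so adding a resource means adding an enum constant rather than editing each call site.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical resource dimensions. Supporting a new one (say, disk) would
// mean adding a constant here rather than editing every call site.
enum ResourceType { MEMORY_MB, CORES }

// Hypothetical multi-dimensional resource vector (not YARN's Resource class).
final class MultiResource {
  private final EnumMap<ResourceType, Integer> values =
      new EnumMap<>(ResourceType.class);

  MultiResource() {
    // Start every dimension at zero so get() never returns null.
    for (ResourceType t : ResourceType.values()) {
      values.put(t, 0);
    }
  }

  int get(ResourceType t) {
    return values.get(t);
  }

  MultiResource set(ResourceType t, int amount) {
    values.put(t, amount);
    return this;
  }

  // Replaces a hand-written setMemory(...)/setCores(...) pair: every
  // dimension is copied, including ones added later.
  void copyFrom(MultiResource other) {
    for (ResourceType t : ResourceType.values()) {
      values.put(t, other.get(t));
    }
  }

  // Element-wise subtraction, e.g. when a container is placed on a node.
  void subtract(MultiResource other) {
    for (Map.Entry<ResourceType, Integer> e : values.entrySet()) {
      e.setValue(e.getValue() - other.get(e.getKey()));
    }
  }
}
```

This only covers operations that treat every dimension the same way; where the math genuinely differs per resource, the dimension still has to be handled explicitly.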

This is just a cursory look but I like what I see.
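
Item 4 suggests eventually replacing slot-millis with a breakdown by resource. As an illustration only, a job could accumulate resource-time per dimension (MB-milliseconds and core-milliseconds) from each container's size and lifetime. `ContainerUsage` and `JobResourceCounters` are hypothetical names, not existing MapReduce counters.

```java
// Hypothetical record of one finished container: what it held and for how long.
final class ContainerUsage {
  final int memoryMb;
  final int cores;
  final long startMillis;
  final long finishMillis;

  ContainerUsage(int memoryMb, int cores, long startMillis, long finishMillis) {
    this.memoryMb = memoryMb;
    this.cores = cores;
    this.startMillis = startMillis;
    this.finishMillis = finishMillis;
  }

  long durationMillis() {
    return finishMillis - startMillis;
  }
}

// Hypothetical per-job counters: resource-time per dimension instead of a
// single "slot-millis" figure.
final class JobResourceCounters {
  long memoryMbMillis;
  long coreMillis;

  void add(ContainerUsage c) {
    long d = c.durationMillis();
    memoryMbMillis += (long) c.memoryMb * d;
    coreMillis += (long) c.cores * d;
  }
}
```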

To chime in on some of your questions:
bq. Are you planning to change the definition of a queue's capacity?
I think this could be something very useful, but should probably be done on a separate JIRA.
bq. Do you plan to change how spare capacity is allocated?
What do you mean by spare capacity?  Do you mean capacity that is not currently in use?  If so I would love to see a patch that does this, so that I can run gridmix on it both ways and see what the results are.
bq. Are you planning to support priorities or weights within the queues?
I would also like to see something like this happen, but from discussions I have had in the past, at least for the MRv1 case we would need something like preemption to avoid some potential deadlocks. I could be wrong about that here, because resource allocation now behaves differently with respect to priority, but either way I think that discussion should happen on a separate JIRA to avoid blocking this from going in.

Thanks again to both you and Arun for doing this.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408498#comment-13408498 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

I've opened branches/MR-4327 for supporting CPU scheduling in YARN.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407403#comment-13407403 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

hi Robert,

Thanks for your feedback! Since I posted the earlier update, I've been pushing it to completion: adding CPU core information to the queue metrics, the resource manager web interface, etc. I've also been adding test cases and ensuring that the new patch passes the existing test cases as well. Currently, the patch is failing just a few unit tests, but I expect it will be done in a day or two.

As the patch has grown quite large (the diff is pushing 7000 lines...), it's clear we want to minimize the cost of adding a third resource. As it is, most of the diff is new testing. I will strive to keep function calls as general as possible (e.g., "Resource r" instead of "int memory, float cores"), but there are quite a few places where we want to consider each resource separately because the math can differ, and in those places it should be clear to anyone adding another resource that they need to account for it in that function's logic.

Regarding applications which haven't been updated for CPU cores and might submit a request with 0 or null: my current patch does round the request up to the minimum resource request, so those applications will be fine. (I'm not sure whether the currently attached patch does this.)
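
A minimal sketch of the kind of normalization described here (and requested in item 5 of Robert's comment), assuming a hypothetical helper that treats a missing, zero or negative request as the minimum allocation and otherwise rounds up to a multiple of the minimum. With a minimum of 1 core this gives the "default to 1" behaviour Robert asked for; it is an illustration, not the patch's code.

```java
final class RequestNormalizer {
  /**
   * Hypothetical normalization of one resource dimension of a request.
   * A missing (null), zero or negative value becomes the minimum allocation;
   * anything else is rounded up to the next multiple of the minimum.
   */
  static int normalize(Integer requested, int minimum) {
    if (minimum <= 0) {
      throw new IllegalArgumentException("minimum must be positive");
    }
    if (requested == null || requested <= 0) {
      return minimum;
    }
    return ((requested + minimum - 1) / minimum) * minimum;
  }
}
```

For example, with a 512 MB memory minimum a 1000 MB request becomes 1024 MB, and an un-updated application that sends 0 cores with a 1-core minimum gets 1 core.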


Regarding "spare capacity" -- I think this is one of the differences between the Capacity Scheduler and the Fair Scheduler. Should capacity that is not in use (or leftover capacity from queues which can't fill it because of the new multi-dimensional nature of resources) simply be split over the queues based on their capacity percentages? Or should that capacity be treated as a single pool, with allocations made by treating the capacity percentages as weights? (This is more of a Fair Scheduler approach.) Anyway, I agree that this should probably be left to a separate JIRA, or perhaps simply left to the Fair Scheduler.
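
The second option (one shared pool, with capacity percentages used as weights) is essentially weighted water-filling. A sketch for a single scalar resource, assuming hypothetical inputs: each queue's weight and its unmet demand. A multi-resource version would need a dominant-share rule on top of this.

```java
final class SpareCapacity {
  /**
   * Splits spare capacity among queues in proportion to their weights
   * (e.g. configured capacity percentages), never granting a queue more
   * than its unmet demand. Capacity a capped queue cannot use is handed
   * out again to the remaining queues on the next round.
   */
  static double[] share(double spare, double[] weights, double[] demand) {
    int n = weights.length;
    double[] grant = new double[n];
    boolean[] active = new boolean[n];
    for (int i = 0; i < n; i++) {
      active[i] = weights[i] > 0 && demand[i] > 0;
    }
    double remaining = spare;
    while (remaining > 1e-9) {
      double totalWeight = 0;
      for (int i = 0; i < n; i++) {
        if (active[i]) totalWeight += weights[i];
      }
      if (totalWeight == 0) break; // no queue wants more
      double handedOut = 0;
      for (int i = 0; i < n; i++) {
        if (!active[i]) continue;
        double offer = remaining * weights[i] / totalWeight;
        double take = Math.min(offer, demand[i] - grant[i]);
        grant[i] += take;
        handedOut += take;
        if (demand[i] - grant[i] <= 1e-9) active[i] = false; // satisfied
      }
      if (handedOut <= 0) break;
      remaining -= handedOut;
    }
    return grant;
  }
}
```

With 30 spare units, weights 50/30/20 and demands 5/100/100, the first queue is capped at 5 and its unused 10 units are re-split 3:2, giving 5, 15 and 10.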


I'll incorporate your other points (eg, comparator name, ASF license) in my updated patch.


thanks!
Andrew



                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422480#comment-13422480 ] 

Robert Joseph Evans commented on MAPREDUCE-4327:
------------------------------------------------

I have looked through the patch somewhat quickly. 

My comments are mostly the same ones that I had for the previous patches.

I don't really like having the resource comparator class configuration be specific to the scheduler. I would prefer to see it be available for both the FIFO and the capacity scheduler. This is very minor, but it enforces consistency between the schedulers.

The more I think about it, the more I want to see the ability to request only part of a core. I don't think we need to make it a true float. Perhaps we should round up to the closest quarter of a core, but requiring full-core increments is too coarse a measure. I think we are going to get bad cluster utilization unless we take a more fine-grained approach.
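
One way to get quarter-core granularity without a true float is to schedule in integer quarter-core units and convert only at the API boundary. A sketch assuming a hypothetical `FractionalCores` helper and a fixed granularity of 4 units per core:

```java
final class FractionalCores {
  // Hypothetical scheduling granularity: a quarter of a core.
  static final int UNITS_PER_CORE = 4;

  /** Converts a fractional core request to whole quarter-core units, rounding up. */
  static int toUnits(double cores) {
    if (cores <= 0) {
      throw new IllegalArgumentException("cores must be positive: " + cores);
    }
    return (int) Math.ceil(cores * UNITS_PER_CORE);
  }

  /** Converts quarter-core units back to cores, e.g. for display. */
  static double toCores(int units) {
    return (double) units / UNITS_PER_CORE;
  }
}
```

Under this scheme an 8-core node advertises 32 units, so (ignoring memory) it could host 32 containers that each asked for 0.1 cores, because each such request rounds up to one unit.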

Also, inside LeafQueue.java and ParentQueue.java there is some code that was refactored to use the new Resources methods, but the original code is still there, just commented out. Please clean this up.

                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Thomas Graves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423164#comment-13423164 ] 

Thomas Graves commented on MAPREDUCE-4327:
------------------------------------------

Perhaps I'm missing something easy, but the patch doesn't seem to compile against trunk.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427447#comment-13427447 ] 

Robert Joseph Evans commented on MAPREDUCE-4327:
------------------------------------------------

Thanks Arun. The changes look good. I still want something that would allow a task that uses almost no CPU to indicate that. I don't think float is the correct solution, but we need something. I am fine if you want to punt on that, but I would like to see us mark the API as unstable until we can come up with some sort of solution.

I am also concerned about the addition of @Ignore to some of the tests.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405651#comment-13405651 ] 

Todd Lipcon commented on MAPREDUCE-4327:
----------------------------------------

Hey Arun. Will you have time to review Andrew's update to your patch soon, or should I try to take a look?
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4327:
-------------------------------------

    Status: Patch Available  (was: Open)
    
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408535#comment-13408535 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

hi Arun,

a branch sounds like a great idea. thanks!

I did update the Capacity Scheduler to schedule and account for CPU in the most recent patch. The key updates are in CSQueueUtils, LeafQueue, and ParentQueue, and they are quite heavily tested by the fully updated Capacity Scheduler test suite.

To update the Capacity Scheduler, I followed the logic from DRF, taking your dominant resource's share as the capacity you are consuming.
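
Taking the dominant resource's share as consumed capacity means a queue's used capacity is the larger of its memory share and its CPU share of the cluster. A sketch for two resources, with hypothetical method names (this is not the CSQueueUtils code):

```java
final class DominantShare {
  /** Largest fraction of any single cluster resource that the usage represents. */
  static double of(long usedMemMb, int usedCores, long clusterMemMb, int clusterCores) {
    if (clusterMemMb <= 0 || clusterCores <= 0) {
      throw new IllegalArgumentException("cluster must have memory and cores");
    }
    double memShare = (double) usedMemMb / clusterMemMb;
    double cpuShare = (double) usedCores / clusterCores;
    return Math.max(memShare, cpuShare);
  }

  /** True once the queue's dominant share reaches its configured capacity fraction. */
  static boolean atOrOverCapacity(long usedMemMb, int usedCores,
      long clusterMemMb, int clusterCores, double capacity) {
    return of(usedMemMb, usedCores, clusterMemMb, clusterCores) >= capacity;
  }
}
```

For example, a queue holding 20 GB of a 100 GB cluster but 20 of its 50 cores is consuming 40% under this rule, because CPU is its dominant resource.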


best,
Andrew


ps -- Do you mind if I re-set the Patch Available flag? While this patch passes the tests I ran locally (`mvn test` in hadoop-mapreduce-project/hadoop-yarn/), I am curious what the Apache buildbot thinks of it. Thanks!
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4327:
-------------------------------------

    Status: Open  (was: Patch Available)

Broken merge.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415652#comment-13415652 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

Awesome, thanks for the update Arun. I just finished reading through your commits. So far, your patch looks a lot like mine, which is great! Hopefully that means our logic is correct. :-) I like that you pulled more of the division and rounding code into the ResourceComparator and out of CSQueueUtils to keep it modular; I didn't think to do that.

I have a few suggestions for you (all of which I learned after writing test cases):

1) In ResourceMemoryCpuComparator (renamed "DefaultMultiResourceComparator" in my patch), I found that a simple "if lhs.equals(rhs) return 0;" was needed at the start -- after dividing by the cluster resources, two identical resource requests might appear to be different due to floating point issues.

2) In the same class, I found that I needed to normalize the resources (by the cluster's resources), and then sort them to compare two resources which consume the same amount of their most-dominant resource, but differing amounts of their 2nd-most-dominant resource. This is important when checking that you don't exceed a resource limit (eg, "greaterThan(comparator, consumed, limit)") -- it may be that I'm within the limit for CPUs (which is the dominant resource), but exceeding the limit for memory (which is not my dominant resource).

3) In resourcemanager.resource.Resources, when multiplying CPUs by a float, because CPUs is an int, I needed two versions: one which rounded-up, and one which rounded-down. Calculating queueMaxCap was the only time I needed the round-down version. Technically, this is also needed for memory (since it is also an int), but as long as we only allocate memory in units of at least, say, 128 MB (as is current practice in the code), the extra bits in the int (0 bytes - 128 MB) are actually serving as a store for the fractional part! and thus, the existing roundUp() and roundDown() functions (from CSQueueUtils) suffice.
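
Points 1 to 3 can be illustrated together. A sketch assuming a hypothetical two-dimensional `Resource2D` class (not YARN's `Resource`): `compare` short-circuits on exact equality, then compares the cluster-normalized shares sorted from most to least dominant, so a tie on the dominant share is broken by the next one; the two multiply helpers show separate round-up and round-down variants for integer resources.

```java
import java.util.Arrays;

// Hypothetical two-dimensional resource (memory in MB, whole cores).
final class Resource2D {
  final int memoryMb;
  final int cores;

  Resource2D(int memoryMb, int cores) {
    this.memoryMb = memoryMb;
    this.cores = cores;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Resource2D)) return false;
    Resource2D r = (Resource2D) o;
    return memoryMb == r.memoryMb && cores == r.cores;
  }

  @Override
  public int hashCode() {
    return 31 * memoryMb + cores;
  }
}

final class MultiResourceComparison {
  /** Negative, zero or positive as lhs is smaller than, equal to or larger than rhs. */
  static int compare(Resource2D cluster, Resource2D lhs, Resource2D rhs) {
    // Point 1: exact equality is decided before any floating-point division.
    if (lhs.equals(rhs)) return 0;
    double[] l = sortedShares(cluster, lhs);
    double[] r = sortedShares(cluster, rhs);
    // Point 2: compare the dominant share first, then the next one.
    for (int i = l.length - 1; i >= 0; i--) {
      int c = Double.compare(l[i], r[i]);
      if (c != 0) return c;
    }
    return 0;
  }

  // Shares of the cluster in ascending order, so the dominant share is last.
  private static double[] sortedShares(Resource2D cluster, Resource2D r) {
    double[] shares = {
      (double) r.memoryMb / cluster.memoryMb,
      (double) r.cores / cluster.cores
    };
    Arrays.sort(shares);
    return shares;
  }

  // Point 3: scaling an integer resource by a fraction needs both roundings.
  static int multiplyRoundUp(int value, double by) {
    return (int) Math.ceil(value * by);
  }

  static int multiplyRoundDown(int value, double by) {
    return (int) Math.floor(value * by);
  }
}
```

Lexicographic comparison of sorted shares is one reading of point 2; other tie-breaking rules would also distinguish resources with the same dominant share.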


cheers,
Andrew
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291553#comment-13291553 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

An option to consider for multi-resource scheduling is the approach outlined by Ghodsi et al. in the DRF paper: http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-18.pdf
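
For reference, the core of DRF is progressive filling: repeatedly give one more task to the user with the lowest dominant share. A small sketch under stated assumptions (fixed per-task demand vectors, ties going to the lower user index). Unlike the paper's pseudocode, which stops as soon as the lowest-share user's next task does not fit, this version skips that user and keeps serving the others.

```java
final class DrfSketch {
  /**
   * Progressive filling: repeatedly launch one task for the user with the
   * lowest dominant share whose next task still fits.
   *
   * @param capacity capacity[r] is the cluster total of resource r
   * @param demand   demand[u][r] is how much of resource r one task of user u needs
   * @return number of tasks launched for each user
   */
  static int[] allocate(double[] capacity, double[][] demand) {
    for (double c : capacity) {
      if (c <= 0) throw new IllegalArgumentException("capacity must be positive");
    }
    for (double[] d : demand) {
      boolean positive = false;
      for (double x : d) {
        if (x < 0) throw new IllegalArgumentException("negative demand");
        if (x > 0) positive = true;
      }
      // A zero demand vector would let the loop run forever.
      if (!positive) throw new IllegalArgumentException("task demand must be non-zero");
    }
    int users = demand.length;
    int kinds = capacity.length;
    double[] used = new double[kinds];
    double[][] held = new double[users][kinds];
    int[] tasks = new int[users];
    while (true) {
      int pick = -1;
      double lowest = Double.POSITIVE_INFINITY;
      for (int u = 0; u < users; u++) {
        if (!fits(used, demand[u], capacity)) continue;
        double share = dominantShare(held[u], capacity);
        if (share < lowest) {
          lowest = share;
          pick = u;
        }
      }
      if (pick < 0) return tasks; // nobody's next task fits any more
      for (int r = 0; r < kinds; r++) {
        used[r] += demand[pick][r];
        held[pick][r] += demand[pick][r];
      }
      tasks[pick]++;
    }
  }

  private static boolean fits(double[] used, double[] task, double[] capacity) {
    for (int r = 0; r < capacity.length; r++) {
      if (used[r] + task[r] > capacity[r]) return false;
    }
    return true;
  }

  private static double dominantShare(double[] held, double[] capacity) {
    double max = 0;
    for (int r = 0; r < capacity.length; r++) {
      max = Math.max(max, held[r] / capacity[r]);
    }
    return max;
  }
}
```

On a 9-CPU, 18 GB cluster where user A's tasks need <1 CPU, 4 GB> and user B's need <3 CPUs, 1 GB>, the loop launches 3 tasks for A and 2 for B, leaving both with a dominant share of 2/3.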

                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ferguson updated MAPREDUCE-4327:
---------------------------------------

    Attachment: MAPREDUCE-4327-v4.patch

One more tweak: FIFO scheduler now respects CPU core availability, and will not allocate containers if no CPUs are available. (sorry for the spam)
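
Respecting CPU availability in the FIFO scheduler amounts to taking the minimum of the per-resource fits when deciding how many containers a node can take. A sketch with a hypothetical helper, assuming requests have already been normalized so both dimensions are positive:

```java
final class FifoFit {
  /**
   * Hypothetical helper: how many containers of the requested size fit into a
   * node's free resources when memory and cores must both fit. Assumes the
   * request was already normalized, so both dimensions are positive.
   */
  static int assignableContainers(int freeMemMb, int freeCores, int reqMemMb, int reqCores) {
    if (reqMemMb <= 0 || reqCores <= 0) {
      throw new IllegalArgumentException("request must be normalized first");
    }
    int byMemory = Math.max(0, freeMemMb) / reqMemMb;
    int byCores = Math.max(0, freeCores) / reqCores;
    // No free cores means no containers, however much memory is free.
    return Math.min(byMemory, byCores);
  }
}
```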
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411561#comment-13411561 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

Please follow my GitHub branch (https://github.com/acmurthy/hadoop-common/tree/MR-4327) for updates as I make them. Once it's close, I'll upload the final patch here.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427508#comment-13427508 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

Oops, my bad. I forgot to point that out.

I will need some help from someone on the FairScheduler. I don't know enough about it, and I'm not sure why those tests failed, given my (almost) non-existent changes there in this patch. Thanks.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.


        

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4327:
-------------------------------------

    Status: Patch Available  (was: Open)
    
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.


        

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ferguson updated MAPREDUCE-4327:
---------------------------------------

        Assignee: Andrew Ferguson  (was: Arun C Murthy)
    Release Note: Add support to YARN for CPU scheduling
          Status: Patch Available  (was: Open)
    
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Andrew Ferguson
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.


        

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4327:
-------------------------------------

    Assignee: Arun C Murthy  (was: Andrew Ferguson)
      Status: Open  (was: Patch Available)

Andrew, I think we should break this down into multiple JIRAs and probably even work on a branch.

I'll open new JIRAs and assign some of them to you while I finish up the CS, OK? Thanks.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.


        

[jira] [Updated] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ferguson updated MAPREDUCE-4327:
---------------------------------------

    Attachment: MAPREDUCE-4327-v3.patch

This is an updated version of my previous patch. Of particular note, it fixes a typo in the original patch in Resources' subtractFrom() -- the original patch double-subtracted memory.
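The fix concerns a helper that subtracts one two-dimensional resource from another in place. A minimal sketch of the intended behavior, assuming a hypothetical `MutableResource` class rather than the patch's actual `Resources` code: each dimension is updated exactly once, so memory is never subtracted twice.

```java
// Hypothetical mutable resource with memory and CPU cores; illustrative
// only, not the Resources helper from the patch.
final class MutableResource {
    int memoryMb;
    int cores;

    MutableResource(int memoryMb, int cores) {
        this.memoryMb = memoryMb;
        this.cores = cores;
    }

    // Subtract rhs from lhs in place and return lhs. Memory and cores are
    // each subtracted exactly once; touching memory a second time (instead
    // of cores) would produce the double subtraction described above.
    static MutableResource subtractFrom(MutableResource lhs, MutableResource rhs) {
        lhs.memoryMb -= rhs.memoryMb;
        lhs.cores -= rhs.cores;
        return lhs;
    }

    // Add rhs to lhs in place and return lhs; the inverse of subtractFrom.
    static MutableResource addTo(MutableResource lhs, MutableResource rhs) {
        lhs.memoryMb += rhs.memoryMb;
        lhs.cores += rhs.cores;
        return lhs;
    }
}
```

Because `subtractFrom` and `addTo` are exact inverses here, allocating and then releasing a container restores the original capacity; a double subtraction of memory breaks that round trip.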

I have successfully run MapReduce jobs using this patch, and they request both memory and CPU cores. So far, I have only tested this with the FIFO scheduler.

When combined with MAPREDUCE-4351 and MAPREDUCE-4334, the requested cpu share is also enforced.

One discussion point I want to raise: currently, the number of requested cores is an integer. Do we want to support fractional cores as well?
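One possible answer to the fractional-cores question (not one this thread settles on) is to keep the field an integer but count in fixed-point units. A sketch assuming a hypothetical unit of 1/1000 of a core (`MILLIS_PER_CORE`), with hypothetical helper names:

```java
// Hypothetical fixed-point CPU unit: 1000 units == one full core.
final class CpuUnits {
    static final int MILLIS_PER_CORE = 1000;

    // Convert a fractional core request (e.g. 0.5) to integer units,
    // rounding to the nearest unit to absorb floating-point noise.
    static int fromCores(double cores) {
        if (cores < 0) {
            throw new IllegalArgumentException("negative cores: " + cores);
        }
        return (int) Math.round(cores * MILLIS_PER_CORE);
    }

    // Number of equally sized containers whose CPU request fits on a node.
    static int containersThatFit(int nodeCores, double coresPerContainer) {
        int perContainer = fromCores(coresPerContainer);
        if (perContainer == 0) {
            throw new IllegalArgumentException("zero-sized request");
        }
        return (nodeCores * MILLIS_PER_CORE) / perContainer;
    }
}
```

The scheduler arithmetic stays integral (no floating-point comparisons when packing nodes), at the cost of fixing the finest granularity at 1/1000 of a core.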
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.


        

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Andrew Ferguson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427015#comment-13427015 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

@acmurthy: you bet! I should have time this week to read this over.

Also, I'm happy to port pieces of my ginormous patch over to this if you'd like. While the majority of the patch I posted is test cases (which may or may not match the semantics of your DRF implementation, depending on how edge cases are decided), other pieces such as the FIFO support, the web GUI, and the metrics code might save you some time.

cheers,
Andrew
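For readers unfamiliar with DRF (Dominant Resource Fairness): a user's dominant share is the largest fraction of any single cluster resource that the user holds, and DRF offers the next allocation to the user with the smallest dominant share. A minimal sketch of that selection step, with hypothetical class names (`Usage`, `DrfSketch`); it is not the scheduler code from either patch.

```java
import java.util.List;

// Hypothetical per-user allocation of memory and cores.
final class Usage {
    final String user;
    final long memoryMb;
    final long cores;

    Usage(String user, long memoryMb, long cores) {
        this.user = user;
        this.memoryMb = memoryMb;
        this.cores = cores;
    }
}

final class DrfSketch {
    // Dominant share: the larger of the user's memory fraction and core
    // fraction of total cluster capacity.
    static double dominantShare(Usage u, long clusterMemoryMb, long clusterCores) {
        double memShare = (double) u.memoryMb / clusterMemoryMb;
        double cpuShare = (double) u.cores / clusterCores;
        return Math.max(memShare, cpuShare);
    }

    // DRF serves the user whose dominant share is currently smallest.
    // Returns null if there are no users.
    static String nextUser(List<Usage> usages, long clusterMemoryMb, long clusterCores) {
        String best = null;
        double bestShare = Double.POSITIVE_INFINITY;
        for (Usage u : usages) {
            double share = dominantShare(u, clusterMemoryMb, clusterCores);
            if (share < bestShare) {
                bestShare = share;
                best = u.user;
            }
        }
        return best;
    }
}
```

On an 18 GB, 9-core cluster, a user holding 4 GB and 1 core has a dominant share of about 0.22 (memory), while one holding 1 GB and 3 cores has 0.33 (CPU), so the first user is served next. Edge cases such as ties and zero-capacity resources are exactly where implementations can differ.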
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.


        

[jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408556#comment-13408556 ] 

Arun C Murthy commented on MAPREDUCE-4327:
------------------------------------------

Andrew, I have an updated version of my CS patch that differs significantly. I'll post it over the weekend so you can review it and provide feedback. OK? Thanks.
                
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.
