Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/04/05 00:57:18 UTC

[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

    [ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960541#comment-13960541 ] 

Jason Lowe commented on YARN-1769:
----------------------------------

The patch no longer applies cleanly after YARN-1512.  Other comments on the patch:

- Nit: In LeafQueue.assignToQueue we could cache Resource.add(usedResources, required) in a local variable when computing potentialNewCapacity so we don't have to recompute it as part of the potentialNewWithoutReservedCapacity computation (a minimal sketch follows after this list).
- LeafQueue.assignToQueue and LeafQueue.assignToUser don't seem to need the new priority argument, and therefore LeafQueue.checkLimitsToReserve wouldn't seem to need it either once those others are updated.
- Should FiCaSchedulerApp.getAppToUnreserve really be called getNodeIdToUnreserve or getNodeToUnreserve, since it returns a node ID rather than an app?
- In LeafQueue.findNodeToUnreserve, isn't it a problem if the app thinks it has reservations on the node but the scheduler doesn't know about them?  The bookkeeping may already be inconsistent at that point, so something stronger than debug seems like the appropriate log level, and it's worth checking whether further fixup is needed.
- LeafQueue.findNodeToUnreserve adjusts the headroom when it unreserves, but I don't see other unreservations doing a similar calculation.  Should this fixup have been in completedContainer, or does it need to be done elsewhere?  I could easily be missing something here, but asking in case other unreservation situations also need the headroom fixed.
- LeafQueue.assignContainer uses the much more expensive scheduler.getConfiguration().getReservationContinueLook() when it should be able to use the reservationsContinueLooking member instead.
- LeafQueue.getReservationContinueLooking should be package private.
- Nit: LeafQueue.assignContainer reformats the log message after the "// Inform the node" comment; it was clearer to read and maintain before, since each label and its value were on a line by themselves.  Same goes for the "Reserved container" log towards the end of the method.
- Ultra-Nit: ParentQueue.setupQueueConfig's log message should have reservationsContinueLooking on the previous line to match the style of the other label/value pairs in the log message.
- ParentQueue.getReservationContinueLooking should be package private.
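
A minimal sketch of the first nit above, for concreteness.  Resources.add/Resources.subtract are the standard YARN resource utilities, but potentialCapacities and computePotentialCapacity are placeholder names for whatever the patch actually uses to turn a Resource into a capacity fraction; this is not the patch's code.

{code:java}
// Minimal sketch only: compute usedResources + required once and reuse it for
// both capacity checks instead of re-adding for the "without reserved" variant.
float[] potentialCapacities(Resource usedResources, Resource required,
    Resource reservedResources, Resource clusterResource) {
  Resource usedPlusRequired = Resources.add(usedResources, required); // cached once
  float withReserved =
      computePotentialCapacity(usedPlusRequired, clusterResource);
  float withoutReserved =
      computePotentialCapacity(
          Resources.subtract(usedPlusRequired, reservedResources), clusterResource);
  return new float[] { withReserved, withoutReserved };
}
{code}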

> CapacityScheduler:  Improve reservations
> ----------------------------------------
>
>                 Key: YARN-1769
>                 URL: https://issues.apache.org/jira/browse/YARN-1769
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.3.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch
>
>
> Currently the CapacityScheduler uses reservations in order to handle requests for large containers when there might not currently be enough space available on a single host.
> The current algorithm reserves as many containers as are currently required, and then starts to reserve more above that after a certain number of re-reservations (currently biased against larger containers).  Any time it hits the limit on the number reserved, it stops looking at any other nodes.  This can result in missing nodes that have enough space to fulfill the request.
> The other place for improvement is that reservations currently count against your queue capacity.  If you have reservations, you could hit the various limits, which would then stop you from looking further at that node.
> The above two cases can cause an application requesting a larger container to take a long time to get its resources.
> We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. 
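
A minimal sketch of the "swap a reservation for an allocation" idea described above, under the assumption of hypothetical helper names; tryAllocateOnIncomingNode, getNodeIdToUnreserve, unreserve, and allocateContainer are illustrative, not the actual scheduler API or the patch's code.

{code:java}
// When a node heartbeats in with enough free space for a request that is
// currently parked in a reservation elsewhere, release that reservation and
// allocate on this node rather than stopping once the reservation limit is hit.
void tryAllocateOnIncomingNode(FiCaSchedulerNode node, FiCaSchedulerApp app,
    Resource required) {
  if (Resources.fitsIn(required, node.getAvailableResource())) {
    NodeId reservedNodeId = app.getNodeIdToUnreserve(required); // node holding a reservation
    if (reservedNodeId != null) {
      unreserve(app, reservedNodeId);          // give back the reserved space
      allocateContainer(app, node, required);  // real allocation on the new node
      return;
    }
  }
  // otherwise fall back to the existing reserve / re-reserve behaviour
}
{code}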



--
This message was sent by Atlassian JIRA
(v6.2#6252)