Posted to reviews@yunikorn.apache.org by "manirajv06 (via GitHub)" <gi...@apache.org> on 2023/06/23 10:55:46 UTC

[GitHub] [yunikorn-core] manirajv06 opened a new pull request, #580: [YUNIKORN-1834] Calculate user/group headroom

manirajv06 opened a new pull request, #580:
URL: https://github.com/apache/yunikorn-core/pull/580

   ### What is this PR for?
   Calculate the user/group headroom and allow an application to proceed only if there is enough headroom for it to run.
   
   
   ### What type of PR is it?
   * [ ] - Feature
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-1834
   
   ### How should this be tested?
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The license files need an update.
   * [ ] - There are breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] codecov[bot] commented on pull request #580: [YUNIKORN-1834] Calculate user/group headroom

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #580:
URL: https://github.com/apache/yunikorn-core/pull/580#issuecomment-1611194231

   ## [Codecov](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#580](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (f02f67c) into [master](https://app.codecov.io/gh/apache/yunikorn-core/commit/530533f78bdcc733d3075b6ff41bea2e8d999b80?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (530533f) will **decrease** coverage by `0.20%`.
   > The diff coverage is `40.29%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #580      +/-   ##
   ==========================================
   - Coverage   76.41%   76.21%   -0.20%     
   ==========================================
     Files          74       74              
     Lines       12209    12276      +67     
   ==========================================
   + Hits         9329     9356      +27     
   - Misses       2569     2608      +39     
   - Partials      311      312       +1     
   ```
   
   
   | [Impacted Files](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [pkg/scheduler/ugm/group\_tracker.go](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGtnL3NjaGVkdWxlci91Z20vZ3JvdXBfdHJhY2tlci5nbw==) | `93.33% <0.00%> (-6.67%)` | :arrow_down: |
   | [pkg/scheduler/ugm/manager.go](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGtnL3NjaGVkdWxlci91Z20vbWFuYWdlci5nbw==) | `74.29% <0.00%> (-3.39%)` | :arrow_down: |
   | [pkg/scheduler/ugm/user\_tracker.go](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGtnL3NjaGVkdWxlci91Z20vdXNlcl90cmFja2VyLmdv) | `94.20% <0.00%> (-5.80%)` | :arrow_down: |
   | [pkg/scheduler/objects/application.go](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL2FwcGxpY2F0aW9uLmdv) | `65.28% <14.28%> (-0.26%)` | :arrow_down: |
   | [pkg/scheduler/ugm/queue\_tracker.go](https://app.codecov.io/gh/apache/yunikorn-core/pull/580?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGtnL3NjaGVkdWxlci91Z20vcXVldWVfdHJhY2tlci5nbw==) | `93.38% <81.25%> (-1.62%)` | :arrow_down: |
   
   




[GitHub] [yunikorn-core] manirajv06 closed pull request #580: [YUNIKORN-1834] Calculate user/group headroom

Posted by "manirajv06 (via GitHub)" <gi...@apache.org>.
manirajv06 closed pull request #580: [YUNIKORN-1834] Calculate user/group headroom
URL: https://github.com/apache/yunikorn-core/pull/580




[GitHub] [yunikorn-core] manirajv06 commented on pull request #580: [YUNIKORN-1834] Calculate user/group headroom

Posted by "manirajv06 (via GitHub)" <gi...@apache.org>.
manirajv06 commented on PR #580:
URL: https://github.com/apache/yunikorn-core/pull/580#issuecomment-1618239817

   @wilfred-s can you take a look?




[GitHub] [yunikorn-core] wilfred-s commented on a diff in pull request #580: [YUNIKORN-1834] Calculate user/group headroom

Posted by "wilfred-s (via GitHub)" <gi...@apache.org>.
wilfred-s commented on code in PR #580:
URL: https://github.com/apache/yunikorn-core/pull/580#discussion_r1246256135


##########
pkg/scheduler/ugm/manager.go:
##########
@@ -572,6 +578,41 @@ func (m *Manager) getGroupWildCardLimitsConfig(queuePath string) *LimitConfig {
 	return nil
 }
 
+func (m *Manager) Headroom(queuePath string, user security.UserGroup) *resources.Resource {
+	m.RLock()
+	defer m.RUnlock()
+	var userHeadroom *resources.Resource
+	var groupHeadroom *resources.Resource
+	if m.userTrackers[user.User] != nil {
+		userHeadroom = m.userTrackers[user.User].headroom(queuePath)
+		log.Log(log.SchedUGM).Debug("Calculated headroom for user",
+			zap.String("user", user.User),
+			zap.String("queue path", queuePath),
+			zap.String("user headroom", userHeadroom.String()))
+	}
+	group, err := m.getGroup(user)
+	if err == nil {
+		if m.groupTrackers[group] != nil {
+			groupHeadroom = m.groupTrackers[group].headroom(queuePath)
+			log.Log(log.SchedUGM).Debug("Calculated headroom for group",
+				zap.String("group", group),
+				zap.String("queue path", queuePath),
+				zap.String("group headroom", groupHeadroom.String()))
+		}
+	}
+
+	switch {
+	case userHeadroom != nil && groupHeadroom != nil:
+		return resources.ComponentWiseMinPermissive(userHeadroom, groupHeadroom)
+	case userHeadroom != nil && groupHeadroom == nil:
+		return userHeadroom
+	case userHeadroom == nil && groupHeadroom != nil:
+		return groupHeadroom
+	default:
+		return nil
+	}

Review Comment:
   All of this is handled in `ComponentWiseMinPermissive()`; there is no need for the nil checks.
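
A minimal sketch of why a nil-tolerant minimum makes the `switch` redundant. The `Resource` type and `minPermissive` function below are hypothetical stand-ins written for illustration, not the yunikorn-core implementation; the sketch assumes that a nil argument means "no limit" and that a type defined on only one side keeps that side's value.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for the scheduler's resource type:
// a map from resource name to quantity.
type Resource map[string]int64

// minPermissive returns the per-type minimum of a and b.
// Assumption: a nil argument means "no limit", so the other argument is
// returned unchanged; a type present on one side only keeps that value.
func minPermissive(a, b Resource) Resource {
	if a == nil {
		return b
	}
	if b == nil {
		return a
	}
	out := Resource{}
	for name, quantity := range a {
		out[name] = quantity
	}
	for name, quantity := range b {
		if current, ok := out[name]; !ok || quantity < current {
			out[name] = quantity
		}
	}
	return out
}

func main() {
	user := Resource{"memory": 8, "vcore": 4}
	group := Resource{"memory": 6}

	fmt.Println(minPermissive(user, group))     // both set: per-type minimum
	fmt.Println(minPermissive(user, nil))       // only user set: user returned
	fmt.Println(minPermissive(nil, nil) == nil) // neither set: nil
}
```

With a helper that behaves like this, all four cases of the `switch` collapse into a single call on the two headroom values.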



##########
pkg/scheduler/objects/application.go:
##########
@@ -1462,6 +1462,14 @@ func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation {
 	if !node.preAllocateConditions(ask) {
 		return nil
 	}
+	userHeadroom := ugm.GetUserManager().Headroom(sa.queuePath, sa.user)
+	if userHeadroom != nil && !userHeadroom.FitInMaxUndef(ask.GetAllocatedResource()) {
+		log.Log(log.SchedApplication).Warn("User doesn't have required resources to accommodate this request",
+			zap.String("required resource", ask.GetAllocatedResource().String()),
+			zap.String("headroom", userHeadroom.String()))
+		return nil
+	}

Review Comment:
   This is too late in the cycle. We have done a lot of work before we get here, and we might keep trying nodes when we already know the allocation will not fit. We need to pull this check up in the cycle to a place like `application.tryAllocate()`, around the point where we check the queue headroom.
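
A rough sketch of the ordering this comment asks for: test the ask against the user headroom once, before the node loop, rather than once per node. All types and names here (`resource`, `node`, `fits`, `tryAllocate`) are hypothetical simplifications for illustration and do not mirror the real `Application` code; treating a missing type as unlimited is an assumption of the sketch.

```go
package main

import "fmt"

// resource is a hypothetical stand-in: resource name to quantity.
type resource map[string]int64

// fits reports whether ask fits in limit. Assumption for this sketch:
// a nil limit, or a type missing from limit, is treated as unlimited.
func fits(limit, ask resource) bool {
	for name, want := range ask {
		if have, ok := limit[name]; ok && want > have {
			return false
		}
	}
	return true
}

// node is a hypothetical node with some free capacity.
type node struct {
	name string
	free resource
}

// tryAllocate checks the user headroom once, up front, and only then
// iterates over the nodes. It returns the chosen node name and the number
// of nodes that were examined.
func tryAllocate(userHeadroom, ask resource, nodes []node) (string, int) {
	if !fits(userHeadroom, ask) {
		return "", 0 // no node can help, so the node loop is skipped
	}
	examined := 0
	for _, n := range nodes {
		examined++
		if fits(n.free, ask) {
			return n.name, examined
		}
	}
	return "", examined
}

func main() {
	nodes := []node{
		{name: "n1", free: resource{"memory": 2}},
		{name: "n2", free: resource{"memory": 16}},
	}
	ask := resource{"memory": 4}

	fmt.Println(tryAllocate(resource{"memory": 8}, ask, nodes)) // headroom is enough
	fmt.Println(tryAllocate(resource{"memory": 2}, ask, nodes)) // rejected before any node
}
```

When the headroom is too small, the second call returns without examining a single node, which is the saving the comment is after.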



##########
pkg/scheduler/ugm/queue_tracker.go:
##########
@@ -243,6 +243,42 @@ func (qt *QueueTracker) setLimit(queuePath string, maxResource *resources.Resour
 	childQueueTracker.maxResources = maxResource
 }
 
+func (qt *QueueTracker) headroom(queuePath string) *resources.Resource {
+	log.Log(log.SchedUGM).Debug("Calculating headroom",
+		zap.String("queue path", queuePath))
+	childQueuePath, immediateChildQueueName := getChildQueuePath(queuePath)
+	if childQueuePath != "" {
+		if qt.childQueueTrackers[immediateChildQueueName] != nil {
+			headroom := qt.childQueueTrackers[immediateChildQueueName].headroom(childQueuePath)
+			if headroom != nil {
+				log.Log(log.SchedUGM).Debug("Min of current queue and parent queue headroom",
+					zap.String("queue path", queuePath),
+					zap.String("current queue", qt.queueName),
+					zap.String("current queue max resource", qt.maxResources.String()),
+					zap.String("child's headroom", headroom.String()),
+					zap.String("Min of current queue max resources and child's headroom", resources.ComponentWiseMinPermissive(headroom, qt.maxResources).String()))

Review Comment:
   This causes a duplicate call to `ComponentWiseMinPermissive()`.
   
   We also need to look at the amount of logging we generate in these recursive calls: we should only log at the final stage. The queue does not log this much detail when calculating its headroom either, and this could become really expensive.
   
   BTW: all resources **must** be logged using `zap.Stringer`.
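
The cost concern can be illustrated with the standard library alone. Passing a value as a `fmt.Stringer` lets the logger decide whether to render it, whereas calling `.String()` at the call site pays for the rendering even when debug logging is off. The `debugLogger` below is a hypothetical minimal logger written for this sketch; it is not zap, and `countingResource` exists only to count how often it is rendered.

```go
package main

import "fmt"

// countingResource is a hypothetical resource whose String method records
// how many times it has been rendered.
type countingResource struct {
	calls *int
}

func (c countingResource) String() string {
	*c.calls++
	return "map[memory:10]"
}

// debugLogger is a hypothetical minimal logger (not zap).
type debugLogger struct {
	enabled bool
}

// debug accepts fmt.Stringer values and renders them only when the
// logger is enabled, so a disabled logger never calls String().
func (l debugLogger) debug(msg string, fields ...fmt.Stringer) {
	if !l.enabled {
		return
	}
	for _, field := range fields {
		msg += " " + field.String()
	}
	fmt.Println(msg)
}

func main() {
	calls := 0
	res := countingResource{calls: &calls}

	off := debugLogger{enabled: false}
	off.debug("headroom", res)
	fmt.Println("renders with logging off:", calls)

	on := debugLogger{enabled: true}
	on.debug("headroom", res)
	fmt.Println("renders with logging on:", calls)
}
```

In a recursive headroom calculation the eager form would render every intermediate resource at every level, which is why deferring the rendering and logging only the final result matters.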



##########
pkg/scheduler/objects/application.go:
##########
@@ -1462,6 +1462,14 @@ func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation {
 	if !node.preAllocateConditions(ask) {
 		return nil
 	}
+	userHeadroom := ugm.GetUserManager().Headroom(sa.queuePath, sa.user)
+	if userHeadroom != nil && !userHeadroom.FitInMaxUndef(ask.GetAllocatedResource()) {

Review Comment:
   No need for the nil protection; `FitInMaxUndef()` is nil safe.
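
For readers less familiar with the Go mechanism behind this: a method with a pointer receiver can be called on a nil pointer, so the method itself can handle the nil case and callers need no guard. The `limit` type and `fitsUndefAsMax` method below are hypothetical and only illustrate the pattern; how the sketch treats a nil receiver (as "no limit") is an assumption, not a statement about what `FitInMaxUndef()` does.

```go
package main

import "fmt"

// limit is a hypothetical stand-in for a resource limit.
type limit struct {
	quantities map[string]int64
}

// fitsUndefAsMax reports whether ask fits within l.
// Assumptions of this sketch: a nil receiver means "no limit", and a type
// that l does not define is treated as unlimited.
func (l *limit) fitsUndefAsMax(ask map[string]int64) bool {
	if l == nil {
		return true // calling a method on a nil pointer is legal in Go
	}
	for name, want := range ask {
		if have, ok := l.quantities[name]; ok && want > have {
			return false
		}
	}
	return true
}

func main() {
	ask := map[string]int64{"memory": 4}

	var none *limit // nil: no headroom was calculated
	fmt.Println(none.fitsUndefAsMax(ask)) // true

	small := &limit{quantities: map[string]int64{"memory": 2}}
	fmt.Println(small.fitsUndefAsMax(ask)) // false
}
```

Because the receiver check lives inside the method, a call site can drop its `!= nil` condition and call the method directly.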


