You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by "craigcondit (via GitHub)" <gi...@apache.org> on 2023/02/24 17:45:03 UTC

[GitHub] [yunikorn-k8shim] craigcondit opened a new pull request, #542: [YUNIKORN-1601] [WIP] Shim: Implement preemption predicate handling

craigcondit opened a new pull request, #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542

   ### What is this PR for?
   Add PreemptionPredicates() RM callback implementation to support preemption in core.
   
   ### What type of PR is it?
   * [ ] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [x] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-1601
   
   ### How should this be tested?
   Manually verified that algorithm functions when combined with https://github.com/apache/yunikorn-core/pull/511.
   Unit tests (TODO) and eventually e2e tests once core is updated.
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] pbacsko commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1123216664


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {
+		// if pod exists in cache, try to run predicates
+		if targetNode := ctx.schedulerCache.GetNode(node); targetNode != nil {
+			// need to lock cache here as predicates need a stable view into the cache
+			ctx.schedulerCache.LockForReads()
+			defer ctx.schedulerCache.UnlockForReads()
+
+			// look up each victim in the scheduler cache
+			victims := make([]*v1.Pod, 0)
+			for _, uid := range allocations {
+				if victim, ok := ctx.schedulerCache.GetPod(uid); ok {

Review Comment:
   `GetPod()` also acquires a read lock, this call blocks here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] craigcondit commented on pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#issuecomment-1453666214

   > Need unit test for `IsPodFitNodeViaPreemption()`.
   
   Again, same as above, this would require mocking out a substantial amount of the underlying code, which doesn't really verify that it works properly. As with `IsPodFitNode()` (which this mirrors) we will test this primarily with e2e tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] craigcondit closed pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit closed pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling
URL: https://github.com/apache/yunikorn-k8shim/pull/542


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] pbacsko commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1124578704


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {
+		// if pod exists in cache, try to run predicates
+		if targetNode := ctx.schedulerCache.GetNode(node); targetNode != nil {
+			// need to lock cache here as predicates need a stable view into the cache
+			ctx.schedulerCache.LockForReads()
+			defer ctx.schedulerCache.UnlockForReads()
+
+			// look up each victim in the scheduler cache
+			victims := make([]*v1.Pod, 0)
+			for _, uid := range allocations {
+				if victim, ok := ctx.schedulerCache.GetPod(uid); ok {

Review Comment:
   Thanks, looks like I missed that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] codecov[bot] commented on pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#issuecomment-1447259375

   # [Codecov](https://codecov.io/gh/apache/yunikorn-k8shim/pull/542?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#542](https://codecov.io/gh/apache/yunikorn-k8shim/pull/542?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d7c6603) into [master](https://codecov.io/gh/apache/yunikorn-k8shim/commit/b1bc20b4b08a237aa16ded996bb613ab35e32e51?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b1bc20b) will **decrease** coverage by `0.32%`.
   > The diff coverage is `34.28%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #542      +/-   ##
   ==========================================
   - Coverage   69.52%   69.20%   -0.32%     
   ==========================================
     Files          45       45              
     Lines        7727     7797      +70     
   ==========================================
   + Hits         5372     5396      +24     
   - Misses       2158     2201      +43     
   - Partials      197      200       +3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/yunikorn-k8shim/pull/542?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/cache/context.go](https://codecov.io/gh/apache/yunikorn-k8shim/pull/542?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2NvbnRleHQuZ28=) | `41.06% <0.00%> (-1.36%)` | :arrow_down: |
   | [pkg/plugin/predicates/predicate\_manager.go](https://codecov.io/gh/apache/yunikorn-k8shim/pull/542?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3BsdWdpbi9wcmVkaWNhdGVzL3ByZWRpY2F0ZV9tYW5hZ2VyLmdv) | `51.42% <55.81%> (+0.69%)` | :arrow_up: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] craigcondit commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1124577331


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {
+		// if pod exists in cache, try to run predicates
+		if targetNode := ctx.schedulerCache.GetNode(node); targetNode != nil {
+			// need to lock cache here as predicates need a stable view into the cache
+			ctx.schedulerCache.LockForReads()
+			defer ctx.schedulerCache.UnlockForReads()
+
+			// look up each victim in the scheduler cache
+			victims := make([]*v1.Pod, 0)
+			for _, uid := range allocations {
+				if victim, ok := ctx.schedulerCache.GetPod(uid); ok {

Review Comment:
   No, it doesn't. It takes the cache lock (which is a different lock). This mirrors the existing code in `IsPodFitNode()` which has been working reliably forever.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] pbacsko commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1123225049


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {
+		// if pod exists in cache, try to run predicates
+		if targetNode := ctx.schedulerCache.GetNode(node); targetNode != nil {

Review Comment:
   Nit: consider reducing the nesting, eg.
   ```
   if targetnode == nil {
     return -1, false
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] pbacsko commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1123216664


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {
+		// if pod exists in cache, try to run predicates
+		if targetNode := ctx.schedulerCache.GetNode(node); targetNode != nil {
+			// need to lock cache here as predicates need a stable view into the cache
+			ctx.schedulerCache.LockForReads()
+			defer ctx.schedulerCache.UnlockForReads()
+
+			// look up each victim in the scheduler cache
+			victims := make([]*v1.Pod, 0)
+			for _, uid := range allocations {
+				if victim, ok := ctx.schedulerCache.GetPod(uid); ok {

Review Comment:
   `GetPod()` also acquires a read lock, this method blocks here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] craigcondit commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1124580895


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {

Review Comment:
   Same as above, this is idiomatic go. Adding early return and extra variable in outer scope to avoid one level of nesting is not a good tradeoff.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] pbacsko commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1123226882


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {

Review Comment:
   Nit: consider reducing the nesting, eg.
   ```
   if !ok {
     return -1, false
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] pbacsko commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1123218878


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {

Review Comment:
   Please add unit test coverage in context_test.go.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] craigcondit commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1124578979


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {

Review Comment:
   This is going to be virtually impossible for the same reasons we also don't have coverage for `IsPodFitNode()` as doing so would require mocking essentially the entire stack underneath. This is why we will do this via e2e tests once the core is complete.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] craigcondit commented on a diff in pull request #542: [YUNIKORN-1601] Shim: Implement preemption predicate handling

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on code in PR #542:
URL: https://github.com/apache/yunikorn-k8shim/pull/542#discussion_r1124580034


##########
pkg/cache/context.go:
##########
@@ -454,6 +454,41 @@ func (ctx *Context) IsPodFitNode(name, node string, allocate bool) error {
 	return fmt.Errorf("predicates were not running because pod or node was not found in cache")
 }
 
+func (ctx *Context) IsPodFitNodeViaPreemption(name, node string, allocations []string, startIndex int) (index int, ok bool) {
+	// assume minimal pods need killing if running in testing mode
+	if ctx.apiProvider.IsTestingMode() {
+		return startIndex, ok
+	}
+
+	ctx.lock.RLock()
+	defer ctx.lock.RUnlock()
+	if pod, ok := ctx.schedulerCache.GetPod(name); ok {
+		// if pod exists in cache, try to run predicates
+		if targetNode := ctx.schedulerCache.GetNode(node); targetNode != nil {

Review Comment:
   I disagree, the code as written is idiomatic go. Adding an early return and extra variable to avoid one level of nesting is not a good tradeoff IMO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org