You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2022/11/29 17:56:04 UTC

[GitHub] [yunikorn-core] pbacsko opened a new pull request, #458: [YUNIKORN-1434] Mark allocation as being preempted

pbacsko opened a new pull request, #458:
URL: https://github.com/apache/yunikorn-core/pull/458

   ### What is this PR for?
   Avoid multiple preemption for the same Ask, otherwise the output is flooded by preemption attempt messages.
   
   ### What type of PR is it?
   * [x] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-1434
   
   ### How should this be tested?
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] craigcondit commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
craigcondit commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1035280632


##########
pkg/scheduler/objects/application.go:
##########
@@ -1742,3 +1754,10 @@ func (sa *Application) SetTimedOutPlaceholder(taskGroupName string, timedOut int
 		sa.placeholderData[taskGroupName].TimedOut = timedOut
 	}
 }
+
+// test only
+func (sa *Application) SetPreemptionAttemptInterval(interval time.Duration) {

Review Comment:
   Again, I think we need to reconsider this. It's the RM's responsibility to kill tasks when we ask it to, and we need to trust that it will occur. However, expecting that this happens in some predetermined time is asking for trouble. By preempting once (only) and reserving the node, we are reserving space until the requisite tasks are terminated.
   
   The only thing we might want to do is keep track of whether or not all the tasks have been preempted yet for a node. If they have, and we are still unschedulable, that likely indicates that another scheduler got in line ahead of us and we need to try preempting again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] craigcondit closed pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
craigcondit closed pull request #458: [YUNIKORN-1434] Mark allocation as being preempted
URL: https://github.com/apache/yunikorn-core/pull/458


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
pbacsko commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1035749747


##########
pkg/scheduler/objects/application.go:
##########
@@ -1742,3 +1754,10 @@ func (sa *Application) SetTimedOutPlaceholder(taskGroupName string, timedOut int
 		sa.placeholderData[taskGroupName].TimedOut = timedOut
 	}
 }
+
+// test only
+func (sa *Application) SetPreemptionAttemptInterval(interval time.Duration) {

Review Comment:
   The interval is only considered if there was a preemption attempt for an ask, but no victim allocations were found. In this case, we probably want to do it again, but not immediately, because we'd flood the logs. If the preemption request was successful, then this interval doesn't matter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
pbacsko commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1037165259


##########
pkg/scheduler/objects/application.go:
##########
@@ -1128,6 +1135,7 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 }
 
 func (sa *Application) tryPreemption(reserve *reservation, ask *AllocationAsk) bool {
+	ask.SetLastPreemptionAttempt(time.Now())

Review Comment:
   Removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] codecov[bot] commented on pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#issuecomment-1331079125

   # [Codecov](https://codecov.io/gh/apache/yunikorn-core/pull/458?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#458](https://codecov.io/gh/apache/yunikorn-core/pull/458?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (66ce6dd) into [master](https://codecov.io/gh/apache/yunikorn-core/commit/d28b6461176927bb42a1bf9cd4bca1fb2eb18e23?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d28b646) will **decrease** coverage by `0.25%`.
   > The diff coverage is `28.84%`.
   
   > :exclamation: Current head 66ce6dd differs from pull request most recent head eea7c5e. Consider uploading reports for the commit eea7c5e to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #458      +/-   ##
   ==========================================
   - Coverage   72.64%   72.39%   -0.26%     
   ==========================================
     Files          67       67              
     Lines        9823     9860      +37     
   ==========================================
   + Hits         7136     7138       +2     
   - Misses       2446     2481      +35     
     Partials      241      241              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/yunikorn-core/pull/458?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/scheduler/objects/allocation.go](https://codecov.io/gh/apache/yunikorn-core/pull/458/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL2FsbG9jYXRpb24uZ28=) | `85.41% <0.00%> (-3.72%)` | :arrow_down: |
   | [pkg/scheduler/objects/allocation\_ask.go](https://codecov.io/gh/apache/yunikorn-core/pull/458/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL2FsbG9jYXRpb25fYXNrLmdv) | `81.34% <0.00%> (-11.03%)` | :arrow_down: |
   | [pkg/scheduler/objects/application.go](https://codecov.io/gh/apache/yunikorn-core/pull/458/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL2FwcGxpY2F0aW9uLmdv) | `57.01% <7.69%> (-0.44%)` | :arrow_down: |
   | [pkg/webservice/handlers.go](https://codecov.io/gh/apache/yunikorn-core/pull/458/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3dlYnNlcnZpY2UvaGFuZGxlcnMuZ28=) | `78.53% <93.33%> (-0.10%)` | :arrow_down: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] wilfred-s commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1036719716


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   I think @craigcondit was right: once we triggered preemption the kills will happen and cannot be reversed. We do not re-trigger *ever* in any case not even after a short period of time.
   The idea is to find a victim list and if we have a list:
   -  mark the ask as preemption triggered 
   - kill all victims
   - mark all victims as preempted
   
   Next scheduling cycle if the ask is still there and has the preemption triggered flag skip it and do not re-trigger. The node is reserved for the ask so it will get scheduled on that node. It could take up to the grace time for the pod to shutdown, which could be loooong in scheduling times.



##########
pkg/scheduler/objects/allocation_ask.go:
##########
@@ -245,3 +247,31 @@ func (aa *AllocationAsk) GetAllocationLog() []*AllocationLogEntry {
 	}
 	return res
 }
+
+// SetTriggeredPreemption sets whether preemption has been triggered for this ask
+func (aa *AllocationAsk) SetTriggeredPreemption(triggered bool) {
+	aa.Lock()
+	defer aa.Unlock()
+	aa.preemptionTriggered = triggered
+}
+
+// HasTriggeredPreemption returns whether this ask has triggered preemption
+func (aa *AllocationAsk) HasTriggeredPreemption() bool {
+	aa.RLock()
+	defer aa.RUnlock()
+	return aa.preemptionTriggered
+}
+
+// SetLastPreemptionAttempt sets the time which tells when the last preemption attempt was performed for this ask.
+func (aa *AllocationAsk) SetLastPreemptionAttempt(attempt time.Time) {

Review Comment:
   I would expect that the method would generate the time by calling `time.Now()` instead of passing in the pre-created object.
   
   If this was done to allow resetting the time during tests we should add a method that allows clearing both time and the flag in one call. There could even be an argument for not having the boolean: if the time is set the preemption has been triggered.



##########
pkg/scheduler/objects/allocation.go:
##########
@@ -356,6 +357,20 @@ func (a *Allocation) GetAllocatedResource() *resources.Resource {
 	return a.allocatedResource
 }
 
+// SetPreempted marks the allocation as preempted.
+func (a *Allocation) SetPreempted(preempted bool) {

Review Comment:
   Should we log something at a `Debug` or even `Error` level if we try to set this flag when already set? I would consider it a big problem.



##########
pkg/scheduler/objects/allocation.go:
##########
@@ -356,6 +357,20 @@ func (a *Allocation) GetAllocatedResource() *resources.Resource {
 	return a.allocatedResource
 }
 
+// SetPreempted marks the allocation as preempted.
+func (a *Allocation) SetPreempted(preempted bool) {

Review Comment:
   I would highly doubt that we would ever unset this flag.
   When we trigger preemption we ask the shim to kill the pod. That is not going to be undone.



##########
pkg/scheduler/objects/application.go:
##########
@@ -1742,3 +1754,10 @@ func (sa *Application) SetTimedOutPlaceholder(taskGroupName string, timedOut int
 		sa.placeholderData[taskGroupName].TimedOut = timedOut
 	}
 }
+
+// test only
+func (sa *Application) SetPreemptionAttemptInterval(interval time.Duration) {

Review Comment:
   See earlier comment I think this is not needed at all.



##########
pkg/scheduler/objects/application.go:
##########
@@ -1128,6 +1135,7 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 }
 
 func (sa *Application) tryPreemption(reserve *reservation, ask *AllocationAsk) bool {
+	ask.SetLastPreemptionAttempt(time.Now())

Review Comment:
   We should not set this unless we have really preempted otherwise it looks like we did but we really did not. See comment above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] craigcondit commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
craigcondit commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1035278006


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   I'm not sure we want to do this. Once we have triggered preemption, the pods will be scheduled for termination, but may not yet have terminated (either gracefully or not). If graceful termination takes longer than 5 seconds, we're going to attempt to trigger again, which could result in evicting even more pods. In the worst case scenario, we could kill everything else running on the node.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] manirajv06 commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
manirajv06 commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1036872645


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   Yes, ensuring above three steps is good enough. If we anticipate any failures after marking victim as preempted (third step), we could break this into 1. victim_marked_for_preemption 2. victim_preempted (after core receive the terminated/killed confirmation from shim). This breakage would help us to retry only the victim staying in "victim_marked_for_preemption" state for so long.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
pbacsko commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1037164724


##########
pkg/scheduler/objects/allocation_ask.go:
##########
@@ -245,3 +247,31 @@ func (aa *AllocationAsk) GetAllocationLog() []*AllocationLogEntry {
 	}
 	return res
 }
+
+// SetTriggeredPreemption sets whether preemption has been triggered for this ask
+func (aa *AllocationAsk) SetTriggeredPreemption(triggered bool) {
+	aa.Lock()
+	defer aa.Unlock()
+	aa.preemptionTriggered = triggered
+}
+
+// HasTriggeredPreemption returns whether this ask has triggered preemption
+func (aa *AllocationAsk) HasTriggeredPreemption() bool {
+	aa.RLock()
+	defer aa.RUnlock()
+	return aa.preemptionTriggered
+}
+
+// SetLastPreemptionAttempt sets the time which tells when the last preemption attempt was performed for this ask.
+func (aa *AllocationAsk) SetLastPreemptionAttempt(attempt time.Time) {

Review Comment:
   Removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
pbacsko commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1035167221


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   This check also defends against the scenario when there's no victim and we want to re-attempt preemption (I picked an arbitrary value of 5 seconds). I don't know if this is a legit case or not, if not, then it can be removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] craigcondit commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
craigcondit commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1037504089


##########
pkg/scheduler/objects/allocation_ask.go:
##########
@@ -262,16 +261,9 @@ func (aa *AllocationAsk) HasTriggeredPreemption() bool {
 	return aa.preemptionTriggered
 }
 
-// SetLastPreemptionAttempt sets the time which tells when the last preemption attempt was performed for this ask.
-func (aa *AllocationAsk) SetLastPreemptionAttempt(attempt time.Time) {
+// test only

Review Comment:
   This is only called from the objects pkg, so can be moved to a *_test.go file. No need to pollute the main package with test code.



##########
pkg/scheduler/objects/application.go:
##########
@@ -1147,13 +1140,13 @@ func (sa *Application) tryPreemption(reserve *reservation, ask *AllocationAsk) b
 	// Are there any victims/asks to preempt?
 	victims := preemptor.GetVictims()
 	if len(victims) > 0 {
+		log.Logger().Info("Found victims for daemon set ask preemption ",

Review Comment:
   Given we've renamed this feature to 'required node preemption', can we change this message to reflect that? Maybe "Found victims for required node preemption "



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] craigcondit commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
craigcondit commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1036462318


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   Ok, I think I understand now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
pbacsko commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1037939018


##########
pkg/scheduler/objects/application.go:
##########
@@ -1147,13 +1140,13 @@ func (sa *Application) tryPreemption(reserve *reservation, ask *AllocationAsk) b
 	// Are there any victims/asks to preempt?
 	victims := preemptor.GetVictims()
 	if len(victims) > 0 {
+		log.Logger().Info("Found victims for daemon set ask preemption ",

Review Comment:
   Done



##########
pkg/scheduler/objects/allocation_ask.go:
##########
@@ -262,16 +261,9 @@ func (aa *AllocationAsk) HasTriggeredPreemption() bool {
 	return aa.preemptionTriggered
 }
 
-// SetLastPreemptionAttempt sets the time which tells when the last preemption attempt was performed for this ask.
-func (aa *AllocationAsk) SetLastPreemptionAttempt(attempt time.Time) {
+// test only

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] craigcondit commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
craigcondit commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1036462516


##########
pkg/scheduler/objects/application.go:
##########
@@ -1742,3 +1754,10 @@ func (sa *Application) SetTimedOutPlaceholder(taskGroupName string, timedOut int
 		sa.placeholderData[taskGroupName].TimedOut = timedOut
 	}
 }
+
+// test only
+func (sa *Application) SetPreemptionAttemptInterval(interval time.Duration) {

Review Comment:
   Yes, I like the rename idea, I think that's clearer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
pbacsko commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1035748068


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   The is exactly what this change is intended to prevent, ie. once preemption has been triggered for a given ask, it should not be re-tried again - that's why we mark requests. Currently, it's possible that we try to terminate the same set of allocation a dozen times in a second.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
pbacsko commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1035754955


##########
pkg/scheduler/objects/application.go:
##########
@@ -1742,3 +1754,10 @@ func (sa *Application) SetTimedOutPlaceholder(taskGroupName string, timedOut int
 		sa.placeholderData[taskGroupName].TimedOut = timedOut
 	}
 }
+
+// test only
+func (sa *Application) SetPreemptionAttemptInterval(interval time.Duration) {

Review Comment:
   Maybe the naming `PreemptionRetryInterval` would be better?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] wilfred-s commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1036955320


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   The delete of the pod should never fail. If it does there is a problem on the node. A pod that stays in terminating state  for instance will not get removed if we just retry the removal. Sending a second delete will just be ignored by the API server as it is already processing a request like it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org