You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2022/12/01 06:05:31 UTC

[GitHub] [yunikorn-core] wilfred-s commented on a diff in pull request #458: [YUNIKORN-1434] Mark allocation as being preempted

wilfred-s commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1036719716


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 
 		// Do we need a specific node?
 		if ask.GetRequiredNode() != "" {
+			if ask.HasTriggeredPreemption() || time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   I think @craigcondit was right: once we triggered preemption the kills will happen and cannot be reversed. We do not re-trigger *ever* in any case not even after a short period of time.
   The idea is to find a victim list and if we have a list:
   -  mark the ask as preemption triggered 
   - kill all victims
   - mark all victims as preempted
   
   Next scheduling cycle if the ask is still there and has the preemption triggered flag skip it and do not re-trigger. The node is reserved for the ask so it will get scheduled on that node. It could take up to the grace time for the pod to shutdown, which could be loooong in scheduling times.



##########
pkg/scheduler/objects/allocation_ask.go:
##########
@@ -245,3 +247,31 @@ func (aa *AllocationAsk) GetAllocationLog() []*AllocationLogEntry {
 	}
 	return res
 }
+
+// SetTriggeredPreemption sets whether preemption has been triggered for this ask
+func (aa *AllocationAsk) SetTriggeredPreemption(triggered bool) {
+	aa.Lock()
+	defer aa.Unlock()
+	aa.preemptionTriggered = triggered
+}
+
+// HasTriggeredPreemption returns whether this ask has triggered preemption
+func (aa *AllocationAsk) HasTriggeredPreemption() bool {
+	aa.RLock()
+	defer aa.RUnlock()
+	return aa.preemptionTriggered
+}
+
+// SetLastPreemptionAttempt sets the time which tells when the last preemption attempt was performed for this ask.
+func (aa *AllocationAsk) SetLastPreemptionAttempt(attempt time.Time) {

Review Comment:
   I would expect that the method would generate the time by calling `time.Now()` instead of passing in the pre-created object.
   
   If this was done to allow resetting the time during tests we should add a method that allows clearing both time and the flag in one call. There could even be an argument for not having the boolean: if the time is set the preemption has been triggered.



##########
pkg/scheduler/objects/allocation.go:
##########
@@ -356,6 +357,20 @@ func (a *Allocation) GetAllocatedResource() *resources.Resource {
 	return a.allocatedResource
 }
 
+// SetPreempted marks the allocation as preempted.
+func (a *Allocation) SetPreempted(preempted bool) {

Review Comment:
   Should we log something at a `Debug` or even `Error` level if we try to set this flag when already set? I would consider it a big problem.



##########
pkg/scheduler/objects/allocation.go:
##########
@@ -356,6 +357,20 @@ func (a *Allocation) GetAllocatedResource() *resources.Resource {
 	return a.allocatedResource
 }
 
+// SetPreempted marks the allocation as preempted.
+func (a *Allocation) SetPreempted(preempted bool) {

Review Comment:
   I would highly doubt that we would ever unset this flag.
   When we trigger preemption we ask the shim to kill the pod. That is not going to be undone.



##########
pkg/scheduler/objects/application.go:
##########
@@ -1742,3 +1754,10 @@ func (sa *Application) SetTimedOutPlaceholder(taskGroupName string, timedOut int
 		sa.placeholderData[taskGroupName].TimedOut = timedOut
 	}
 }
+
+// test only
+func (sa *Application) SetPreemptionAttemptInterval(interval time.Duration) {

Review Comment:
   See earlier comment I think this is not needed at all.



##########
pkg/scheduler/objects/application.go:
##########
@@ -1128,6 +1135,7 @@ func (sa *Application) tryReservedAllocate(headRoom *resources.Resource, nodeIte
 }
 
 func (sa *Application) tryPreemption(reserve *reservation, ask *AllocationAsk) bool {
+	ask.SetLastPreemptionAttempt(time.Now())

Review Comment:
   We should not set this unless we have really preempted otherwise it looks like we did but we really did not. See comment above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org