You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/03/23 00:42:52 UTC

[GitHub] [incubator-yunikorn-core] wilfred-s commented on a change in pull request #261: [YUNIKORN-586] enhance placeholder cleanup on timeout

wilfred-s commented on a change in pull request #261:
URL: https://github.com/apache/incubator-yunikorn-core/pull/261#discussion_r599167454



##########
File path: pkg/scheduler/objects/application.go
##########
@@ -266,12 +266,29 @@ func (sa *Application) timeoutPlaceholderProcessing() {
 	defer sa.Unlock()
 	switch {
 	// Case 1: if all app's placeholders are allocated, only part of them gets replaced, just delete the remaining placeholders
-	case (sa.IsRunning() || sa.IsStarting()) && !resources.IsZero(sa.allocatedPlaceholder):
+	case (sa.IsRunning() || sa.IsStarting() || sa.IsCompleting()) && !resources.IsZero(sa.allocatedPlaceholder):

Review comment:
       > If the application is in completing state that means that the completing timer is started, so if it will timeout right after the placeholder timer, we might send the same allocation release request to the shim twice.
   
   There is no reason to assume that the timeouts will . The placeholder timer could time out 1 sec after entering the completing state.
   
   What this check does is that it only removes placeholders that have not been replaced yet. If we move into the _Completing_ state we can do that with placeholder still allocated on the app. Any of those placeholders that have been marked as `released` have been communicated to the shim already for a replacement. They have a real allocation linked to it which is not confirmed yet as the core is waiting for the shim to confirm the release. If the replacement comes back from the shim we add the real allocation to the application. At that point we move back out of _Completing_ and we move back into the _Running_ state. As soon as that happens the timing out of the state _Completing_ will become a noop as the state is not the expected state.
   
   That is the specific case we're trying to solve. The case 1 comment should be updated to cover all the use cases properly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org