Posted to issues@yunikorn.apache.org by "Weiwei Yang (Jira)" <ji...@apache.org> on 2021/03/15 17:02:00 UTC
[jira] [Updated] (YUNIKORN-574) Wait for placeholder cleanup

     [ https://issues.apache.org/jira/browse/YUNIKORN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YUNIKORN-574:
---------------------------------
        Parent: YUNIKORN-553
    Issue Type: Sub-task  (was: Bug)

> Wait for placeholder cleanup
> ----------------------------
>
>                 Key: YUNIKORN-574
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-574
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - scheduler
>            Reporter: Wilfred Spiegelenburg
>            Priority: Critical
>
> When we clean up the application in {{timeoutPlaceholderProcessing()}} we have two cases.
>  * First case: we clean up all lingering placeholder allocations on the running app.
>  * Second case is the failure of the app: we clean up lingering asks (no response needed from the shim) and all placeholders, after which we fail the app.
> The cleanup of the placeholders in both these cases is instigated by the core, and we need to wait for the cleanup to happen on the shim side before we proceed. It is not like the removal of the app signalled by the RM: this comes as an unexpected request for the shim, not at the point when the app is deleted on the shim side.
> For case 1 we do not have a problem. The placeholders are terminated, the app runs as normal, and it is not moved to Completed until all is finished. We do NOT have an issue in the states leading to Completed as we have already handled it there (see below).
> For the failure case we immediately unlink the queue as we move into the FAILED state, because the move calls {{moveTerminatedApp()}} via the callback. That causes an issue: we should be waiting for the shim to respond back to the core with the confirmation of the removal.
> This might require a new state to do this in two steps: trigger the cleanup and move to the Failing state, then, when all is cleaned up, move to Failed.
> BTW: introducing a new state for Failing should also include the rename of Waiting to Completing, as that is in line with what the state does and lines up between the two final states.
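
The two-step failure handling proposed in the description can be sketched as a small state machine. This is only an illustration under stated assumptions, not YuniKorn code: the App type, its fields, and the NewApp, StartFailure and ConfirmRelease functions are hypothetical names, and the queueLinked flag merely stands in for the queue unlink that the description attributes to {{moveTerminatedApp()}}.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a simplified application state. Failing is the new
// intermediate state proposed in the issue; the rest is illustrative.
type State int

const (
	Running State = iota
	Failing       // cleanup triggered, waiting for the shim to confirm
	Failed        // final state, all placeholder releases confirmed
)

// App is a hypothetical stand-in for the core's application object.
type App struct {
	state        State
	placeholders map[string]bool // placeholder IDs not yet confirmed released
	queueLinked  bool            // true while the app is still linked to its queue
}

// NewApp creates a running app that holds the given placeholders.
func NewApp(placeholderIDs ...string) *App {
	app := &App{state: Running, placeholders: map[string]bool{}, queueLinked: true}
	for _, id := range placeholderIDs {
		app.placeholders[id] = true
	}
	return app
}

// StartFailure is step one: the core triggers the cleanup and moves the
// app to Failing. The queue stays linked until the shim has responded.
func (a *App) StartFailure() error {
	if a.state != Running {
		return errors.New("app is not running")
	}
	a.state = Failing
	if len(a.placeholders) == 0 {
		a.finishFailure() // nothing to wait for
	}
	return nil
}

// ConfirmRelease is called when the shim confirms one placeholder removal.
// Step two happens once the last confirmation has arrived.
func (a *App) ConfirmRelease(id string) error {
	if a.state != Failing {
		return errors.New("app is not failing")
	}
	if !a.placeholders[id] {
		return fmt.Errorf("unknown placeholder %q", id)
	}
	delete(a.placeholders, id)
	if len(a.placeholders) == 0 {
		a.finishFailure()
	}
	return nil
}

// finishFailure moves to the final state and unlinks the queue, which is
// where the terminated-app callback would run in this sketch.
func (a *App) finishFailure() {
	a.state = Failed
	a.queueLinked = false
}
```

In this sketch an app created with two placeholders stays in Failing, still linked to its queue, after StartFailure and after the first ConfirmRelease; only the second confirmation moves it to Failed and unlinks the queue.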



--
This message was sent by Atlassian Jira
(v8.3.4#803005)