Posted to issues@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2023/08/04 12:00:01 UTC

[jira] [Created] (YUNIKORN-1900) Orphan allocation due to placeholder deletes

Wilfred Spiegelenburg created YUNIKORN-1900:
-----------------------------------------------

             Summary: Orphan allocation due to placeholder deletes
                 Key: YUNIKORN-1900
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1900
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg


Gang-scheduled applications can leave orphaned allocations. This can happen when the gang scheduling setup specifies only one task group with one member for the app.
This by itself is not a problem and works. Replacing the placeholder with the real allocation triggers the issue. The replacement temporarily removes all allocations, and with only one gang member it also leaves no pending asks. That triggers the state change of the application to COMPLETING. This is the correct state change for an app that has nothing left: no allocations and no asks.
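
The transition described above can be illustrated with a small toy model. This is only a sketch in Python, not YuniKorn code: the `App` class, `remove_allocation`, and `replace_placeholder` are hypothetical names, and the rule "no allocations and no pending asks means COMPLETING" is taken from the description in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Toy stand-in for an application; not YuniKorn's actual type."""
    state: str = "RUNNING"
    allocations: set = field(default_factory=set)
    pending_asks: set = field(default_factory=set)

    def remove_allocation(self, alloc_id: str) -> None:
        self.allocations.discard(alloc_id)
        # Assumed rule from the description: nothing allocated and nothing
        # pending means the application is done and moves to COMPLETING.
        if not self.allocations and not self.pending_asks:
            self.state = "COMPLETING"


def replace_placeholder(app: App, placeholder: str) -> None:
    """First half of a replacement: the placeholder is released.

    The real allocation is only attached later, once it is confirmed.
    """
    app.remove_allocation(placeholder)


# One task group with one member: a single placeholder and no other asks.
app = App(allocations={"placeholder-1"})
replace_placeholder(app, "placeholder-1")
print(app.state)
```

With more than one gang member, another placeholder or a pending ask would still be present at this point and the emptiness check would not fire.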

Triggering the state change is, however, a problem. If the allocation of the driver were not a replacement, the COMPLETING application would move to RUNNING via a state update. We trigger a state change in that case and the issue does not occur. For placeholder replacements we trigger the state change, if needed, on the removal of the placeholder, not when the real allocation is confirmed.

If the confirmation is processed before the COMPLETING state times out, the allocation is added to the node and never cleaned up. When the COMPLETING state times out, the application is removed without cleaning up the allocation.

The allocation cleanup is not triggered because the COMPLETING state should never be entered while the app still has allocations.
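
The resulting orphan can be sketched with another toy model. Again, this is a Python illustration under assumptions, not the scheduler's implementation: `Node`, `ToyApp`, `confirm_replacement`, and `completing_timeout` are hypothetical names, and the model starts from the point where the placeholder removal has already moved the app to COMPLETING.

```python
class Node:
    """Toy node that tracks the allocation IDs placed on it."""

    def __init__(self):
        self.allocations = set()


class ToyApp:
    """Toy application, already COMPLETING after the placeholder removal."""

    def __init__(self):
        self.state = "COMPLETING"
        self.allocations = set()


def confirm_replacement(app, node, alloc_id):
    # The real allocation is confirmed and added to the node and the app.
    # As described in the issue, the replacement path does not trigger a
    # state change here, so the app stays in COMPLETING.
    node.allocations.add(alloc_id)
    app.allocations.add(alloc_id)


def completing_timeout(apps, app_id):
    # Assumption from the issue: a COMPLETING app is expected to have no
    # allocations, so removal performs no allocation cleanup on the nodes.
    apps.pop(app_id)


apps = {"app-1": ToyApp()}
node = Node()

confirm_replacement(apps["app-1"], node, "driver-1")  # before the timeout
completing_timeout(apps, "app-1")                      # timeout fires

# The node still holds an allocation whose application no longer exists.
orphans = set(node.allocations)
print(orphans)
```

In this model the orphan disappears if either the confirmation moves the app back to RUNNING or the removal step releases any remaining allocations; which of these the actual fix uses is not stated here.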



