Posted to issues@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2023/04/02 03:08:00 UTC

[jira] [Commented] (YUNIKORN-1670) Application recovery can fail if app is rejected

    [ https://issues.apache.org/jira/browse/YUNIKORN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707609#comment-17707609 ] 

Wilfred Spiegelenburg commented on YUNIKORN-1670:
-------------------------------------------------

After this change, one fixed timeout remains in the recovery process: the creation of all allocations in the core. That call has a 5-minute timeout:
{code:java}
if err := ss.context.WaitForRecovery(recoverableAppManagers, 5*time.Minute); err != nil {
{code}
That is the part which lists all nodes and processes them.

It is probably not the slowest part at all. Can we follow up on that one also?

> Application recovery can fail if app is rejected
> ------------------------------------------------
>
>                 Key: YUNIKORN-1670
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1670
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Craig Condit
>            Assignee: Craig Condit
>            Priority: Major
>              Labels: pull-request-available
>
> During application recovery, the current code waits up to 30 seconds for all applications to transition to "Accepted". However, if an application is rejected, or if the cluster is large enough, recovery will not succeed.
> Similar to how informer sync was recently updated, we should modify the logic to keep trying, but log periodically. Additionally, we should not look specifically for Accepted state, but for state != New and != Recovering. This ensures that we have processed all the applications.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
