
Posted to dev@yunikorn.apache.org by "Chaoran Yu (Jira)" <ji...@apache.org> on 2021/04/25 18:47:00 UTC

[jira] [Created] (YUNIKORN-657) Expose reason of application failure to pods

Chaoran Yu created YUNIKORN-657:
-----------------------------------

             Summary: Expose reason of application failure to pods
                 Key: YUNIKORN-657
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-657
             Project: Apache YuniKorn
          Issue Type: Improvement
          Components: shim - kubernetes
            Reporter: Chaoran Yu
            Assignee: Chaoran Yu


An application may fail for a number of reasons. For example:
* In gang scheduling, the placeholders expire before all of them can be allocated
* When no placement rules are defined (i.e. static queues are used), the application is submitted to a non-existent queue
* The total amount of resources requested by a gang-scheduled app exceeds the capacity of the queue

YK's finite state machine has Failed as a terminal state for an app, so YK never tries to bring a failed app back. As a consequence, the pods of a failed app stay stuck in Pending indefinitely. A better behavior would be for YK to mark those pods as failed as well and to pass the reason for the failure on to them.
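
To make the proposed behavior concrete, here is a minimal sketch in Python. It is not YuniKorn code and calls no Kubernetes API: the names AppState, Pod, Application and fail_application are hypothetical and exist only for illustration. The phase, reason and message fields are modeled loosely on the fields of the same names in a Kubernetes pod status.

```python
from dataclasses import dataclass, field
from enum import Enum


class AppState(Enum):
    # Hypothetical, simplified subset of application states.
    RUNNING = "Running"
    FAILED = "Failed"  # terminal: a failed app is never brought back


@dataclass
class Pod:
    # Hypothetical stand-in for a pod; not a Kubernetes client object.
    name: str
    phase: str = "Pending"
    reason: str = ""
    message: str = ""


@dataclass
class Application:
    app_id: str
    state: AppState = AppState.RUNNING
    pods: list = field(default_factory=list)


def fail_application(app, reason, message):
    """Move the app to Failed and pass the failure reason to its pending pods."""
    app.state = AppState.FAILED
    for pod in app.pods:
        # Only pods that are still waiting are affected; pods that are
        # already running or finished keep their current status.
        if pod.phase == "Pending":
            pod.phase = "Failed"
            pod.reason = reason
            pod.message = message
    return app
```

In this sketch, calling fail_application(app, "ApplicationFailed", "queue does not exist") leaves no pod of the app in Pending, and each affected pod carries the reason, so a user inspecting the pod can see why it will never be scheduled. The reason string used here is an assumption, not an existing YK constant.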


