Posted to issues@yunikorn.apache.org by "Chaoran Yu (Jira)" <ji...@apache.org> on 2021/05/20 16:04:00 UTC

[jira] [Commented] (YUNIKORN-582) Consider a fallback mechanism to schedule the app in case of gang failure instead of marking the app as failed

    [ https://issues.apache.org/jira/browse/YUNIKORN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348605#comment-17348605 ] 

Chaoran Yu commented on YUNIKORN-582:
-------------------------------------

I vote for this issue. The current behavior makes for an unpleasant user experience and differs from what most users expect. In YARN, jobs keep waiting in the queue until resources are freed up.

> Consider a fallback mechanism to schedule the app in case of gang failure instead of marking the app as failed
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-582
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-582
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - scheduler
>            Reporter: Ayub Pathan
>            Assignee: Manikandan R
>            Priority: Major
>
> In cases where the app encounters gang scheduling issues because placeholder pod allocation failed (for various reasons), YuniKorn currently marks the app as failed.
> Instead, consider a configurable option for hard or soft gang scheduling that allows a fallback mechanism to schedule the app successfully. This needs to be brainstormed to see if it makes sense. Let us use this Jira for documenting all the thoughts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
