Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/06/29 00:29:52 UTC

[GitHub] [incubator-yunikorn-site] cheersyang commented on a change in pull request #61: [YUNIKORN-728] Document Soft/Hard scheduling styles

cheersyang commented on a change in pull request #61:
URL: https://github.com/apache/incubator-yunikorn-site/pull/61#discussion_r660195875



##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -206,6 +214,64 @@ Annotations:
 Once the job is submitted to the scheduler, the job won’t be scheduled immediately.
 Instead, the scheduler will ensure it gets its minimal resources before actually starting the driver/executors. 
 
+## Gang scheduling Styles
+
+Initially when the app encountered gang issues due to placeholder pod allocation(failed due to various reasons), we marked the application failed without retrying it. This wasn’t a really user friendly experience, so it led to a demand of making the gangs scheduling style configurable and make it possible to succeed to schedule the app through a fallback mechanism.
+
+To solve this issue we defined two Gang scheduling styles: Soft and Hard.

Review comment:
       I feel we can simplify this into shorter sentences, such as: "Two gang scheduling styles are supported: Soft and Hard. The style can be configured per app to define how the app behaves when gang scheduling fails."

##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -206,6 +214,64 @@ Annotations:
 Once the job is submitted to the scheduler, the job won’t be scheduled immediately.
 Instead, the scheduler will ensure it gets its minimal resources before actually starting the driver/executors. 
 
+## Gang scheduling Styles
+
+Initially when the app encountered gang issues due to placeholder pod allocation(failed due to various reasons), we marked the application failed without retrying it. This wasn’t a really user friendly experience, so it led to a demand of making the gangs scheduling style configurable and make it possible to succeed to schedule the app through a fallback mechanism.
+
+To solve this issue we defined two Gang scheduling styles: Soft and Hard.
+
+- `Hard style`: when this style is used, we will have the initial behavior, more precisely if the application cannot be scheduled according to gang scheduling rules, and it times out, it will be marked as failed, without retrying to schedule it.

Review comment:
       When the app cannot be gang scheduled, it will be marked as failed without retrying to schedule it.

##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -101,6 +101,14 @@ could not schedule all the placeholder pods, it will eventually give up after a
 freed up and used by other apps. If non of the placeholders can be allocated, this timeout won't kick-in. To avoid the placeholder
 pods stuck forever, please refer to [troubleshooting](trouble_shooting.md#gang-scheduling) for solutions.
 
+` gangSchedulingStyle`
+
+Possible values: *Soft*, *Hard*

Review comment:
       Possible values -> Valid values

##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -206,6 +214,64 @@ Annotations:
 Once the job is submitted to the scheduler, the job won’t be scheduled immediately.
 Instead, the scheduler will ensure it gets its minimal resources before actually starting the driver/executors. 
 
+## Gang scheduling Styles
+
+Initially when the app encountered gang issues due to placeholder pod allocation(failed due to various reasons), we marked the application failed without retrying it. This wasn’t a really user friendly experience, so it led to a demand of making the gangs scheduling style configurable and make it possible to succeed to schedule the app through a fallback mechanism.
+
+To solve this issue we defined two Gang scheduling styles: Soft and Hard.
+
+- `Hard style`: when this style is used, we will have the initial behavior, more precisely if the application cannot be scheduled according to gang scheduling rules, and it times out, it will be marked as failed, without retrying to schedule it.
+- `Soft style`: using this style will make it possible to schedule a gang application as a normal, simple application if it cannot be scheduled and started by following the gang scheduling rules. This means that in case of the placeholder timeout the placeholders will be deleted and the application state will transition to Resuming state. After all the placeholders are deleted, the application will transition into Accepted state and the app’s pods will be scheduled according to the non-gang application scheduling logic.

Review comment:
       When the app cannot be gang scheduled, it falls back to normal scheduling, and the non-gang scheduling strategy is used for best-effort scheduling. When this happens, the app transitions to the Resuming state and all remaining placeholder pods are cleaned up.
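
       To make the difference between the two styles concrete, here is a minimal sketch of the state transitions described in the diff above. It is only an illustration, not YuniKorn code: the `Style` enum and the `states_after_placeholder_timeout` function are hypothetical names, and the state names are taken from the proposed doc text (Failed, Resuming, Accepted).

```python
from enum import Enum
from typing import List


class Style(Enum):
    """Hypothetical stand-in for the gangSchedulingStyle values."""
    SOFT = "Soft"
    HARD = "Hard"


def states_after_placeholder_timeout(style: Style) -> List[str]:
    """Return the app states entered once gang scheduling times out.

    Assumption: this mirrors the doc text under review, not the
    scheduler's actual implementation.
    """
    if style is Style.HARD:
        # Hard: the app is marked as failed and is not retried.
        return ["Failed"]
    # Soft: placeholders are cleaned up while the app is Resuming,
    # then the app returns to Accepted and is scheduled as a normal app.
    return ["Resuming", "Accepted"]


if __name__ == "__main__":
    for style in Style:
        print(style.value, "->", states_after_placeholder_timeout(style))
```

       A wording that states the outcome first and the state transition second, as suggested above, maps directly onto these two branches.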



