Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/04/06 09:38:59 UTC

[GitHub] [incubator-yunikorn-site] kingamarton commented on a change in pull request #41: [YUNIKORN-618] Add Gang scheduling user and troubleshooting doc

kingamarton commented on a change in pull request #41:
URL: https://github.com/apache/incubator-yunikorn-site/pull/41#discussion_r607687708



##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -0,0 +1,205 @@
+---
+id: gang_scheduling
+title: Gang Scheduling
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## What is Gang Scheduling
+
+When Gang Scheduling is enabled, YuniKorn schedules the app only when
+the app’s minimal resource request can be satisfied. Otherwise, apps
+will be waiting in the queue. Apps are queued in hierarchy queues,
+with gang scheduling enabled, each resource queue is assigned with the
+maximum number of applications running concurrently with min resource guaranteed.
+
+![Gang Scheduling](./../assets/gang_scheduling_iintro.png)
+
+:::note
+gang scheduling feature is only available in *0.10.0* and later releases.
+:::

Review comment:
       Since we have our documentation versioned, I think we don't need this note.

##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -0,0 +1,205 @@
+
+## Enable Gang Scheduling
+
+There is no cluster-wide configuration needed to enable Gang Scheduling.
+The scheduler actively monitors the metadata of each app, if the app has included
+a valid taskGroups definition, it will be considered as gang scheduling desired.
+
+:::info Task Group
+A task group is a “gang” of tasks in an app, these tasks are having the same resource profile
+and the same placement constraints. They are considered as homogeneous requests that can be
+treated as the same kind in the scheduler.
+:::
+
+### Prerequisite
+
+It is recommended to enable gang scheduling at least in the *queue level*, which means for
+each resource queue, either all the apps are gang scheduled, or non of the apps is gang scheduled.
+For the queues which have gang scheduling enabled, the queue sorting policy needs to be set either

Review comment:
       `For the queues which have gang scheduling enabled`: I think this part is not so clear. It makes me think that gang scheduling can be enabled somewhere for the queue.

##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -0,0 +1,205 @@
+### Prerequisite
+
+It is recommended to enable gang scheduling at least in the *queue level*, which means for
+each resource queue, either all the apps are gang scheduled, or non of the apps is gang scheduled.
+For the queues which have gang scheduling enabled, the queue sorting policy needs to be set either
+`FIFO` or `StateAware`. To configure queue sorting policy, please refer to doc: [app sorting policies](user_guide/sorting_policies.md#Application_sorting).
+
+:::info Why FIFO based sorting policy?
+When Gang Scheduling is enabled, the scheduler proactively reserves resources
+for each application. If the queue sorting policy is not FIFO based (StateAware is FIFO based sorting policy),
+the scheduler might reserve partial resources for each app and causing resource segmentation issues.
+:::
+
+### App Configuration
+
+On Kubernetes, YuniKorn discovers apps by loading metadata from individual pod, the first pod of the app
+is required to enclosed with a full copy of app metadata. If the app doesn’t have any notion about the first or second pod,
+then all pods are required to carry the same taskGroups info. Gang scheduling requires taskGroups definition,
+which can be specified via pod annotations. The required fields are:
+
+| Annotation                                     | Value |
+|----------------------------------------------- |---------------------	|
+| yunikorn.apache.org/task-group-name 	         | Task group name, it must be unique within the application |
+| yunikorn.apache.org/task-groups                | A list of task groups, each item contains all the info defined for the certain task group |
+| yunikorn.apache.org/schedulingPolicyParameters | Optional. A arbitrary key value pairs to define scheduling policy parameters. Currently only *placeholderTimeoutInSeconds* is supported. It defines the reservation timeout for how long the scheduler should wait until giving up allocating all the placeholders. |

Review comment:
       I think we need to add some clarification about the timeout here. It does not define the reservation timeout, since the timer is not started when the app enters the reserving state, but when the first placeholder is allocated. We should also mention its default value.
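
To make the distinction in the comment above concrete, here is a minimal Python sketch of the timing behaviour. It is only an illustration under stated assumptions: the class and method names (`PlaceholderTimer`, `on_app_reserving`, `on_placeholder_allocated`) are hypothetical and are not YuniKorn APIs, and time is passed in explicitly as plain seconds.

```python
from typing import Optional


class PlaceholderTimer:
    """Toy model of when the placeholder timeout starts counting.

    Hypothetical helper for illustration only; not YuniKorn code.
    """

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.started_at: Optional[float] = None

    def on_app_reserving(self, now: float) -> None:
        # Entering the reserving state does NOT start the timer.
        pass

    def on_placeholder_allocated(self, now: float) -> None:
        # Only the first placeholder allocation starts the timer;
        # later allocations leave the start time unchanged.
        if self.started_at is None:
            self.started_at = now

    def expired(self, now: float) -> bool:
        # The timeout cannot expire before the timer has started.
        if self.started_at is None:
            return False
        return now - self.started_at >= self.timeout_seconds
```

In this model an app can sit in the reserving state for any length of time without timing out, as long as no placeholder has been allocated yet, which is why "reservation timeout" is a misleading description.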

##########
File path: docs/user_guide/trouble_shooting.md
##########
@@ -144,6 +144,43 @@ kubectl scale deployment yunikorn-scheduler -n yunikorn --replicas=0
 kubectl scale deployment yunikorn-scheduler -n yunikorn --replicas=1
 ```
 
+## Gang Scheduling
+
+### 1. No placeholders created, app's pods are pending
+
+*Reason*: This is usually because the taskGroups definition is invalid, and the app is rejected by the scheduler.

Review comment:
       It can also happen when the total resource request exceeds the queue resources; the app is rejected in that case too.
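
A rough Python sketch of the second rejection cause mentioned in the comment above: the summed request of all task groups is compared with the queue limit. Everything here is an assumption made for illustration, not the scheduler's actual logic: the function names are hypothetical, the `minMember` and `minResource` keys and the plain integer units are chosen for the example, and a resource type missing from the queue limit is treated as unlimited.

```python
from typing import Dict, List


def total_gang_request(task_groups: List[dict]) -> Dict[str, int]:
    """Sum minMember * minResource over all task groups (hypothetical helper)."""
    total: Dict[str, int] = {}
    for group in task_groups:
        for name, amount in group["minResource"].items():
            total[name] = total.get(name, 0) + amount * group["minMember"]
    return total


def fits_queue(task_groups: List[dict], queue_max: Dict[str, int]) -> bool:
    """Return True if the total gang request stays within the queue limit.

    Assumption of this sketch: a resource type that is absent from
    queue_max is treated as unlimited.
    """
    total = total_gang_request(task_groups)
    return all(
        amount <= queue_max.get(name, float("inf"))
        for name, amount in total.items()
    )
```

If `fits_queue` returns False, the troubleshooting entry would apply even though the taskGroups definition itself is valid, so the doc should list both reasons.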



