Posted to reviews@yunikorn.apache.org by wi...@apache.org on 2023/05/11 01:48:01 UTC

[yunikorn-site] branch master updated: [YUNIKORN-1506] Chinese translation of Gang scheduling (#294)

This is an automated email from the ASF dual-hosted git repository.

wilfreds pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git


The following commit(s) were added to refs/heads/master by this push:
     new 17a049984 [YUNIKORN-1506] Chinese translation of Gang scheduling (#294)
17a049984 is described below

commit 17a0499842d43c2755ac26951c92a86a3b25841e
Author: KatLantyss <re...@gmail.com>
AuthorDate: Thu May 11 11:47:13 2023 +1000

    [YUNIKORN-1506] Chinese translation of Gang scheduling (#294)
    
    Closes: #294
    
    Signed-off-by: Wilfred Spiegelenburg <wi...@apache.org>
---
 .../current/user_guide/gang_scheduling.md          | 197 +++++++++++----------
 1 file changed, 99 insertions(+), 98 deletions(-)

diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/current/user_guide/gang_scheduling.md b/i18n/zh-cn/docusaurus-plugin-content-docs/current/user_guide/gang_scheduling.md
index 8a27522b5..754e29c32 100644
--- a/i18n/zh-cn/docusaurus-plugin-content-docs/current/user_guide/gang_scheduling.md
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/current/user_guide/gang_scheduling.md
@@ -1,6 +1,6 @@
 ---
 id: gang_scheduling
-title: Gang Scheduling
+title: Gang Scheduling
 ---
 
 <!--
@@ -22,107 +22,107 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## What is Gang Scheduling
+## What is Gang Scheduling
 
-When Gang Scheduling is enabled, YuniKorn schedules the app only when
-the app’s minimal resource request can be satisfied. Otherwise, apps
-will be waiting in the queue. Apps are queued in hierarchy queues,
-with gang scheduling enabled, each resource queue is assigned with the
-maximum number of applications running concurrently with min resource guaranteed.
+When gang scheduling is enabled, YuniKorn schedules an application only when
+the application's minimal resource request can be satisfied. Otherwise, applications
+wait in the queue. Applications are queued in hierarchical queues;
+with gang scheduling enabled, each resource queue is assigned the maximum number
+of applications that can run concurrently with their minimum resources guaranteed.
 
-![Gang Scheduling](./../assets/gang_scheduling_intro.png)
+![Gang Scheduling](./../assets/gang_scheduling_intro.png)
 
-## Enable Gang Scheduling
+## Enable Gang Scheduling
 
-There is no cluster-wide configuration needed to enable Gang Scheduling.
-The scheduler actively monitors the metadata of each app, if the app has included
-a valid taskGroups definition, it will be considered as gang scheduling desired.
+No cluster-wide configuration is needed to enable gang scheduling.
+The scheduler actively monitors the metadata of each application; if the application
+includes a valid taskGroups definition, it is considered to want gang scheduling.
 
-:::info Task Group
-A task group is a “gang” of tasks in an app, these tasks are having the same resource profile
-and the same placement constraints. They are considered as homogeneous requests that can be
-treated as the same kind in the scheduler.
+:::info Task Group
+A task group is a "gang" of tasks in an application. These tasks have the same resource profile
+and the same placement constraints. They are considered homogeneous requests that can be
+treated as the same kind in the scheduler.
 :::
 
-### Prerequisite
+### Prerequisite
 
-For the queues which runs gang scheduling enabled applications, the queue sorting policy should be set to `FIFO`.
-To configure queue sorting policy, please refer to doc: [app sorting policies](user_guide/sorting_policies.md#application-sorting).
+For queues that run gang scheduling enabled applications, the queue sorting policy should be set to `FIFO`.
+To configure the queue sorting policy, please refer to the doc: [app sorting policies](user_guide/sorting_policies.md#应用程序排序).
 
-#### Why the `FIFO` sorting policy
+#### Why the `FIFO` sorting policy
 
-When Gang Scheduling is enabled, the scheduler proactively reserves resources
-for each application. If the queue sorting policy is not FIFO based (StateAware is FIFO based sorting policy),
-the scheduler might reserve partial resources for each app and causing resource segmentation issues.
+When gang scheduling is enabled, the scheduler proactively reserves resources for each application.
+If the queue sorting policy is not FIFO based (StateAware is a FIFO based sorting policy),
+the scheduler might reserve partial resources for each app and cause resource segmentation issues.
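A minimal sketch of a queue configuration that sets the FIFO sorting policy; the partition layout and the queue name `sandbox` are illustrative only and not part of this commit:

```yaml
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: sandbox
            properties:
              # sort applications by arrival order so gang reservations are not interleaved
              application.sort.policy: fifo
```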
 
-#### Side effects of `StateAware` sorting policy
+#### Side effects of the `StateAware` sorting policy
 
-We do not recommend using `StateAware`, even-though it is a FIFO based policy. A failure of the first pod or a long initialisation period of that pod could slow down the processing.
-This is specifically an issue with Spark jobs when the driver performs a lot of pre-processing before requesting the executors.
-The `StateAware` timeout in those cases would slow down processing to just one application per timeout.
-This in effect will overrule the gang reservation and cause slowdowns and excessive resource usage.
+We do not recommend using `StateAware`, even though it is a FIFO based policy. A failure of the first pod, or a long initialisation period for that pod, can slow down processing.
+This is specifically an issue with Spark jobs when the driver performs a lot of pre-processing before requesting the executors.
+The `StateAware` timeout in those cases would slow processing down to just one application per timeout.
+This in effect overrules the gang reservation and causes slowdowns and excessive resource usage.
 
-### App Configuration
+### App Configuration
 
-On Kubernetes, YuniKorn discovers apps by loading metadata from individual pod, the first pod of the app
-is required to enclosed with a full copy of app metadata. If the app does not have any notion about the first or second pod,
-then all pods are required to carry the same taskGroups info. Gang scheduling requires taskGroups definition,
-which can be specified via pod annotations. The required fields are:
+On Kubernetes, YuniKorn discovers apps by loading metadata from individual pods; the first pod of the app
+is required to carry a full copy of the app metadata. If the app has no notion of a first or second pod,
+then all pods are required to carry the same taskGroups info. Gang scheduling requires a taskGroups definition,
+which can be specified via pod annotations. The required fields are:
 
-| Annotation                                     | Value                                                                                                                                                         |
-|------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| yunikorn.apache.org/task-group-name 	          | Task group name, it must be unique within the application                                                                                                     |
-| yunikorn.apache.org/task-groups                | A list of task groups, each item contains all the info defined for the certain task group                                                                     |
-| yunikorn.apache.org/schedulingPolicyParameters | Optional. A arbitrary key value pairs to define scheduling policy parameters. Please read [schedulingPolicyParameters section](#scheduling-policy-parameters) |
+| Annotation | Value |
+|-----|------|
+| yunikorn.apache.org/task-group-name 	         | The task group name; it must be unique within the application.                                  |
+| yunikorn.apache.org/task-groups                | A list of task groups, each item containing all the info defined for that task group.                     |
+| yunikorn.apache.org/schedulingPolicyParameters | Optional. Arbitrary key-value pairs that define scheduling policy parameters. Please read the [scheduling policy parameters section](#调度策略参数).  |
 
-#### How many task groups needed?
+#### How many task groups are needed?
 
-This depends on how many different types of pods this app requests from K8s. A task group is a “gang” of tasks in an app,
-these tasks are having the same resource profile and the same placement constraints. They are considered as homogeneous
-requests that can be treated as the same kind in the scheduler. Use Spark as an example, each job will need to have 2 task groups,
-one for the driver pod and the other one for the executor pods.
+This depends on how many different types of pods the app requests from K8s. A task group is a "gang" of tasks in an app;
+these tasks have the same resource profile and the same placement constraints. They are considered homogeneous
+requests that can be treated as the same kind in the scheduler. Using Spark as an example, each job needs 2 task groups:
+one for the driver pod and the other for the executor pods.
 
-#### How to define task groups?
+#### How to define task groups?
 
-The task group definition is a copy of the app’s real pod definition, values for fields like resources, node-selector, toleration
-and affinity should be the same as the real pods. This is to ensure the scheduler can reserve resources with the
-exact correct pod specification.
+The task group definition is a copy of the app's real pod definition; values for fields like resources, node-selector, tolerations
+and affinity should be the same as for the real pods. This is to ensure the scheduler can reserve resources with the
+exact correct pod specification.
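As a sketch of how such a definition can be carried on a pod, mirroring the fields used in the example later on this page (the group name, member count and resource values below are illustrative):

```yaml
metadata:
  annotations:
    yunikorn.apache.org/task-group-name: "task-group-example"
    yunikorn.apache.org/task-groups: |-
      [{
        "name": "task-group-example",
        "minMember": 2,
        "minResource": {
          "cpu": "100m",
          "memory": "50M"
        },
        "nodeSelector": {},
        "tolerations": [],
        "affinity": {}
      }]
```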
 
-#### Scheduling Policy Parameters
+#### Scheduling Policy Parameters
 
-Scheduling policy related configurable parameters. Apply the parameters in the following format in pod's annotation:
+Scheduling policy related configurable parameters. Apply the parameters in the pod's annotations in the following format:
 
 ```yaml
 annotations:
    yunikorn.apache.org/schedulingPolicyParameters: "PARAM1=VALUE1 PARAM2=VALUE2 ..."
 ```
 
-Currently, the following parameters are supported:
+Currently, the following parameters are supported:
 
 `placeholderTimeoutInSeconds`
 
-Default value: *15 minutes*.
-This parameter defines the reservation timeout for how long the scheduler should wait until giving up allocating all the placeholders.
-The timeout timer starts to tick when the scheduler *allocates the first placeholder pod*. This ensures if the scheduler
-could not schedule all the placeholder pods, it will eventually give up after a certain amount of time. So that the resources can be
-freed up and used by other apps. If non of the placeholders can be allocated, this timeout won't kick-in. To avoid the placeholder
-pods stuck forever, please refer to [troubleshooting](troubleshooting.md#成组调度) for solutions.
+Default value: *15 minutes*.
+This parameter defines the reservation timeout, i.e. how long the scheduler should wait before giving up allocating all the placeholders.
+The timeout timer starts ticking when the scheduler *allocates the first placeholder pod*. This ensures that if the scheduler
+cannot schedule all the placeholder pods, it eventually gives up after a certain amount of time, so that the resources can be
+freed up and used by other apps. If none of the placeholders can be allocated, this timeout won't kick in. To avoid placeholder
+pods getting stuck forever, please refer to [troubleshooting](troubleshooting.md#成组调度) for solutions.
 
-` gangSchedulingStyle`
+`gangSchedulingStyle`
 
-Valid values: *Soft*, *Hard*
+Valid values: *Soft*, *Hard*
 
-Default value: *Soft*.
-This parameter defines the fallback mechanism if the app encounters gang issues due to placeholder pod allocation.
-See more details in [Gang Scheduling styles](#gang-scheduling-styles) section
+Default value: *Soft*.
+This parameter defines the fallback mechanism if the app runs into gang issues due to placeholder pod allocation.
+See more details in the [gang scheduling styles](#分组调度风格) section.
 
-More scheduling parameters will added in order to provide more flexibility while scheduling apps.
+More scheduling parameters will be added in order to provide more flexibility while scheduling apps.
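Combining the two parameters described above, a hypothetical annotation could look like this (the 60-second timeout is only an illustration):

```yaml
annotations:
   yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard"
```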
 
-#### Example
+#### Example
 
-The following example is a yaml file for a job. This job launches 2 pods and each pod sleeps 30 seconds.
-The notable change in the pod spec is *spec.template.metadata.annotations*, where we defined `yunikorn.apache.org/task-group-name`
-and `yunikorn.apache.org/task-groups`.
+The following example is a yaml file for a job. This job launches 2 pods and each pod sleeps for 30 seconds.
+The notable change in the pod spec is *spec.template.metadata.annotations*, where we define `yunikorn.apache.org/task-group-name`
+and `yunikorn.apache.org/task-groups`.
 
 ```yaml
 apiVersion: batch/v1
@@ -165,18 +165,19 @@ spec:
               memory: "50M"
 ```
 
-When this job is submitted to Kubernetes, 2 pods will be created using the same template, and they all belong to one taskGroup:
-*“task-group-example”*. YuniKorn will create 2 placeholder pods, each uses the resources specified in the taskGroup definition.
-When all 2 placeholders are allocated, the scheduler will bind the the real 2 sleep pods using the spot reserved by the placeholders.
+When this job is submitted to Kubernetes, 2 pods will be created using the same template, and they all belong to one taskGroup:
+*"task-group-example"*. YuniKorn will create 2 placeholder pods, each using the resources specified in the taskGroup definition.
+When both placeholders are allocated, the scheduler will bind the 2 real sleep pods using the spots reserved by the placeholders.
+
+You can add more than one taskGroup if necessary; each taskGroup is identified by its taskGroup name,
+and each real pod must be mapped to a pre-defined taskGroup by setting the taskGroup name, as sketched below. Note that
+the task group name is only required to be unique within an application.
 
-You can add more than one taskGroups if necessary, each taskGroup is identified by the taskGroup name,
-it is required to map each real pod with a pre-defined taskGroup by setting the taskGroup name. Note,
-the task group name is only required to be unique within an application.
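A hypothetical sketch of mapping two kinds of real pods to their own pre-defined task groups by name; the group names `group-a` and `group-b` are made up for illustration, and both groups would also need to be listed in the `yunikorn.apache.org/task-groups` annotation:

```yaml
# pods of the first kind reference the first task group
metadata:
  annotations:
    yunikorn.apache.org/task-group-name: "group-a"
---
# pods of the second kind reference the second task group
metadata:
  annotations:
    yunikorn.apache.org/task-group-name: "group-b"
```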
 
-### Enable Gang scheduling for Spark jobs
+### Enable gang scheduling for Spark jobs
 
-Each Spark job runs 2 types of pods, driver and executor. Hence, we need to define 2 task groups for each job.
-The annotations for the driver pod looks like:
+Each Spark job runs 2 types of pods: driver and executor. Hence, we need to define 2 task groups for each job.
+The annotations for the driver pod look like:
 
 ```yaml
 Annotations:
@@ -208,32 +209,32 @@ Annotations:
 ```
 
 :::note
-The TaskGroup resources must account for the memory overhead for Spark drivers and executors.
-See the [Spark documentation](https://spark.apache.org/docs/latest/configuration.html#application-properties) for details on how to calculate the values.
+The TaskGroup resources must account for the memory overhead of the Spark driver and executors.
+See the [Spark documentation](https://spark.apache.org/docs/latest/configuration.html#application-properties) for details on how to calculate these values.
 :::
 
-For all the executor pods,
+For all the executor pods:
 
 ```yaml
 Annotations:
-  # the taskGroup name should match to the names
-  # defined in the taskGroups field
+  # the taskGroup name should match the names
+  # defined in the taskGroups field
   yunikorn.apache.org/taskGroupName: “spark-executor”
 ```
 
-Once the job is submitted to the scheduler, the job won’t be scheduled immediately.
-Instead, the scheduler will ensure it gets its minimal resources before actually starting the driver/executors. 
+Once the job is submitted to the scheduler, the job won't be scheduled immediately.
+Instead, the scheduler will ensure it gets its minimal resources before actually starting the driver/executors.
 
-## Gang scheduling Styles
+## Gang scheduling styles
 
-There are 2 gang scheduling styles supported, Soft and Hard respectively. It can be configured per app-level to define how the app will behave in case the gang scheduling fails.
+There are 2 gang scheduling styles supported: Soft and Hard. The style can be configured per application to define how the app behaves in case gang scheduling fails.
 
-- `Hard style`: when this style is used, we will have the initial behavior, more precisely if the application cannot be scheduled according to gang scheduling rules, and it times out, it will be marked as failed, without retrying to schedule it.
-- `Soft style`: when the app cannot be gang scheduled, it will fall back to the normal scheduling, and the non-gang scheduling strategy will be used to achieve the best-effort scheduling. When this happens, the app transits to the Resuming state and all the remaining placeholder pods will be cleaned up.
+- `Hard style`: when this style is used, we keep the initial behaviour; more precisely, if the application cannot be scheduled according to gang scheduling rules and it times out, it is marked as failed, without retrying to schedule it.
+- `Soft style`: when the app cannot be gang scheduled, it falls back to normal scheduling, and the non-gang scheduling strategy is used to achieve best-effort scheduling. When this happens, the app transitions to the Resuming state and all remaining placeholder pods are cleaned up.
 
-**Default style used**: `Soft`
+**Default style used**: `Soft`.
 
-**Enable a specific style**: the style can be changed by setting in the application definition the ‘gangSchedulingStyle’ parameter to Soft or Hard.
+**Enable a specific style**: the style can be changed by setting the `gangSchedulingStyle` parameter to Soft or Hard in the application definition.
 
 #### Example
 
@@ -281,16 +282,16 @@ spec:
 
 ```
 
-## Verify Configuration
+## Verify Configuration
 
-To verify if the configuration has been done completely and correctly, check the following things:
-1. When an app is submitted, verify the expected number of placeholders are created by the scheduler.
-If you define 2 task groups, 1 with minMember 1 and the other with minMember 5, that means we are expecting 6 placeholder
-gets created once the job is submitted.
-2. Verify the placeholder spec is correct. Each placeholder needs to have the same info as the real pod in the same taskGroup.
-Check field including: namespace, pod resources, node-selector, toleration and affinity.
-3. Verify the placeholders can be allocated on correct type of nodes, and verify the real pods are started by replacing the placeholder pods.
+To verify that the configuration is complete and correct, check the following things:
+1. When an app is submitted, verify that the expected number of placeholders is created by the scheduler.
+If you define 2 task groups, one with minMember 1 and the other with minMember 5, that means we expect 6 placeholders
+to be created once the job is submitted.
+2. Verify that the placeholder spec is correct. Each placeholder needs to have the same info as the real pod in the same taskGroup.
+Check fields including: namespace, pod resources, node-selector, tolerations and affinity.
+3. Verify that the placeholders can be allocated on the correct type of nodes, and verify that the real pods are started by replacing the placeholder pods.
 
-## Troubleshooting
+## Troubleshooting
 
-Please see the troubleshooting doc when gang scheduling is enabled [here](troubleshooting.md#成组调度).
+Please see the troubleshooting doc for when gang scheduling is enabled [here](troubleshooting.md#成组调度).