Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/08/30 09:42:30 UTC
[GitHub] [hadoop] 9uapaw commented on a change in pull request #3333: YARN-10576. Update Capacity Scheduler documentation about JSON-based …

9uapaw commented on a change in pull request #3333:
URL: https://github.com/apache/hadoop/pull/3333#discussion_r698269656



##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
##########
@@ -288,7 +288,27 @@ Below example covers single mapping separately. In case of multiple mappings wit
 
 Rules are evaluated from top to bottom. Compared to the legacy mapping rule evaluator, it can be adjusted more flexibly what happens when the evaluation stops and a given rule does not match.
 
-  * Rules
+####How to enable JSON-based queue mapping
+
+The following properties control how the new placement engine expects rules.
+
+| Setting | Description |
+|:---- |:---- |
+| `yarn.scheduler.capacity.mapping-rule-format` | Allowed values are `legacy` or `json`. If it is not set, then the engine assumes that the old format might be in use so it also checks the value of `yarn.scheduler.capacity.queue-mappings`. Therefore, this must be set to `json` and cannot be left empty. |
+| `yarn.scheduler.capacity.mapping-rule-json` | The value of this property should contain the entire chain of rules inline. This is the preferred way of configuring Capacity Scheduler if you use the Mutation API, ie. modify configuration real-time via the REST interface. |
+| `yarn.scheduler.capacity.mapping-rule-json-file` | Defines an absolute path to a JSON file which contains the rules. For exmaple, `/opt/hadoop/config/mapping-rules.json`. |

Review comment:
       Nit: typo, "exmaple" should be "example".

##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
##########
@@ -298,21 +318,28 @@ Rules are evaluated from top to bottom. Compared to the legacy mapping rule eval
 | `matches` | The string to match, or an asterisk "&ast;" which means "all". For example, if the type is `user` and this string is "hadoop" then the rule will only be evaluated if the submitter user is "hadoop". The "&ast;" does not work with groups. |
 | `policy` | Selects a list of pre-defined policies which defines where the application should be placed. This will be explained later in the "Policies" section. |
 | `parentQueue` | In case of `user`, `primaryGroup`, `primaryGroupUser`, `secondaryGroup`, `secondaryGroupUser` policies, this tells the engine where the matching queue should be looked for. For example, if the policy is `primaryGroup`, parent is `root.groups` and the submitter's group is "admins", then the resulting queue will be "root.groups.admin" |
-| `fallbackResult` | If the target queue does not exist or it cannot be created (ie. it exists under a regular parent), it defines a fallback action. Valid values are `skip`, `reject` and `placeDefault`. |
-| `create` | Only applies to managed queue parents. If set to "false", then the queue will not be created if it does not exist. |
+| `fallbackResult` | If the target queue does not exist or it cannot be created, it defines a fallback action. Valid values are `skip`, `reject` and `placeDefault`. |
+| `create` | If set to "false", then the queue will not be created if it does not exist. This flag works differently in flexible and in legacy mode (see below). |
 | `value` | If the policy is `setDefaultQueue`, then the default queue will change to this setting from "root.default". Otherwise ignored. |
 | `customPlacement` | Only works with `custom` placement policy. The value of this field will be evaluated directly by the engine, which means that various placeholders such as `%application` or `%primary_group` will be replaced with their respective values. |
 
 
   `type` is the equivalent of the first column in the old format. It is either "g" or "u" and there is a separate property for application mappings. `matches` is the second column. The only difference is that `%user` means to match all users, but it's not expressive enough. So in the new format, it's been changed to `*`.
   The `fallbackResult` setting is checked what to do when the target queue cannot be created or does not exist. The three settings work the following way:
+
 * `skip`: ignore the current rule and proceed to the next. This is how Fair Scheduler evaluates placement rules.
+
 * `placeDefault`: place the application to the default queue `root.default` (unless it's overridden to something else). This is how Capacity Scheduler works with the old mapping rules.
+
 * `reject`: rejects the submission.
 
-  The `create` flag has no effect on the queue if the parent is not managed.
+  The `create` flag is affected by the mode:
+
+* **Legacy** mode: it has no effect if the parent is not managed.
+
+* **Flexible** mode: it applies to all parent queues. However, if `yarn.scheduler.capacity.<queue-path>.auto-queue-creation-v2.enabled` is set to false the queue will not be created.

Review comment:
       I think it would be better to phrase it like: "Applies to all parent queues that have `yarn.scheduler.capacity.<queue-path>.auto-queue-creation-v2.enabled` set to true."
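For readers following the thread, the three `fallbackResult` settings described in the quoted hunk can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the scheduler's implementation: the `target` key (standing in for whatever queue a rule's policy resolved to), the `queue_usable` callback, and the `PlacementRejected` exception are hypothetical names, and returning `None` when every rule is skipped is an assumption, since the hunk does not say what happens in that case.

```python
from typing import Callable, Mapping, Optional, Sequence

DEFAULT_QUEUE = "root.default"


class PlacementRejected(Exception):
    """Hypothetical exception: raised when a rule's fallbackResult is `reject`."""


def place(rules: Sequence[Mapping[str, str]],
          queue_usable: Callable[[str], bool],
          default_queue: str = DEFAULT_QUEUE) -> Optional[str]:
    """Evaluate rules top to bottom and return the chosen queue path."""
    for rule in rules:
        # `target` is a hypothetical key: the queue the rule's policy resolved to.
        target = rule["target"]
        if queue_usable(target):
            return target
        # The target does not exist or cannot be created: apply the fallback.
        fallback = rule["fallbackResult"]
        if fallback == "skip":
            continue  # ignore this rule and proceed to the next one
        if fallback == "placeDefault":
            return default_queue
        if fallback == "reject":
            raise PlacementRejected(target)
        raise ValueError(f"unknown fallbackResult: {fallback!r}")
    # Assumption: the quoted text does not cover the case where no rule places the app.
    return None
```

With `skip`, a later rule gets a chance to place the application; with `placeDefault` or `reject`, evaluation ends at the failing rule.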

##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
##########
@@ -298,21 +318,28 @@ Rules are evaluated from top to bottom. Compared to the legacy mapping rule eval
 | `matches` | The string to match, or an asterisk "&ast;" which means "all". For example, if the type is `user` and this string is "hadoop" then the rule will only be evaluated if the submitter user is "hadoop". The "&ast;" does not work with groups. |
 | `policy` | Selects a list of pre-defined policies which defines where the application should be placed. This will be explained later in the "Policies" section. |
 | `parentQueue` | In case of `user`, `primaryGroup`, `primaryGroupUser`, `secondaryGroup`, `secondaryGroupUser` policies, this tells the engine where the matching queue should be looked for. For example, if the policy is `primaryGroup`, parent is `root.groups` and the submitter's group is "admins", then the resulting queue will be "root.groups.admin" |
-| `fallbackResult` | If the target queue does not exist or it cannot be created (ie. it exists under a regular parent), it defines a fallback action. Valid values are `skip`, `reject` and `placeDefault`. |
-| `create` | Only applies to managed queue parents. If set to "false", then the queue will not be created if it does not exist. |
+| `fallbackResult` | If the target queue does not exist or it cannot be created, it defines a fallback action. Valid values are `skip`, `reject` and `placeDefault`. |
+| `create` | If set to "false", then the queue will not be created if it does not exist. This flag works differently in flexible and in legacy mode (see below). |
 | `value` | If the policy is `setDefaultQueue`, then the default queue will change to this setting from "root.default". Otherwise ignored. |
 | `customPlacement` | Only works with `custom` placement policy. The value of this field will be evaluated directly by the engine, which means that various placeholders such as `%application` or `%primary_group` will be replaced with their respective values. |
 
 
   `type` is the equivalent of the first column in the old format. It is either "g" or "u" and there is a separate property for application mappings. `matches` is the second column. The only difference is that `%user` means to match all users, but it's not expressive enough. So in the new format, it's been changed to `*`.
   The `fallbackResult` setting is checked what to do when the target queue cannot be created or does not exist. The three settings work the following way:
+
 * `skip`: ignore the current rule and proceed to the next. This is how Fair Scheduler evaluates placement rules.
+
 * `placeDefault`: place the application to the default queue `root.default` (unless it's overridden to something else). This is how Capacity Scheduler works with the old mapping rules.
+
 * `reject`: rejects the submission.
 
-  The `create` flag has no effect on the queue if the parent is not managed.
+  The `create` flag is affected by the mode:
+
+* **Legacy** mode: it has no effect if the parent is not managed.

Review comment:
       Instead of "managed", we might need to use the configuration property key here, e.g. "if `yarn.scheduler.capacity.<queue-path>.auto-queue-creation.enabled` is not set for the parent".
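The two wording suggestions for the `create` flag amount to the same check against a different property key per mode. A minimal Python sketch of that reading follows. The function names are hypothetical, the configuration is modelled as a plain dictionary of strings, and treating an unset property as `false` is an assumption that the thread does not confirm for the v2 key.

```python
from typing import Mapping

PREFIX = "yarn.scheduler.capacity."


def creation_enabled(mode: str, parent_path: str, conf: Mapping[str, str]) -> bool:
    """Return True if the parent queue allows auto-creation in the given mode."""
    if mode == "flexible":
        key = f"{PREFIX}{parent_path}.auto-queue-creation-v2.enabled"
    elif mode == "legacy":
        key = f"{PREFIX}{parent_path}.auto-queue-creation.enabled"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: an unset property behaves like "false".
    return conf.get(key, "false").strip().lower() == "true"


def may_create_queue(rule_create: bool, mode: str, parent_path: str,
                     conf: Mapping[str, str]) -> bool:
    """The rule's `create` flag only matters when the parent allows creation."""
    return rule_create and creation_enabled(mode, parent_path, conf)
```

For example, `may_create_queue(True, "flexible", "root.groups", conf)` is true only if `yarn.scheduler.capacity.root.groups.auto-queue-creation-v2.enabled` is `true` in `conf`.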

##########
File path: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
##########
@@ -288,7 +288,27 @@ Below example covers single mapping separately. In case of multiple mappings wit
 
 Rules are evaluated from top to bottom. Compared to the legacy mapping rule evaluator, it can be adjusted more flexibly what happens when the evaluation stops and a given rule does not match.
 
-  * Rules
+####How to enable JSON-based queue mapping
+
+The following properties control how the new placement engine expects rules.
+
+| Setting | Description |
+|:---- |:---- |
+| `yarn.scheduler.capacity.mapping-rule-format` | Allowed values are `legacy` or `json`. If it is not set, then the engine assumes that the old format might be in use so it also checks the value of `yarn.scheduler.capacity.queue-mappings`. Therefore, this must be set to `json` and cannot be left empty. |
+| `yarn.scheduler.capacity.mapping-rule-json` | The value of this property should contain the entire chain of rules inline. This is the preferred way of configuring Capacity Scheduler if you use the Mutation API, ie. modify configuration real-time via the REST interface. |
+| `yarn.scheduler.capacity.mapping-rule-json-file` | Defines an absolute path to a JSON file which contains the rules. For exmaple, `/opt/hadoop/config/mapping-rules.json`. |
+
+The property `yarn.scheduler.capacity.mapping-rule-json` takes precedence over `yarn.scheduler.capacity.mapping-rule-json-file`. If the format is set to `json` but you don't define either of these, then you'll get a warning but the initialization of Capacity Scheduler will not fail.
+
+####Differences between legacy and flexible queue auto-creation modes
+
+To use the flexible Queue Auto-Creation under a parent the queue capacities must be configured with weights. The flexible mode gives the user much bigger freedom to automatically create new leaf queues or entire queue hierarchies based on mapping rules. "Legacy" mode refers to either percentage-based configuration or where capacities are defined with absolute resources.
+
+In flexible Queue Auto-Creation mode, the concept of `ManagedParent` does not exist, every parent queue can have dynamically created parent of leaf queues (if the `yarn.scheduler.capacity.<queue-path>.auto-queue-creation-v2.enabled` property is set to true). This also means that certain settings influence the outcome of the queue placement depending on how the scheduler is configured.

Review comment:
       I think the `ManagedParent` concept is an internal detail; maybe it should not be exposed to end users.
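The property interplay described in the quoted hunk (format selection, inline JSON taking precedence over the file path, and a warning rather than a failure when neither is defined) can be summarised as a small Python sketch. It models the configuration as a dictionary and is not the scheduler's code: the function name, the logger name, and the `"legacy"`, `"inline"`, `"file"`, and `"none"` tags in the return value are hypothetical labels used only for this illustration.

```python
import logging
from typing import Mapping, Optional, Tuple

PREFIX = "yarn.scheduler.capacity."
log = logging.getLogger("mapping_rules_sketch")  # hypothetical logger name


def resolve_rule_source(conf: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return a (kind, value) pair describing where the rules come from."""
    fmt = conf.get(PREFIX + "mapping-rule-format")
    if fmt != "json":
        # Unset or `legacy`: the old queue-mappings property is consulted.
        return ("legacy", conf.get(PREFIX + "queue-mappings"))
    inline = conf.get(PREFIX + "mapping-rule-json")
    if inline:
        # The inline rule chain takes precedence over the file path.
        return ("inline", inline)
    path = conf.get(PREFIX + "mapping-rule-json-file")
    if path:
        return ("file", path)
    # Neither property is defined: warn, but do not fail initialization.
    log.warning("mapping-rule-format is json but no rules are defined")
    return ("none", None)
```

If both JSON properties are set, the sketch returns the inline value and never looks at the file path, which mirrors the precedence stated in the hunk.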




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


