You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "kfaraz (via GitHub)" <gi...@apache.org> on 2023/03/30 11:22:16 UTC

[GitHub] [druid] kfaraz commented on a diff in pull request #13993: Web console: add a nice UI for overlord dynamic configs and improve the docs

kfaraz commented on code in PR #13993:
URL: https://github.com/apache/druid/pull/13993#discussion_r1152737075


##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.

Review Comment:
   ```suggestion
   Issuing a GET request to the same URL returns the current Overlord dynamic config.
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:

Review Comment:
   ```suggestion
   There are 4 options for select strategies:
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is `weak`
+- a preferred worker  (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for the assigned datasources and no other.
 
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and `equalDistributionWithCategorySpec` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
 
-###### Equal Distribution
+- any worker if no categoryConfig is given for task type
+- any worker if categoryConfig is given for task type but no category is given for datasource and there's no default category
+- a preferred worker (based on categoryConfig and category for datasource) if available
+- any worker if categoryConfig and category are given but no preferred worker is available and categoryConfig is `weak`
+- not assigned at all if preferred workers are not available and `categoryConfig` is `strong`
 
-Tasks are assigned to the MiddleManager with the most free slots at the time the task begins running. This is useful if
-you want work evenly distributed across your MiddleManagers.
+In both cases, Druid constructs the eligible worker and selects one depending on their load with the goal of either distributing the load equally or filling as few workers as possible.

Review Comment:
   nit:
   ```suggestion
   In both the cases, Druid determines the list of eligible workers and selects one depending on their load with the goal of either distributing the load equally or filling as few workers as possible.
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -

Review Comment:
   ```suggestion
   To view the last `n` entries of the audit history of worker config, issue a GET request to the following URL:
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is `weak`
+- a preferred worker  (a worker listed in `affinityConfig`) if available

Review Comment:
   this item might be better placed before the previous item in the list
   
   ```suggestion
   - a preferred worker listed in the `affinityConfig` for this datasource if it has available capacity
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is `weak`
+- a preferred worker  (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available and affinity is `strong`

Review Comment:
   ```suggestion
   - no worker if preferred workers are not available and affinity is _strong_ i.e. `strong: true`. In this case, the task remains in "pending" state. The chosen provisioning strategy (e.g. `pendingTaskBased`) may then use the total number of pending tasks to determine if a new node should be provisioned.
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is `weak`
+- a preferred worker  (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for the assigned datasources and no other.
 
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and `equalDistributionWithCategorySpec` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:

Review Comment:
   Similar suggestions here as with `affinityConfig`:
   
   ```suggestion
   If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and `equalDistributionWithCategorySpec` strategies), then a task of a given datasource may be assigned to:
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is `weak`
+- a preferred worker  (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for the assigned datasources and no other.
 
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and `equalDistributionWithCategorySpec` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
 
-###### Equal Distribution
+- any worker if no categoryConfig is given for task type
+- any worker if categoryConfig is given for task type but no category is given for datasource and there's no default category
+- a preferred worker (based on categoryConfig and category for datasource) if available
+- any worker if categoryConfig and category are given but no preferred worker is available and categoryConfig is `weak`
+- not assigned at all if preferred workers are not available and `categoryConfig` is `strong`
 
-Tasks are assigned to the MiddleManager with the most free slots at the time the task begins running. This is useful if
-you want work evenly distributed across your MiddleManagers.
+In both cases, Druid constructs the eligible worker and selects one depending on their load with the goal of either distributing the load equally or filling as few workers as possible.
+
+If you are using auto-scaling, use the `fillCapacity` select strategy since auto-scaled nodes can
+not be assigned a category, and you want the work to be concentrated on the fewest number of workers to allow the empty ones to scale down.
+
+###### `equalDistribution`
+
+Tasks are assigned to the MiddleManager with the most free slots at the time the task begins running.
+This evenly distributes work across your MiddleManagers.
 
 |Property|Description|Default|
 |--------|-----------|-------|
-|`type`|`equalDistribution`.|required; must be `equalDistribution`|
-|`affinityConfig`|[Affinity config](#affinity) object|null (no affinity)|
+|`type`|`equalDistribution`|required; must be `equalDistribution`|
+|`affinityConfig`|[Affinity config](#affinityconfig) object|null (no affinity)|
 
-###### Equal Distribution With Category Spec
+###### `equalDistributionWithCategorySpec`
 
-This strategy is a variant of `Equal Distribution`, which support `workerCategorySpec` field rather than `affinityConfig`. By specifying `workerCategorySpec`, you can assign tasks to run on different categories of MiddleManagers based on the tasks' **taskType** and **dataSource name**. This strategy can't work with `AutoScaler` since the behavior is undefined.
+This strategy is a variant of `equalDistribution`, which supports `workerCategorySpec` field rather than `affinityConfig`.
+By specifying `workerCategorySpec`, you can assign tasks to run on different categories of MiddleManagers based on the task's **type** and **dataSource**.

Review Comment:
   nit:
   ```suggestion
   By specifying `workerCategorySpec`, you can assign tasks to run on different categories of MiddleManagers based on the **type** and **dataSource** of the task.
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using

Review Comment:
   ```suggestion
   At a high level, the select strategy determines the list of eligible workers for a given task using
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is `weak`
+- a preferred worker  (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for the assigned datasources and no other.
 
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and `equalDistributionWithCategorySpec` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
 
-###### Equal Distribution
+- any worker if no categoryConfig is given for task type
+- any worker if categoryConfig is given for task type but no category is given for datasource and there's no default category
+- a preferred worker (based on categoryConfig and category for datasource) if available
+- any worker if categoryConfig and category are given but no preferred worker is available and categoryConfig is `weak`
+- not assigned at all if preferred workers are not available and `categoryConfig` is `strong`
 
-Tasks are assigned to the MiddleManager with the most free slots at the time the task begins running. This is useful if
-you want work evenly distributed across your MiddleManagers.
+In both cases, Druid constructs the eligible worker and selects one depending on their load with the goal of either distributing the load equally or filling as few workers as possible.
+
+If you are using auto-scaling, use the `fillCapacity` select strategy since auto-scaled nodes can

Review Comment:
   We should probably emphasize this point a little by putting it in a note or warn box or something.



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is `weak`

Review Comment:
   I think we should either italicize "weak" (since it is really code or config), or we can use `strong: false`.
   
   ```suggestion
   - a non-affinity worker if preferred workers are not available and the affinity is _weak_ i.e. `strong: false`.
   ```



##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
 }
 ```
 
-Issuing a GET request at the same URL will return the current worker config spec that is currently in place. The worker config spec list above is just a sample for EC2 and it is possible to extend the code base for other deployment environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic config spec.
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are `fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description                                                                                                                                                                                  | Default                       |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.                                                                                                                                              | null                          |
 
 To view the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
 ```
 
-default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
+The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord runtime.properties.
 
 To view last `n` entries of the audit history of worker config issue a GET request to the URL -
 
 ```
 http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
 ```
 
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (MiddleManagers).
+At a high level, the select strategy determines the list of possible workers that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are 4  options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:

Review Comment:
   Slight rewording of the later part of this sentence and the list items for a little more readability:
   
   ```suggestion
   If an `affinityConfig` is provided (as part of `fillCapacity` and `equalDistribution` strategies), then a task of a given datasource may be assigned to:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org