Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/03/10 12:07:07 UTC

[GitHub] [druid] clintropolis opened a new pull request #9493: threshold based automatic query prioritization

clintropolis opened a new pull request #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493
 
 
   ### Description
   
   This PR is a follow-up to #9407 that adds a new interface, `QueryPrioritizationStrategy`, intended to let implementations automatically prioritize queries based on some criteria. As a proof-of-concept implementation of this functionality, it provides `ThresholdBasedQueryDeprioritizationStrategy`, which offers the three thresholds described in #6993: the period between the current time and the query, the duration of the query's interval, and the number of segments taking part in the query.
   
   This strategy can be enabled by setting `druid.query.scheduler.prioritization.strategy` to `threshold`.
   
   |Property|Description|Default|
   |--------|-----------|-------|
   |`druid.query.scheduler.prioritization.periodThreshold`|ISO duration threshold for how old data can be queried before automatically adjusting query priority.|None|
   |`druid.query.scheduler.prioritization.durationThreshold`|ISO duration threshold for the maximum duration a query's interval can span before the priority is automatically adjusted.|None|
   |`druid.query.scheduler.prioritization.segmentCountThreshold`|Number threshold for maximum number of segments that can take part in a query before its priority is automatically adjusted.|None|
   |`druid.query.scheduler.prioritization.adjustment`|Amount to reduce the priority of queries which cross any threshold.|None|
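
The table above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the code from this PR: the class name `ThresholdPrioritizationSketch`, the method `computePriority`, and the use of `java.time` types are all assumptions made for illustration. It also assumes that a `null` threshold means "not configured" and that the adjustment is applied once no matter how many thresholds are crossed, which is how "queries which cross any threshold" reads.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration of threshold-based deprioritization; not the Druid implementation. */
final class ThresholdPrioritizationSketch {
    private final Duration periodThreshold;      // null means this threshold is not configured
    private final Duration durationThreshold;    // null means this threshold is not configured
    private final Integer segmentCountThreshold; // null means this threshold is not configured
    private final int adjustment;                // amount subtracted from the priority

    ThresholdPrioritizationSketch(
            Duration periodThreshold,
            Duration durationThreshold,
            Integer segmentCountThreshold,
            int adjustment) {
        this.periodThreshold = periodThreshold;
        this.durationThreshold = durationThreshold;
        this.segmentCountThreshold = segmentCountThreshold;
        this.adjustment = adjustment;
    }

    /** Returns the base priority, lowered once if any configured threshold is crossed. */
    int computePriority(
            int basePriority,
            Instant now,
            Instant intervalStart,
            Instant intervalEnd,
            int segmentCount) {
        // Period: the query reaches further into the past than allowed.
        boolean tooOld = periodThreshold != null
                && intervalStart.isBefore(now.minus(periodThreshold));
        // Duration: the query interval spans more time than allowed.
        boolean tooLong = durationThreshold != null
                && Duration.between(intervalStart, intervalEnd).compareTo(durationThreshold) > 0;
        // Segment count: more segments take part in the query than allowed.
        boolean tooManySegments = segmentCountThreshold != null
                && segmentCount > segmentCountThreshold;

        return (tooOld || tooLong || tooManySegments) ? basePriority - adjustment : basePriority;
    }

    public static void main(String[] args) {
        ThresholdPrioritizationSketch sketch = new ThresholdPrioritizationSketch(
                Duration.ofDays(1), Duration.ofDays(7), 100, 5);
        Instant now = Instant.parse("2020-03-10T12:00:00Z");

        // A one-hour query over recent data touching few segments.
        System.out.println(sketch.computePriority(
                0, now, now.minus(Duration.ofHours(1)), now, 10));
        // A month-long query touching many segments.
        System.out.println(sketch.computePriority(
                0, now, now.minus(Duration.ofDays(30)), now, 500));
    }
}
```

In this sketch a query that crosses several thresholds at once is still lowered by a single `adjustment`; whether the actual strategy behaves the same way should be checked against the PR's code.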
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493#discussion_r391351391
 
 

 ##########
 File path: docs/configuration/index.md
 ##########
 @@ -1473,22 +1473,41 @@ These Broker configurations can be defined in the `broker/runtime.properties` fi
 
 #### Query configuration
 
-##### Query prioritization
+##### Query routing
 
 |Property|Possible Values|Description|Default|
 |--------|---------------|-----------|-------|
 |`druid.broker.balancer.type`|`random`, `connectionCount`|Determines how the broker balances connections to Historical processes. `random` choose randomly, `connectionCount` picks the process with the fewest number of active connections to|`random`|
 |`druid.broker.select.tier`|`highestPriority`, `lowestPriority`, `custom`|If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.|`highestPriority`|
 |`druid.broker.select.tier.custom.priorities`|`An array of integer priorities.`|Select servers in tiers with a custom priority list.|None|
 
-##### Query laning
+##### Query prioritization and laning
 
 *Laning strategies* allow you to control capacity utilization for heterogeneous query workloads. With laning, the broker examines and classifies a query for the purpose of assigning it to a 'lane'. Lanes have capacity limits, enforced by the broker, that can be used to ensure sufficient resources are available for other lanes or for interactive queries (with no lane), or to limit overall throughput for queries within the lane. Requests in excess of the capacity are discarded with an HTTP 429 status code.
 
 |Property|Description|Default|
 |--------|-----------|-------|
 |`druid.query.scheduler.numThreads`|Maximum number of HTTP threads to dedicate to query processing. To save HTTP thread capacity, this should be lower than `druid.server.http.numThreads`.|Unbounded|
 |`druid.query.scheduler.laning.strategy`|Query laning strategy to use to assign queries to a lane in order to control capacities for certain classes of queries.|`none`|
+|`druid.query.scheduler.prioritization.strategy`|Query prioritization strategy to automatically assign priorities.|`none`|
+
+##### Prioritization strategies
+
+###### No auto prioritization strategy
 
 Review comment:
   nit: hmm, is 'no auto' = 'manual'? Just wondering what is a better name.



[GitHub] [druid] clintropolis commented on a change in pull request #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493#discussion_r391928090
 
 

 ##########
 File path: docs/configuration/index.md
 ##########
 @@ -1473,22 +1473,41 @@ These Broker configurations can be defined in the `broker/runtime.properties` fi
 
 #### Query configuration
 
-##### Query prioritization
+##### Query routing
 
 |Property|Possible Values|Description|Default|
 |--------|---------------|-----------|-------|
 |`druid.broker.balancer.type`|`random`, `connectionCount`|Determines how the broker balances connections to Historical processes. `random` choose randomly, `connectionCount` picks the process with the fewest number of active connections to|`random`|
 |`druid.broker.select.tier`|`highestPriority`, `lowestPriority`, `custom`|If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.|`highestPriority`|
 |`druid.broker.select.tier.custom.priorities`|`An array of integer priorities.`|Select servers in tiers with a custom priority list.|None|
 
-##### Query laning
+##### Query prioritization and laning
 
 *Laning strategies* allow you to control capacity utilization for heterogeneous query workloads. With laning, the broker examines and classifies a query for the purpose of assigning it to a 'lane'. Lanes have capacity limits, enforced by the broker, that can be used to ensure sufficient resources are available for other lanes or for interactive queries (with no lane), or to limit overall throughput for queries within the lane. Requests in excess of the capacity are discarded with an HTTP 429 status code.
 
 |Property|Description|Default|
 |--------|-----------|-------|
 |`druid.query.scheduler.numThreads`|Maximum number of HTTP threads to dedicate to query processing. To save HTTP thread capacity, this should be lower than `druid.server.http.numThreads`.|Unbounded|
 |`druid.query.scheduler.laning.strategy`|Query laning strategy to use to assign queries to a lane in order to control capacities for certain classes of queries.|`none`|
+|`druid.query.scheduler.prioritization.strategy`|Query prioritization strategy to automatically assign priorities.|`none`|
+
+##### Prioritization strategies
+
+###### No auto prioritization strategy
 
 Review comment:
   Hmm, after thinking about it, I think `manual` maybe makes the most sense, to be symmetrical with the `ManualQueryLaningStrategy` of #9492 because both rely on the user adding a value to the query context.



[GitHub] [druid] sascha-coenen edited a comment on issue #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
sascha-coenen edited a comment on issue #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493#issuecomment-597170972
 
 
   AWESOME! I love it. This will be so useful.
   
   If a query surpasses a threshold several times over (for instance, it touches several times the segment threshold, or the duration threshold fits into the query time range several times), would the "adjustment" be applied several times too?
   With laning being implemented, will it become an alternative to the "adjustment" property, so that different lanes can be specified based on the query weight?



[GitHub] [druid] himanshug commented on a change in pull request #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
himanshug commented on a change in pull request #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493#discussion_r391785447
 
 

 ##########
 File path: docs/configuration/index.md
 ##########
 @@ -1473,22 +1473,41 @@ These Broker configurations can be defined in the `broker/runtime.properties` fi
 
 #### Query configuration
 
-##### Query prioritization
+##### Query routing
 
 |Property|Possible Values|Description|Default|
 |--------|---------------|-----------|-------|
 |`druid.broker.balancer.type`|`random`, `connectionCount`|Determines how the broker balances connections to Historical processes. `random` choose randomly, `connectionCount` picks the process with the fewest number of active connections to|`random`|
 |`druid.broker.select.tier`|`highestPriority`, `lowestPriority`, `custom`|If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.|`highestPriority`|
 |`druid.broker.select.tier.custom.priorities`|`An array of integer priorities.`|Select servers in tiers with a custom priority list.|None|
 
-##### Query laning
+##### Query prioritization and laning
 
 *Laning strategies* allow you to control capacity utilization for heterogeneous query workloads. With laning, the broker examines and classifies a query for the purpose of assigning it to a 'lane'. Lanes have capacity limits, enforced by the broker, that can be used to ensure sufficient resources are available for other lanes or for interactive queries (with no lane), or to limit overall throughput for queries within the lane. Requests in excess of the capacity are discarded with an HTTP 429 status code.
 
 |Property|Description|Default|
 |--------|-----------|-------|
 |`druid.query.scheduler.numThreads`|Maximum number of HTTP threads to dedicate to query processing. To save HTTP thread capacity, this should be lower than `druid.server.http.numThreads`.|Unbounded|
 |`druid.query.scheduler.laning.strategy`|Query laning strategy to use to assign queries to a lane in order to control capacities for certain classes of queries.|`none`|
+|`druid.query.scheduler.prioritization.strategy`|Query prioritization strategy to automatically assign priorities.|`none`|
+
+##### Prioritization strategies
+
+###### No auto prioritization strategy
 
 Review comment:
   "None prioritization strategy" ?



[GitHub] [druid] himanshug commented on a change in pull request #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
himanshug commented on a change in pull request #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493#discussion_r391786815
 
 

 ##########
 File path: docs/configuration/index.md
 ##########
 @@ -1473,22 +1473,41 @@ These Broker configurations can be defined in the `broker/runtime.properties` fi
 
 #### Query configuration
 
-##### Query prioritization
+##### Query routing
 
 |Property|Possible Values|Description|Default|
 |--------|---------------|-----------|-------|
 |`druid.broker.balancer.type`|`random`, `connectionCount`|Determines how the broker balances connections to Historical processes. `random` choose randomly, `connectionCount` picks the process with the fewest number of active connections to|`random`|
 |`druid.broker.select.tier`|`highestPriority`, `lowestPriority`, `custom`|If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.|`highestPriority`|
 |`druid.broker.select.tier.custom.priorities`|`An array of integer priorities.`|Select servers in tiers with a custom priority list.|None|
 
-##### Query laning
+##### Query prioritization and laning
 
 *Laning strategies* allow you to control capacity utilization for heterogeneous query workloads. With laning, the broker examines and classifies a query for the purpose of assigning it to a 'lane'. Lanes have capacity limits, enforced by the broker, that can be used to ensure sufficient resources are available for other lanes or for interactive queries (with no lane), or to limit overall throughput for queries within the lane. Requests in excess of the capacity are discarded with an HTTP 429 status code.
 
 |Property|Description|Default|
 |--------|-----------|-------|
 |`druid.query.scheduler.numThreads`|Maximum number of HTTP threads to dedicate to query processing. To save HTTP thread capacity, this should be lower than `druid.server.http.numThreads`.|Unbounded|
 |`druid.query.scheduler.laning.strategy`|Query laning strategy to use to assign queries to a lane in order to control capacities for certain classes of queries.|`none`|
+|`druid.query.scheduler.prioritization.strategy`|Query prioritization strategy to automatically assign priorities.|`none`|
+
+##### Prioritization strategies
+
+###### No auto prioritization strategy
+With this configuration, queries are never assigned a priority automatically, but will preserve a priority manually set on the [query context](../querying/query-context.md) with the `priority` key. This mode can be explicitly set by setting `druid.query.scheduler.prioritization.strategy` to `none`.
+
+###### Threshold deprioritization strategy
+
+This prioritization strategy deprioritizes queries that cross any of a configurable set of thresholds, such as how far in the past the data is, how large of an interval a query covers, or the number of segments taking part in a query.
 
 Review comment:
   nit: by rephrasing it to say, "... strategy lowers priority of queries that..." , we could make the heading be "Threshold prioritization strategy"



[GitHub] [druid] clintropolis merged pull request #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493
 
 
   



[GitHub] [druid] sascha-coenen commented on issue #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
sascha-coenen commented on issue #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493#issuecomment-597170972
 
 
   AWESOME! I love it. This will be so useful.



[GitHub] [druid] clintropolis commented on issue #9493: threshold based automatic query prioritization

Posted by GitBox <gi...@apache.org>.
clintropolis commented on issue #9493: threshold based automatic query prioritization
URL: https://github.com/apache/druid/pull/9493#issuecomment-598611402
 
 
   thanks for the review @jihoonson and @himanshug 
