You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/02/03 02:56:43 UTC

[GitHub] gianm opened a new issue #6993: [Proposal] Dynamic prioritization and laning

gianm opened a new issue #6993: [Proposal] Dynamic prioritization and laning
URL: https://github.com/apache/incubator-druid/issues/6993

# Motivation

(Side note- at various points in this proposal I talk about 'historicals', but by that I mostly mean 'historicals and any other nodes that brokers can fan out to, including task peons'. It would have been a mouthful to type out every time.)

Clusters sometimes have heterogenous workloads: imagine both 'light' queries that could run at interactive speeds and 'heavy' resource-intensive queries. Busy clusters with heterogenous workloads can become unresponsive, even to 'light' queries, when cluster resources are all tied up processing 'heavy' queries. It's frustrating for end users who are accustomed to 'light' queries typically running quickly. Starvation can happen for multiple kinds of resources:

1. The `@Processing` thread pool on historical nodes (`druid.processing.numThreads`). Druid does make an effort to have each processing task be bite-sized (each one just processes one segment), so even though they are not preemptible, they are amenable to prioritization. Currently, query prioritization (setting `priority` in the query context) only does one thing: control the execution order of tasks in the processing pool. When priorities are set properly, this is reasonably effective at avoiding starvation.
2. The HTTP server thread pool on historicals and brokers (`druid.server.http.numThreads`). The model here is that each query gets one thread.
3. The merge buffer pool on historicals and brokers (`druid.processing.numMergeBuffers`). Each groupBy query needs one of these on historicals, and may need some on brokers as well.
4. The HTTP client connection limit from brokers to historicals (`druid.broker.http.numConnections`). Each connection can only carry one query at a time.

Items 2, 3, and 4 are allocated per-query, meaning that they effectively act as limits on the number of concurrent queries that can run. If each historical node has, let's say, 30 HTTP server threads, then no more than 30 queries can run concurrently at a time on that server. If all 30 of these queries are 'heavy' then they will starve out 'light' ones.

In practice, the ways that people prevent starvation of the resources of items 2, 3, and 4 are to either set the limits high enough that they exceed the number of 'heavy' queries that might run at once, or to try to separate 'heavy' and 'light' queries onto different nodes (by issuing heavy vs. light queries to different brokers, and using historical tiers to create a tier that only handles 'light' queries). Neither of these is really a very satisfying approach, since it means cluster architecture must be closely adapted to the expected query workload, creating more work for cluster operators and reducing stability when workloads change unexpectedly.

There is one more problem: even though query prioritization (1) is effective at preventing starvation of the `@Processing` pool, it can be difficult to set priorities appropriately. It's generally doable when end users are accessing Druid through an API layer that is under tight control. But when end users access Druid directly, or are accessing Druid through a third party UI like Superset or Looker, it is not likely that priorities will be properly set.

# Proposed changes

## Concept

The idea is to establish *dynamic prioritization* and *laning*.

Dynamic prioritization (adjusting query properties automatically) is meant to address the problem that end users are not always in a good position to be able to set priorities effectively. Laning, as in fast and slow lanes, is meant to cap the number of concurrently-running low-priority queries, and thereby reserve guaranteed resources for high-priority queries. This makes query prioritization effective at preventing starvation across the entire query stack, not just in the `@Processing` thread pool.

## New properties

## Altered query behavior

When a query comes in with priority less than zero, and the low-priority pool is full (there are maxLowPriorityThreads already running), reject the query with an HTTP 429 "Too Many Requests" code. Clients would be expected to do automatic backoff and retry. HTTP 503 "Service Unavailable" might be a more appropriate return code, but a less-common code like 429 makes it easier to know when to trigger the backoff/retry loop in Druid clients.

Don't ever reject queries with priority zero or higher.

This ensures that queries with non-negative priority cannot be starved out by queries with negative priority.

# Rationale

For dynamic prioritization, the goal in this proposal is to keep it as simple as possible while still being useful. I think basing it totally off the interval does that.

For laning, while arriving at the idea to reject low-priority queries, I thought of a few other options and decided against them:

- Queue up queries past maxLowPriorityThreads rather than rejecting them. I didn't propose this since Druid's HTTP server design is thread-per-request, so a server thread is allocated upfront, and the only way I could see to give up the thread is to abort the query. The HTTP server could potentially be refactored to make it possible to queue up queries without hogging threads, but I think that would make more sense as future work. Of course, at some point, queries would still need to be rejected, since you can't queue an infinite number of requests.
- Allow low-priority queries to run past maxLowPriorityThreads, but preempt them if higher-priority queries come in. This is how the `@Processing` thread pool works, but I don't think it'd work for the other levels of the query stack, since things like HTTP threads, merge buffers, and broker -> historical connections are allocated upfront and cannot be reclaimed without aborting the query.

# Operational impact

The new features would be optional and off by default, so there will be no operational impact to people that don't want to turn them on.

# Test plan

This sort of thing would need to be tested on real clusters to provide some assurance that it is effective. I have a few in mind that would be good candidates.

# Future work

- Improving the cost function that differentiates 'heavy' and 'light' queries, possibly by taking into account the number of rows, number of columns, expected filter selectivity, amount of computation estimated per row, and so on.

- Queue up low priority queries somehow, rather than rejecting them.

- Some way of setting the low-priority concurrency limit that is easier than `druid.server.http.maxLowPriorityThreads`. Setting a number of threads correctly requires cluster operators to think about many other different parameters on multiple node types.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org