Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/23 11:16:37 UTC

[GitHub] [incubator-druid] sascha-coenen edited a comment on issue #6993: [Proposal] Dynamic prioritization and laning

sascha-coenen edited a comment on issue #6993: [Proposal] Dynamic prioritization and laning
URL: https://github.com/apache/incubator-druid/issues/6993#issuecomment-534056000

> On resource pools,
> You're right about the name, it's not a good one based on how you've conceived it (it's not really a pool of resources, it's more like a way to group together queries that should be treated similarly).

True. I started calling this feature "resource pools" because I was borrowing the name from Vertica (https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/ResourceManager/ManagingWorkloads.htm?tocpath=Administrator%27s%20Guide%7CManaging%20the%20Database%7CManaging%20Workloads%7C_____0).

In Vertica, one can define separate resource pools with attributes such as how much bandwidth they may use. A user account can be associated with a resource pool, and pools can be set up so that a query starts in one pool and is moved to a different one under certain conditions, for example if it hasn't completed within a given amount of time.
This is along the same lines as what ccaominh has described above.

Recently, Amazon announced "Automatic workload management and query priorities" for their AWS Redshift product:
https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-redshift-announces-automatic-workload-management-and-query-priorities/
They even boast of using machine learning. I don't like the product, but the name "automatic workload management" is nice. It is also interesting that it has "automatic" in it, suggesting that resource management doesn't require that much tuning and administration and can instead be done in an almost mathematically optimal way.

I think the current Druid mechanism with query priorities is quite nice, but a priority that is just 1 higher than another already completely dominates the query prioritisation.
So whether names or numbers are used to denote a priority, the prioritization should be gradual, not absolutist. Currently, as long as there are segments for a prio:10 query in the processing queue, not a single segment for a prio:9 query will be processed. Instead, I would think that a probabilistic interpretation of a numeric priority would be smoother. One could then put a higher-level construct on top of that, for instance a query analyzer that computes how heavy a query is and maps that to a numeric priority which ensures that the query neither starves nor slows down interactive queries too much.
Such an approach would be nice in that it is so generic that sophisticated features could later be realised on that foundation. If, on the other hand, an "engineered" approach is taken as the foundation, like a fixed, predefined number of lanes with fixed, predefined behaviours, then this limits how many later innovations could be built on top of it.
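
To make the difference between the two interpretations concrete, here is a minimal sketch in Python. It is not Druid code: the function names, the queue model (both queries always have segments waiting), and the exponential weighting with `base=2.0` are assumptions chosen purely for illustration. Any weighting that grows with priority would demonstrate the same point.

```python
import random
from collections import Counter


def pick_strict(pending):
    """Strict ordering as described above: always take the highest priority."""
    return max(pending, key=lambda item: item[1])


def pick_weighted(pending, rng, base=2.0):
    """Probabilistic ordering (hypothetical): the chance of being picked grows
    with priority, but lower priorities keep a non-zero chance."""
    # Assumed weighting: each extra priority level doubles the weight.
    weights = [base ** priority for _, priority in pending]
    return rng.choices(pending, weights=weights, k=1)[0]


# Model: both queries always have segments waiting in the processing queue.
pending = [("interactive", 10), ("report", 9)]
rng = random.Random(0)  # seeded so that repeated runs behave the same
rounds = 10_000

strict_counts = Counter(pick_strict(pending)[0] for _ in range(rounds))
weighted_counts = Counter(pick_weighted(pending, rng)[0] for _ in range(rounds))

print("strict:  ", dict(strict_counts))
print("weighted:", dict(weighted_counts))
```

Under the strict rule the prio:9 query is never picked while prio:10 segments are waiting. Under the weighted rule with these assumed weights it gets roughly a third of the slots, so it still makes progress without dominating.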

>> More generally, this pair of issues and #4773 (to control which segments can and can not be in the file cache) should make separating Druid clusters into multiple tiers completely unnecessary.

Interesting! Is it really true that tiers could be made "completely" unnecessary? We are using tiers for fail safety, with one tier running in one availability zone and another tier holding replicas running in a different availability zone. Would k-safety and multi-AZ setups also be possible in a world without tiers?
