Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/07/25 02:28:46 UTC

[GitHub] drcrallen commented on issue #6014: Optionally refuse to consume new data until the prior chunk is being consumed

URL: https://github.com/apache/incubator-druid/pull/6014#issuecomment-407613072
 
 
   This ends up causing a lot of problems.
   
   I wrestled with it for a long time but was unable to arrive at a clean solution for the competing threads. I'm going to record what was found here and then close the PR:
   
   There are two pools of `HttpClient`s in the broker: one for the broker connections and one for a global client pool. The broker `HttpClient` worker threads hold connections to the historicals and call the callbacks in the handlers. It is unclear whether one worker keeps one handler active all the time, or whether there are internal "pipelines" or "channels" which are maintained and assigned to worker threads as results come back; I couldn't debug to that level of detail. The pool has a fixed maximum thread count.
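
   To make the callback shape concrete, here is a minimal sketch of the pattern (the names `ChunkedResponseHandler` and `TinyHttpClient` are made up for illustration; the real Druid `HttpClient` and its Netty-based handler types differ): every chunk callback executes on one of the client's fixed pool of worker threads, so whatever the callback does directly occupies that worker.

   ```java
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;

   // Hypothetical handler interface; the real Druid/Netty handler types differ.
   interface ChunkedResponseHandler
   {
     void handleChunk(byte[] chunk);   // invoked on an HttpClient worker thread
     void handleDone();                // invoked on an HttpClient worker thread
   }

   final class TinyHttpClient
   {
     // Fixed-size worker pool, analogous to the broker HttpClient's thread cap.
     private final ExecutorService workers = Executors.newFixedThreadPool(4);

     void fetch(String historicalUrl, ChunkedResponseHandler handler)
     {
       workers.submit(() -> {
         // In the real client this is Netty reading from the socket; here we
         // just fake two chunks to show which thread runs the callbacks.
         handler.handleChunk(new byte[]{1, 2, 3});
         handler.handleChunk(new byte[]{4, 5, 6});
         handler.handleDone();
       });
     }
   }
   ```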
   
   There is also the qtp pool for the HTTP server, which services the logic of handling the query itself (the `Sequence` work). It too has a fixed thread maximum.
   
   With the parallel merge PR, there is also a `ForkJoinPool` for allowing forking and joining of merge work.
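
   For a rough picture of the kind of work that pool does, here is a self-contained `RecursiveTask` that merges pre-sorted partial results in a `ForkJoinPool`. The merge logic in the actual PR is more involved; the class below is made up purely for illustration.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.concurrent.ForkJoinPool;
   import java.util.concurrent.RecursiveTask;

   // Illustrative only: fork/join merge of pre-sorted partial result lists.
   final class MergeTask extends RecursiveTask<List<Integer>>
   {
     private final List<List<Integer>> sortedRuns;

     MergeTask(List<List<Integer>> sortedRuns)
     {
       this.sortedRuns = sortedRuns;
     }

     @Override
     protected List<Integer> compute()
     {
       if (sortedRuns.isEmpty()) {
         return Collections.emptyList();
       }
       if (sortedRuns.size() == 1) {
         return sortedRuns.get(0);
       }
       int mid = sortedRuns.size() / 2;
       MergeTask left = new MergeTask(sortedRuns.subList(0, mid));
       MergeTask right = new MergeTask(sortedRuns.subList(mid, sortedRuns.size()));
       left.fork();                                  // run one half on another worker
       return mergeTwo(right.compute(), left.join());
     }

     private static List<Integer> mergeTwo(List<Integer> a, List<Integer> b)
     {
       List<Integer> out = new ArrayList<>(a.size() + b.size());
       int i = 0;
       int j = 0;
       while (i < a.size() && j < b.size()) {
         out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
       }
       while (i < a.size()) {
         out.add(a.get(i++));
       }
       while (j < b.size()) {
         out.add(b.get(j++));
       }
       return out;
     }
   }

   // Usage: List<Integer> merged = new ForkJoinPool(4).invoke(new MergeTask(runs));
   ```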
   
   The challenge in managing these thread pools played out as follows:
   
   Chunks would come back and be processed by broker `HttpClient` worker threads, which would attempt to call the callbacks of the `DirectDruidClient` handler. If the `DirectDruidClient` handler blocked, the callbacks would not progress, the `HttpClient` thread would stall, and the pool risked starvation.
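
   The failure mode is easy to reproduce in miniature. If the handler hands chunks to the query thread through a bounded queue (the `BackpressuredBridge` below is hypothetical, not the actual `DirectDruidClient` internals), a slow consumer makes `put()` block on the `HttpClient` worker thread itself, and that worker can no longer service any other connection:

   ```java
   import java.util.concurrent.ArrayBlockingQueue;
   import java.util.concurrent.BlockingQueue;

   // Hypothetical bridge between the IO callback and the Sequence consumer.
   final class BackpressuredBridge
   {
     private static final byte[] EOF = new byte[0];

     // Small bound: once the consumer falls behind, producers block.
     private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);

     // Runs on the HttpClient worker thread for each arriving chunk.
     void onChunk(byte[] chunk) throws InterruptedException
     {
       queue.put(chunk);   // blocks the IO worker when the queue is full
     }

     // Runs on the HttpClient worker thread when the response completes.
     void onDone() throws InterruptedException
     {
       queue.put(EOF);
     }

     // Runs on a Jetty (qtp) thread while materializing the Sequence.
     byte[] next() throws InterruptedException
     {
       byte[] chunk = queue.take();
       return chunk == EOF ? null : chunk;   // null signals end of stream
     }
   }
   ```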
   
   The Jetty server HTTP thread pool would be trying to pull from a supplier of some kind so it could materialize the `Sequence` back up to the calling client.
   
   The `ForkJoinPool` in the parallel merge PR would try to accept pushes from the `HttpClient` threads, do the merge work, and then expose the results so the Jetty server thread could feed them out.
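
   The standard JDK escape hatch when a fork/join task has to wait on external input is `ForkJoinPool.managedBlock(...)`, which lets the pool add a compensating worker while one is blocked. A minimal sketch is below (this is just JDK machinery, not necessarily what the PR did); it mitigates the stall but does not remove the pool-sizing problem.

   ```java
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.ForkJoinPool;

   // Wraps a blocking take() so the ForkJoinPool can add a compensating worker.
   final class QueueBlocker implements ForkJoinPool.ManagedBlocker
   {
     private final BlockingQueue<byte[]> queue;
     private byte[] item;

     QueueBlocker(BlockingQueue<byte[]> queue)
     {
       this.queue = queue;
     }

     @Override
     public boolean block() throws InterruptedException
     {
       if (item == null) {
         item = queue.take();   // the actual blocking wait
       }
       return true;
     }

     @Override
     public boolean isReleasable()
     {
       return item != null || (item = queue.poll()) != null;
     }

     // Call from inside a ForkJoinTask instead of calling queue.take() directly.
     byte[] takeManaged() throws InterruptedException
     {
       ForkJoinPool.managedBlock(this);
       return item;
     }
   }
   ```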
   
   Trying to keep all the thread pools in lock-step to prevent thread starvation proved too much. I couldn't get the tunings or the thread pool capacity predictions to a reasonable state, and the coordination started to look horrendous. For example, how can you make sure that `HttpClient` threads don't exhaust their connection limits to the Druid historicals under concurrent query load? The extra frustrating part was that lots of unit tests and local integration tests worked great, but as soon as I deployed the code to a real production system I would hit locking problems and thread starvation instantly.
   
   I think the "correct" solution is to move to a framework that handles the scatter/gather nature of a Druid query in a more start-to-finish kind of way, rather than trying to piece together a bunch of disjoint components.
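
   As a sketch of what "start-to-finish" could look like, here is a scatter/gather composed as one non-blocking pipeline with `CompletableFuture` (made-up names, and not a concrete proposal for Druid):

   ```java
   import java.util.Collections;
   import java.util.List;
   import java.util.concurrent.CompletableFuture;
   import java.util.stream.Collectors;

   final class ScatterGatherSketch
   {
     // Pretend each historical fetch is already asynchronous and non-blocking.
     static CompletableFuture<List<Integer>> queryHistorical(String host)
     {
       return CompletableFuture.supplyAsync(() -> Collections.singletonList(host.hashCode()));
     }

     // Scatter to all hosts, then gather with a single non-blocking merge step.
     static CompletableFuture<List<Integer>> scatterGather(List<String> hosts)
     {
       List<CompletableFuture<List<Integer>>> parts = hosts.stream()
           .map(ScatterGatherSketch::queryHistorical)
           .collect(Collectors.toList());

       return CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]))
           .thenApply(ignored -> parts.stream()
               .flatMap(part -> part.join().stream())   // join() does not block: all parts are done
               .sorted()
               .collect(Collectors.toList()));
     }
   }
   ```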
   
   The resource contention is just too high without some higher-level cooperation among concurrent queries on how thread pool resources are shared.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org