Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/21 18:46:20 UTC

[GitHub] himanshug edited a comment on issue #6176: CuratorInventoryManager may not report inventory as initialized

URL: https://github.com/apache/incubator-druid/issues/6176#issuecomment-414780324
 
 
   @jihoonson 
   My test environment had 3 brokers, 2 coordinators, 2 overlords, ~40 MiddleManagers (each running about 6 Kafka indexing tasks created by the Kafka supervisor), and about 15-20 Historicals.
   
   Some background and information is noted in https://groups.google.com/forum/#!msg/druid-development/eIWDPfhpM_U/AzMRxSQGAgAJ ,
   but there are 3 completely _independent_ things here (illustrative configs are sketched right after the list):
   1) switching the coordinator to use HTTP (using `HttpLoadQueuePeon`) for segment assignment (load/drop)
   2) switching the broker/coordinator to use HTTP (using `HttpServerInventoryView`) for discovering what segments are served by queryable nodes (historicals, and peons doing indexing)
   3) switching the overlord to use HTTP for task management (using `HttpRemoteTaskRunner`)
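   
   For reference, I believe these map to runtime properties roughly like the following (a sketch from memory; please double check the exact property names against the docs for your Druid version):
   ```
   # (1) coordinator: HTTP based segment assignment (HttpLoadQueuePeon)
   druid.coordinator.loadqueuepeon.type=http
   # (2) broker/coordinator: HTTP based segment discovery (HttpServerInventoryView)
   druid.serverview.type=http
   # (3) overlord: HTTP based task management (HttpRemoteTaskRunner)
   druid.indexer.runner.type=httpRemote
   ```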
   
   In my comment above I was talking about making (1) and (2) the default after a bit of testing on some more of the clusters that you have.
   
   It looks like #6201 pertains to (3), so let us not consider enabling (3) by default until we get to the bottom of #6201.
   
   However, once (1), (2) and (3) are done and Druid clusters are using HTTP, and we remove the coordinator/overlord service announcement that is currently always done in ZK to support Tranquility, then it technically becomes possible to write discovery extensions that don't necessarily use ZooKeeper and instead use, say, etcd. However, this is also an independent activity which will take its own time, so I don't want to make it a prerequisite for trying out HTTP or defaulting to it as we gain more confidence with those features. The ZooKeeper code that is no longer needed can then be removed in phases (i.e. say 4-6 months after a release where the specific thing was made the default).
   
   Each of (1) and (2) leads to one additional connection from each broker/coordinator to each queryable node.
   (3) leads to one additional connection from each overlord to each MiddleManager node.
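   As a rough illustration, taking that at face value with the numbers from my test cluster above: ~20 Historicals plus ~240 task peons (~40 MiddleManagers x ~6 tasks) is roughly 260 queryable nodes, so (1) and (2) each mean on the order of 260 extra connections per broker/coordinator, and (3) means roughly 40 extra connections per overlord (one per MiddleManager).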
   
   On the broker/coordinator/overlord side, the `EscalatedGlobal` httpClient is used for making requests, so connections come from its existing pools; no new connection pools are created.
   
   > One thing I'm concerned is the increasing HTTP connections.
   
   Theoretically it should be OK, and in the testing above I haven't seen any connection issues pop up due to these features. But the concern is valid, and we can be more confident only as we roll it out on more clusters.
   
   > On the other day, I could see Kafka indexing service was using too many HTTP connections compared to the number of worker threads even though the cluster was not using HTTP-based orverlords or coordinators. The number of HTTP connections was a few thousand which is not so high, but I'm not sure what is the proper default configuration for the number of worker threads.
   
   I am assuming you meant that the overlord http client [worker threads] had thousands of outbound open connections.
   For the `EscalatedGlobal` client, which is used by KIS as well, the number of connections per server is set at https://github.com/apache/incubator-druid/blob/master/server/src/main/java/io/druid/guice/http/HttpClientModule.java#L140 (the default value is 20).
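   (If you want to tune that, I believe the corresponding runtime property is `druid.global.http.numConnections`, but please double check the docs for your Druid version since I am going from memory here.)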
   So, at the overlord, the maximum possible number of connections from that httpClient = 20 (or whatever is configured) x (number of KIS task peons plus any other processes that the overlord could talk to over HTTP using this client).
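   As a rough illustration with the numbers from my test cluster above: ~40 MiddleManagers x ~6 tasks is about 240 peons, so that one client alone could account for up to 20 x 240 = 4,800 connections, which is already in the "few thousand" range.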
   From https://github.com/apache/incubator-druid/blob/master/server/src/main/java/io/druid/initialization/Initialization.java#L377 , I see there are at least 3 other HttpClient instances created with their own connection pools, so check whether those are using some of the connections.
   If the above accounts for the thousands of connections, then it is explained; otherwise there is some bug in the `HttpClient` code and it creates more connections than it is told to.
   It would be good if you took a look at what host:port those connections are going to and checked whether the connection counts make sense given the expectations above.
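   (For example, on Linux something like `netstat -tn | awk 'NR>2 {print $5}' | sort | uniq -c | sort -rn` run on the overlord box is a rough sketch for grouping TCP connections by remote host:port; filter it down to the overlord process if the box runs other services.)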
   
   That said, the features in (1), (2), (3) don't necessarily worsen the situation, because we already have far more HTTP requests going on all around due to other features. I may be proven wrong in the end, but we won't know till we try :) .
   
   
   
