Posted to users@solr.apache.org by Jamie Gruener <ja...@biospatial.io> on 2021/05/13 02:13:24 UTC

zkClient has disconnected event?

Folks,

We use Consul for service discovery, health checking, and load balancing. We have a 3-node SolrCloud 6.3.6 cluster with about a dozen single-shard collections; most of them are well under 10GB, but one is 120GB. We also have a separate 3-node ZooKeeper ensemble.

We've been having some stability issues: OOM errors and occasional, seemingly random recovery events. We're working through some options to resolve at least one of the root causes (out-of-control queries), but today this event showed up in the log at the same time that 4 health checks failed (we use the equivalent of a curl request to /admin/ping to check the availability of each collection on each node).
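A check of the kind described (one request to /admin/ping per collection per node) can be approximated with the Python standard library. This is a hypothetical sketch, not the poster's Consul configuration: the node addresses and collection names are placeholders, and it assumes the ping handler's JSON response (requested with wt=json) carries a top-level "status": "OK" field on success. A Consul HTTP check typically judges only the HTTP status code; this sketch also inspects the body.

```python
import json
import urllib.error
import urllib.request

# Hypothetical placeholders: substitute real node addresses and collection names.
SOLR_NODES = ["solr1:8983", "solr2:8983", "solr3:8983"]
COLLECTIONS = ["collection_a", "collection_b"]


def ping_status_ok(body: bytes) -> bool:
    """Return True if a /admin/ping JSON body reports "status": "OK"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "OK"


def ping(node: str, collection: str, timeout_s: float = 5.0) -> bool:
    """GET /solr/<collection>/admin/ping on one node and report pass/fail."""
    url = f"http://{node}/solr/{collection}/admin/ping?wt=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200 and ping_status_ok(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, HTTP error codes (e.g. 503) and socket
        # timeouts all count as a failed check.
        return False


if __name__ == "__main__":
    for node in SOLR_NODES:
        for coll in COLLECTIONS:
            print(node, coll, "OK" if ping(node, coll) else "FAIL")
```

The timeout_s value matters here: if a node's JVM stalls for longer than the check's timeout, the check fails even though the node may recover on its own a few seconds later.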

5/12/2021, 3:18:35 PM   WARN   false   ConnectionManager   zkClient has disconnected

In researching `zkClient has disconnected`, it seems the usual recommendation is simply to increase the timeout. We had it at the default of 15s; after updating our `solr.in.sh` file and restarting Solr on each instance, it's now 30s.
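One thing worth checking after raising the timeout: the ZooKeeper server does not accept whatever session timeout a client requests. It raises the request to minSessionTimeout and caps it at maxSessionTimeout, which default to 2 × tickTime and 20 × tickTime when zoo.cfg does not set them. The sketch below reproduces that clamping rule. It assumes tickTime=2000 ms (the value in ZooKeeper's sample zoo.cfg); the ensemble's actual zoo.cfg may differ. On the Solr side, the stock Solr 6.x `solr.in.sh` expresses this setting as ZK_CLIENT_TIMEOUT in milliseconds.

```python
from typing import Optional


def negotiated_session_timeout_ms(
    requested_ms: int,
    tick_time_ms: int = 2000,  # assumption: tickTime from the sample zoo.cfg
    min_session_timeout_ms: Optional[int] = None,
    max_session_timeout_ms: Optional[int] = None,
) -> int:
    """Clamp a requested session timeout the way a ZooKeeper server does.

    Unset minSessionTimeout / maxSessionTimeout default to
    2 * tickTime and 20 * tickTime respectively.
    """
    lo = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    hi = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    # Server order: raise to the minimum first, then cap at the maximum.
    return min(max(requested_ms, lo), hi)


if __name__ == "__main__":
    for requested in (15000, 30000, 60000):
        print(requested, "->", negotiated_session_timeout_ms(requested))
```

With tickTime=2000 the 30s request should be honored as-is, while anything above 40s would be silently capped. With a smaller tickTime (say 1000 ms) even 30s would be cut to 20s.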

Can anyone provide some insight into why Solr would time out trying to reach ZooKeeper? Why is 15s too short? And could this cause the health checks to fail?
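Part of the answer to "why is 15s too short" is that the ZooKeeper Java client does not wait the full session timeout before reporting Disconnected. In the 3.4.x client (ClientCnxn), once a session is negotiated the client derives readTimeout = sessionTimeout × 2/3 and connectTimeout = sessionTimeout / number of servers. It sends a ping after roughly readTimeout/2 of send idleness. If nothing arrives from the server within readTimeout, it drops the connection and fires a Disconnected event, which Solr's ConnectionManager logs as `zkClient has disconnected`. The session itself only expires on the server after the full timeout. One frequently cited cause of such silences is a long GC pause in the Solr JVM, which fits with the OOM errors mentioned above. The sketch reproduces the client-side arithmetic under those assumptions. It is not Solr or ZooKeeper code, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZkClientTimings:
    read_timeout_ms: int     # silence from the server longer than this -> Disconnected
    ping_interval_ms: int    # approximate send-idle time before the client pings
    connect_timeout_ms: int  # budget for one connection attempt to a single server


def client_timings(session_timeout_ms: int, ensemble_size: int) -> ZkClientTimings:
    """Derive the ZooKeeper Java client's internal intervals (3.4.x ClientCnxn style)."""
    if session_timeout_ms <= 0 or ensemble_size <= 0:
        raise ValueError("session timeout and ensemble size must be positive")
    read_timeout = session_timeout_ms * 2 // 3  # integer math, as in the Java client
    return ZkClientTimings(
        read_timeout_ms=read_timeout,
        ping_interval_ms=read_timeout // 2,
        connect_timeout_ms=session_timeout_ms // ensemble_size,
    )


if __name__ == "__main__":
    for timeout in (15000, 30000):
        print(timeout, client_timings(timeout, ensemble_size=3))
```

For a 3-node ensemble, 15s gives a read timeout of 10s, so a stall of about 10s is enough to trigger the warning. At 30s the threshold moves to about 20s. Raising the timeout tolerates longer pauses but does not remove whatever is causing them.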

If there's specific documentation on how Solr interacts with ZooKeeper, I'm happy to read it; I just can't seem to find it.

Many thanks,

--jamie