Posted to users@solr.apache.org by Rohit Walecha <ro...@fnp.com> on 2023/01/18 09:13:13 UTC

Solr Restarting frequently.

Hi,

We have a 3-node *Solr (8.8.0)* cluster deployed in multiple environments,
connected to a 3-node *ZooKeeper (3.6.2)* cluster. For the last few months
we have been facing frequent restarts of the SolrCloud nodes. We tried to
debug this, and while looking into the logs and other stats we see that the
node which has restarted says:

*1. *
2023-01-04 21:50:09.186 WARN (zkConnectionManagerCallback-15-thread-1) [ ]
o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@731cf36d name:
ZooKeeperConnection
Watcher:apache-solrcloud-zookeeper-0.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-1.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-2.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181/
got event WatchedEvent state:Disconnected type:None path:null path: null
type: None
which probably means *the event state is either disconnected or expired*,
followed by this warning:
WARN (zkConnectionManagerCallback-13-thread-1) [ ]
o.a.s.c.c.ConnectionManager zkClient has disconnected



*2*.
Client session timed out, have not heard from server in 30018ms for
sessionid 0x1000091fcbe0001
This is a session timeout from the ZkClient inside Solr.

*3.*
2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ]
o.a.s.c.ZkController Publish this node as DOWN...
2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ]
o.a.s.c.ZkController Publish
node=apache-solrcloud-0.apache-solrcloud-headless.production:8983_solr as
DOWN
Attached *050120223-solr-cloud-0.log*
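
To line these three events up by time, a small script along these lines could be used. It is only a sketch: the label names and the `find_events` helper are made up for illustration, the match strings are copied from the excerpts above, and it assumes Solr log lines start with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp. Lines without a timestamp are given the most recent one seen.

```python
import re
import sys

# Substrings copied from the log excerpts above. The labels on the left
# are hypothetical names chosen only for this sketch.
PATTERNS = {
    "zk_disconnected": "zkClient has disconnected",
    "session_timeout": "Client session timed out",
    "published_down": "Publish this node as DOWN",
}

# Assumption: each log entry begins with "YYYY-MM-DD HH:MM:SS.mmm".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def find_events(lines):
    """Return (timestamp, label) pairs for lines matching a known pattern.

    A line with no leading timestamp inherits the last timestamp seen,
    or None if no timestamp has appeared yet.
    """
    events = []
    last_ts = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            last_ts = match.group(1)
        for label, needle in PATTERNS.items():
            if needle in line:
                events.append((last_ts, label))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_events.py <path-to-solr-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for ts, label in find_events(handle):
            print(ts, label)
```

Running it against the log of each restarted node should show whether the session timeout always comes just before the node is published as DOWN.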



*Meanwhile, the ZooKeeper node logs the following at the time the Solr node
gets restarted:*

2023-01-15 07:11:44,349 [myid:2] - WARN
[NIOWorkerThread-2:ZooKeeperServer@1384] - Connection request from old
client /10.70.26.0:54584; will be dropped if server is in r-o mode
2023-01-15 07:11:44,350 [myid:2] - INFO
[CommitProcessor:2:LearnerSessionTracker@116] - Committing global
session 0x200042f19cf130f
2023-01-15 07:11:44,352 [myid:2] - INFO
[RequestThrottler:QuorumZooKeeperServer@159] - Submitting global
closeSession request for session 0x200042f19cf130f


Now we are at a point where *we know what pushes the node down when a Solr
node gets restarted, as we can see in the logs at [#2]*: the client session
timed out, and this is the session established between the Solr node and
ZooKeeper. While debugging this, we also went through a series of issues
reported against the *ZooKeeper* version we are using. In gist, they
describe slow leader election, ZooKeeper nodes getting restarted, and the
whole ZooKeeper cluster going down when the leader becomes unhealthy or is
stopped/restarted. Leader election then happens again and takes a long
time, which leads to client sessions timing out during that period.

We tried to replicate this in a local environment by setting up a Solr and
ZooKeeper cluster and forcefully restarting/stopping the leader ZooKeeper
nodes. We got something like *have-not-heard-back-local-cluster.log*, and
we could replicate [#2].

We are seeking help to find out the possible reason for these frequent
restarts of the SolrCloud nodes.

*Regards.*