Posted to user@zookeeper.apache.org by Rohit Walecha <ro...@fnp.com> on 2023/01/18 09:16:02 UTC

Solr Restarting frequently.

Hi,

We have a 3-node *Solr (8.8.0)* cluster, deployed in multiple environments,
which is connected to a 3-node *ZooKeeper (3.6.2)* cluster. For the last few
months we have been facing frequent restarts of the SolrCloud nodes. While
debugging this and looking into the logs and other stats, we see the
following on the node that restarted:

*1. *
2023-01-04 21:50:09.186 WARN (zkConnectionManagerCallback-15-thread-1) [ ]
o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@731cf36d name:
ZooKeeperConnection
Watcher:apache-solrcloud-zookeeper-0.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-1.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-2.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181/
got event WatchedEvent state:Disconnected type:None path:null path: null
type: None
which we read as *the connection state being either disconnected or expired*
(the event itself reports Disconnected), followed by this warning:
WARN (zkConnectionManagerCallback-13-thread-1) [ ]
o.a.s.c.c.ConnectionManager zkClient has disconnected



*2.*
Client session timed out, have not heard from server in 30018ms for
sessionid 0x1000091fcbe0001
This is a session timeout from the ZkClient inside Solr.

*3.*
2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ] o.a.s.c.ZkController Publish this node as DOWN...
2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ] o.a.s.c.ZkController Publish node=apache-solrcloud-0.apache-solrcloud-headless.production:8983_solr as DOWN
Attached: *050120223-solr-cloud-0.log*
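
One thing worth checking against [#2] is how the 30018ms figure relates to the configured timeouts. The sketch below is only arithmetic under stated assumptions: the server clamps the requested session timeout between 2 x tickTime and 20 x tickTime (the documented defaults for minSessionTimeout and maxSessionTimeout), and the Java client reports "have not heard from server" once roughly two thirds of the negotiated timeout has passed without a reply. The function names and the sample values (tickTime=2000 and the requested timeouts) are illustrative and not taken from this cluster; verify the two-thirds rule against your client version.

```python
# Sketch: relate ZooKeeper timing settings to the client-side timeout message.
# All concrete numbers here are assumptions for illustration only.


def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Clamp the client's requested timeout to the server's allowed range.

    Defaults follow the documented ZooKeeper behaviour:
    minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
    """
    lo = 2 * tick_time_ms if min_ms is None else min_ms
    hi = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lo, min(requested_ms, hi))


def client_read_timeout(session_timeout_ms):
    """Assumed client rule: give up on a silent server after 2/3 of the session timeout."""
    return session_timeout_ms * 2 // 3


if __name__ == "__main__":
    # Hypothetical requested timeouts (e.g. candidate zkClientTimeout values).
    for requested in (15000, 30000, 45000, 60000):
        session = negotiated_session_timeout(requested)
        print(requested, session, client_read_timeout(session))
```

If the gap reported in the log is well above the computed read timeout, one hypothesis to check is that the client JVM itself was stalled (for example by a long GC pause) rather than the server being slow.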



*Meanwhile, the ZooKeeper node logs the following around the time at which
the Solr node restarts:*

2023-01-15 07:11:44,349 [myid:2] - WARN  [NIOWorkerThread-2:ZooKeeperServer@1384] - Connection request from old client /10.70.26.0:54584; will be dropped if server is in r-o mode
2023-01-15 07:11:44,350 [myid:2] - INFO  [CommitProcessor:2:LearnerSessionTracker@116] - Committing global session 0x200042f19cf130f
2023-01-15 07:11:44,352 [myid:2] - INFO  [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x200042f19cf130f
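
To line up Solr and ZooKeeper events, it can help to pair timestamps from both logs mechanically. The following is a minimal sketch, assuming both logs use the same timezone and that every entry starts with a timestamp in the two formats seen above (Solr uses a dot before the milliseconds, ZooKeeper a comma). The helper names `parse_ts`, `events` and `nearby` are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Matches "2023-01-04 21:50:09.186" (Solr) and "2023-01-15 07:11:44,349" (ZooKeeper).
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[.,](\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    if not m:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def events(lines, needle):
    """Timestamps of all timestamped lines that contain `needle`."""
    out = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and needle in line:
            out.append(ts)
    return out


def nearby(solr_times, zk_times, window_s=60):
    """Pair each Solr event with the ZooKeeper events within +/- window_s seconds."""
    window = timedelta(seconds=window_s)
    return [(s, [z for z in zk_times if abs(z - s) <= window]) for s in solr_times]


if __name__ == "__main__":
    # Lines copied from the excerpts in this message.
    solr = ["2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ] "
            "o.a.s.c.ZkController Publish this node as DOWN..."]
    zk = ["2023-01-15 07:11:44,352 [myid:2] - INFO  "
          "[RequestThrottler:QuorumZooKeeperServer@159] - Submitting global "
          "closeSession request for session 0x200042f19cf130f"]
    pairs = nearby(events(solr, "as DOWN"), events(zk, "closeSession"))
    for solr_ts, zk_hits in pairs:
        print(solr_ts, zk_hits)
```

Run against the two excerpts quoted here, the ZooKeeper list comes back empty because the Solr lines are from 2023-01-04 and the ZooKeeper lines from 2023-01-15, so full logs covering the same incident are needed for a meaningful comparison.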


At this point *we know what takes the node down when a Solr node
restarts: as the log at [#2] shows*, it is a client session timeout on
the session established between the Solr node and ZooKeeper.

While debugging this issue we also went through a series of issues
reported against the *ZooKeeper* version we are using. In gist, they
describe slow leader election, ZooKeeper nodes restarting, and the
whole ZooKeeper cluster going down when a leader becomes unhealthy,
stops, or restarts. The new leader election then takes a long time,
and client sessions time out during that period.

We tried to replicate this in a local environment by setting up a
Solr and ZooKeeper cluster and forcefully restarting/stopping the leader
ZooKeeper node. We got the attached
*have-not-heard-back-local-cluster.log* and could reproduce [#2].
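
For a local reproduction like this, the length of the window without a leader can be measured by polling each ZooKeeper server with the `srvr` four-letter-word command and watching the `Mode:` line. This is a rough sketch, assuming `srvr` is allowed by `4lw.commands.whitelist` on the servers and that a server which is not currently serving requests returns a reply without a `Mode:` line. The host/port pairs passed to `watch` are placeholders.

```python
import socket
import time


def four_letter_word(host, port, cmd="srvr", timeout_s=2.0):
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Extract the value of the 'Mode:' line, or None when it is absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def probe(host, port):
    """Return the server's mode, or None if it is unreachable or not serving."""
    try:
        return parse_mode(four_letter_word(host, port))
    except OSError:
        return None


def watch(servers, interval_s=1.0, rounds=60):
    """Print a timestamped line whenever the set of reported modes changes."""
    last = None
    for _ in range(rounds):
        modes = {f"{h}:{p}": probe(h, p) for h, p in servers}
        if modes != last:
            print(time.strftime("%H:%M:%S"), modes)
            last = modes
        time.sleep(interval_s)


# Example call with placeholder addresses for a local 3-node ensemble:
# watch([("localhost", 2181), ("localhost", 2182), ("localhost", 2183)])
```

Stopping the leader while `watch` is running should show the interval during which no server reports `leader`, which can then be compared with the session timeout seen by the Solr client.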

We are seeking help to find out the possible reasons for these
frequent restarts of the SolrCloud nodes.
*Regards.*

Re: Solr Restarting frequently.

Posted by Steph van Schalkwyk <sv...@gmail.com>.
What's your field count?

On Wed, Jan 18, 2023 at 10:20 AM Rohit Walecha <ro...@fnp.com> wrote:

> [image: Screenshot from 2023-01-18 19-06-33.png]
>
> Restart pattern is above..

Re: Solr Restarting frequently.

Posted by Rohit Walecha <ro...@fnp.com>.
+ @Uday Kumar Elluri <ud...@fnp.com>

On Mon, Jan 23, 2023 at 12:50 PM Rohit Walecha <ro...@fnp.com> wrote:

> What field count are we talking about here?
>
> On Wed, Jan 18, 2023 at 7:07 PM Rohit Walecha <ro...@fnp.com> wrote:
>
>> [image: Screenshot from 2023-01-18 19-06-33.png]
>>
>> Restart pattern is above..

Re: Solr Restarting frequently.

Posted by Rohit Walecha <ro...@fnp.com>.
What field count are we talking about here?

On Wed, Jan 18, 2023 at 7:07 PM Rohit Walecha <ro...@fnp.com> wrote:

> [image: Screenshot from 2023-01-18 19-06-33.png]
>
> Restart pattern is above..

Re: Solr Restarting frequently.

Posted by Rohit Walecha <ro...@fnp.com>.
[image: Screenshot from 2023-01-18 19-06-33.png]

Restart pattern is above..
