
Posted to users@solr.apache.org by HariBabu kuruva <ha...@gmail.com> on 2023/06/04 02:54:29 UTC

Solr Error| cluster state says we are the leader but locally we don't think so

Hi All,

As part of OS patching, we rebooted the servers and services in our PROD
environment. After the activity we started our services again, and we now
see the following error in Solr:
Remote error message: ClusterState says we are the leader
(https://solrhostname.corp.equinix.com:port/solr/abcStore_shard1_replica_n1),
but locally we don't think so

Could you please help with this?

It's a cluster of 5 ZooKeeper nodes and 10 Solr nodes.

Thanks
Hari



-- 

Thanks and Regards,
 Hari
Mobile:9790756568

Re: Solr Error| cluster state says we are the leader but locally we don't think so

Posted by matthew sporleder <ms...@gmail.com>.

I have also gotten myself into situations where leader election
looked broken, and finding and restarting the overseer has always been
the best way to fix it. You can find which node is the overseer by
browsing ZooKeeper.
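Reading the overseer's identity out of ZooKeeper can be scripted. The sketch below is a minimal Python example, assuming you have copied the JSON stored in the `/overseer_elect/leader` znode (for example from the Cloud > Tree view of the Solr Admin UI) and that its `id` field has the form `<sessionId>-<nodeName>-n_<sequence>`. That layout is an assumption based on how recent Solr versions appear to name election nodes, so check it against a real znode first; the function name and the sample value are hypothetical.

```python
import json
import re

# ASSUMPTION: the election id looks like "<sessionId>-<nodeName>-n_<sequence>".
# Node names can contain hyphens (hostnames), so the pattern anchors on the
# numeric session prefix and the trailing "-n_<digits>" suffix instead.
_ELECTION_ID = re.compile(r"^(\d+)-(.+)-n_(\d+)$")


def overseer_node_from_znode(znode_data: str) -> str:
    """Return the Solr node name of the current overseer (hypothetical helper).

    znode_data is the raw JSON text read from /overseer_elect/leader.
    """
    election_id = json.loads(znode_data)["id"]
    match = _ELECTION_ID.match(election_id)
    if match is None:
        raise ValueError(f"unexpected overseer election id: {election_id!r}")
    return match.group(2)  # the node name between session id and sequence


if __name__ == "__main__":
    # Illustrative znode content, not taken from a real cluster.
    sample = '{"id":"72057594037927937-solr-03.example.com:8983_solr-n_0000000042"}'
    print(overseer_node_from_znode(sample))  # solr-03.example.com:8983_solr
```

The returned value is a node name in the usual `host:port_context` form, which identifies the Solr process to restart.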


On Mon, Jun 5, 2023 at 9:13 AM Walter Underwood <wu...@wunderwood.org> wrote:
> [...]

Re: Solr Error| cluster state says we are the leader but locally we don't think so

Posted by Walter Underwood <wu...@wunderwood.org>.

I’ve seen this kind of thing happen when the overseer is stuck for some reason. Look in ZooKeeper for a long queue of pending work for the overseer. I’ve fixed that by restarting the node that is currently the overseer; the newly elected overseer wakes up and clears the queue. I’ve only seen that happen twice.

Wunder
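One way to check for a long overseer queue without browsing znodes by hand is the Collections API `OVERSEERSTATUS` action. The sketch below is a hypothetical helper, assuming the response carries the `overseer_queue_size`, `overseer_work_queue_size` and `overseer_collection_queue_size` fields (as Solr 8.x/9.x responses appear to; verify on your version). The 1000-entry threshold is only an illustrative cut-off, not a recommended value.

```python
import json
import urllib.request

# ASSUMPTION: queue-size field names as reported by OVERSEERSTATUS.
QUEUE_FIELDS = (
    "overseer_queue_size",             # main state-update queue
    "overseer_work_queue_size",        # overseer work queue
    "overseer_collection_queue_size",  # Collections API task queue
)


def fetch_overseer_status(solr_url: str) -> dict:
    """GET <solr_url>/admin/collections?action=OVERSEERSTATUS as parsed JSON."""
    url = f"{solr_url.rstrip('/')}/admin/collections?action=OVERSEERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def overseer_backlog(status: dict, threshold: int = 1000) -> dict:
    """Return {field: size} for every overseer queue longer than threshold.

    Fields that are missing or non-integer are skipped rather than guessed.
    """
    backlog = {}
    for field in QUEUE_FIELDS:
        size = status.get(field)
        if isinstance(size, int) and size > threshold:
            backlog[field] = size
    return backlog
```

For example, `overseer_backlog(fetch_overseer_status("http://localhost:8983/solr"))` (hypothetical URL) returns an empty dict when no queue exceeds the threshold; the same response's `leader` field should name the current overseer node, i.e. the one to restart.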

> On Jun 5, 2023, at 12:59 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> [...]

Re: Solr Error| cluster state says we are the leader but locally we don't think so

Posted by Jan Høydahl <ja...@cominvent.com>.

Hi,

One possible reason for this is that a shard leader experienced high
load (or crashed), causing its ZooKeeper client to time out and, for example, lose its live_nodes entry.
That would trigger a leader election, and another replica would become the new leader.
Once the original leader rejoins, it is no longer the leader and goes into recovery.
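To see whether the cluster state shows signs of the scenario described above, one option is to compare each shard's recorded leader against `live_nodes` using the Collections API `CLUSTERSTATUS` action. A minimal Python sketch, assuming the usual `cluster -> collections -> <name> -> shards -> <shard> -> replicas` JSON layout with the leader replica flagged as `"leader": "true"`; the helper names are hypothetical, and an empty result does not rule out the problem.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_cluster_status(solr_url: str, collection: str) -> dict:
    """GET CLUSTERSTATUS for one collection and return the parsed JSON."""
    query = urlencode({"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"})
    url = f"{solr_url.rstrip('/')}/admin/collections?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def suspicious_leaders(status: dict) -> list:
    """Return (collection, shard, replica, reason) tuples for questionable leaders.

    A shard is flagged when no replica is marked leader, when the leader's
    node is absent from live_nodes, or when the leader's state is not 'active'.
    ASSUMPTION: the CLUSTERSTATUS layout described in the lead-in.
    """
    cluster = status["cluster"]
    live = set(cluster.get("live_nodes", []))
    findings = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            leaders = [
                (name, replica)
                for name, replica in shard.get("replicas", {}).items()
                if str(replica.get("leader")).lower() == "true"
            ]
            if not leaders:
                findings.append((coll_name, shard_name, None, "no leader"))
                continue
            for name, replica in leaders:
                if replica.get("node_name") not in live:
                    findings.append((coll_name, shard_name, name, "node not live"))
                elif replica.get("state") != "active":
                    findings.append(
                        (coll_name, shard_name, name, f"state={replica.get('state')}")
                    )
    return findings
```

A leader on a node missing from live_nodes, or a leader replica stuck in `recovering` or `down`, is consistent with a session timeout and is worth correlating with timeout messages in that node's logs.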

Which version of Solr are you running?
Also check the logs for clues about what might have happened, e.g. timeout messages.

Jan

> On 4 Jun 2023, at 04:54, HariBabu kuruva <ha...@gmail.com> wrote:
> [...]