Posted to solr-user@lucene.apache.org by philippa griggs <ph...@hotmail.co.uk> on 2017/02/15 11:06:40 UTC

Core replication, Slave not flipping to master

Hello,



Solr 5.4.1, multiple cores with two cores per shard. ZooKeeper 3.4.6 (five-node ensemble).


I have noticed an error with the replication between two cores in a shard. I'm performing a schema update, which means I have to stop and start the cores. I'm trying to do this without any downtime: restarting one core in the shard and waiting for it to come back up before restarting the second one.


However, when I restart the master, the slave isn't flipping over to become the master itself. Instead I'm getting errors in the log like the following:


Exception while invoking 'details' method for replication on master -Server refused connection at xxx


When I run


http://xxx:8983/solr/core_name/replication?command=details


I see:


<lst name="slave">
  <str name="ERROR">invalid_master</str>
  <str name="masterUrl">http://xxx:8983/solr/core_name/</str>
  <str name="currentDate">Wed Feb 15 10:44:30 UTC 2017</str>
  <str name="isPollingDisabled">false</str>
  <str name="isReplicating">false</str>
</lst>
</lst>


Once the old master comes back up again, it comes in as a slave, which is what I would expect. However, as the other core hasn't flipped into becoming the master, I am left with both cores thinking they are slaves.


I would expect that when the master goes down and is unreachable, the slave would flip rather than just throw an error about the connection. Does anyone have any ideas why this is happening, and could you point me in the direction of a fix?


Many thanks

Philippa

Re: Core replication, Slave not flipping to master

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/15/2017 4:06 AM, philippa griggs wrote:
> Solr 5.4.1, multiple cores with two cores per shard. ZooKeeper 3.4.6 (five-node ensemble).
>
> I have noticed an error with the replication between two cores in a shard. I'm performing a schema update, which means I have to stop and start the cores. I'm trying to do this without any downtime: restarting one core in the shard and waiting for it to come back up before restarting the second one.

You're talking about ZooKeeper, which means SolrCloud.  Just reload the
collection after you change the config/schema in ZooKeeper.  Solr will
handle reloading all shards and all replicas, no matter how many actual
servers are involved.  There will be no downtime.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RELOAD:ReloadaCollection
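
For example, a zero-downtime schema change could look something like
the following.  The ZooKeeper hosts, config path, config name, and
collection name here are placeholders for whatever your setup uses:

# 1. Push the updated config/schema to ZooKeeper with the zkcli script
#    that ships with Solr 5.x:
server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd upconfig -confdir /path/to/conf -confname my_config

# 2. Reload the whole collection in one call; no core restarts needed:
curl "http://xxx:8983/solr/admin/collections?action=RELOAD&name=my_collection"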

> However, when I restart the master, the slave isn't flipping over to become the master itself. Instead I'm getting errors in the log like the following:

We would need to see the *full* error in your logs.  It looks like
you've just pulled part of it out and not included the entire message,
which might be dozens of lines.

There are no masters and no slaves in normal SolrCloud operation.  One
of the replicas of each shard gets elected as leader.  The replication
feature is **NOT** used unless something goes wrong.  If a replica
requires complete replacement, then SolrCloud will use the old
master/slave replication feature to copy the leader's index to the bad
replica.

> When I run
>
> http://xxx:8983/solr/core_name/replication?command=details

If you're running SolrCloud (Solr plus ZooKeeper), why are you doing
anything at all with the replication handler?  As I said above,
SolrCloud only uses the replication feature in emergencies.  It doesn't
touch the replication handler's config as master or slave until the
precise moment that an index replication is actually needed.  A core's
status as a replication master or slave is meaningless to normal
SolrCloud operation.
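
If you want to know which replica is currently the leader, ask the
Collections API for the cluster state rather than the replication
handler.  The hostname and collection name below are placeholders:

curl "http://xxx:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=my_collection&wt=json"

In the response, every replica has a "state" field, and the current
leader is marked with "leader":"true".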

Thanks,
Shawn