Posted to solr-user@lucene.apache.org by Grzegorz Huber <gr...@gmail.com> on 2016/09/02 09:53:47 UTC
SOLR replication: different behavior for network cut off vs. machine restart
Hi,
We are trying to set up a SolrCloud environment with one shard and two
replicas (one of which is the leader), coordinated by an ensemble of
three ZooKeeper instances.
The setup works fine under normal operation: data is replicated
between the replicas at runtime.
Now we try to simulate failure scenarios in two cases:
CASE 1: Shutting down one of the replica machines (tested for both the leader and a non-leader)
CASE 2: Cutting off the network so that the non-leader replica becomes unreachable
In both cases, data is being written continuously to the SolrCloud cluster.
CASE 1: The replication process starts after the failed machine boots
up again. The complete data set ends up present on both replicas.
Everything works fine.
CASE 2: Once reconnected to the network, the non-leader replica starts
the recovery process, but for some reason the new data from the leader
is not replicated onto the previously failed replica.
From what I can read in the logs when comparing both cases, I don't
understand why in CASE 2 SOLR reports
RecoveryStrategy ###### currentVersions as populated, and
RecoveryStrategy ###### startupVersions=[[]] (empty),
whereas in CASE 1 startupVersions is filled with the same objects that
appear in currentVersions in CASE 2.
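For what it's worth, while reproducing this it can help to watch the state of each replica (active / recovering / down) as reported by the Collections API CLUSTERSTATUS action alongside the RecoveryStrategy log lines. Below is a minimal sketch of extracting replica states from a CLUSTERSTATUS-shaped response; the collection name, core names, and the sample payload are hypothetical, and in practice the JSON would be fetched from /solr/admin/collections?action=CLUSTERSTATUS&wt=json on one of the nodes.

```python
import json

def replica_states(cluster_status, collection):
    """Map each replica's core name to its reported state
    (e.g. 'active', 'recovering', 'down') for one collection."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    states = {}
    for shard in shards.values():
        for replica in shard["replicas"].values():
            states[replica["core"]] = replica["state"]
    return states

# Hypothetical CLUSTERSTATUS payload, shaped like the real API response;
# in practice this JSON would come from the Solr admin endpoint.
sample = json.loads("""
{
  "cluster": {
    "collections": {
      "mycollection": {
        "shards": {
          "shard1": {
            "replicas": {
              "core_node1": {"core": "mycollection_shard1_replica1",
                             "state": "active", "leader": "true"},
              "core_node2": {"core": "mycollection_shard1_replica2",
                             "state": "recovering"}
            }
          }
        }
      }
    }
  }
}
""")

print(replica_states(sample, "mycollection"))
# A replica stuck in CASE 2 would keep reporting "recovering" here
# long after the network is restored.
```

Polling this in a loop during both experiments would show whether the CASE 2 replica ever leaves the recovering state, or flips back to active without actually pulling the missing updates.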
The general question is: why does restarting SOLR result in a
successful recovery, while reconnecting the network does not?
Thanks for any tips / leads!
Cheers,
Greg