Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2019/08/22 13:36:13 UTC

8.2.0 After changing replica types, state.json is wrong and replication no longer takes place

Hello,

There is a newly created 8.2.0 all-NRT cluster in which I replaced each NRT replica with a TLOG replica. Now the replicas no longer replicate when the leader receives data. The situation is odd: some shard replicas kept replicating until eight hours ago, another one (same collection, same node) until seven hours ago, and yet another until four hours ago!

I inspected state.json to see what might be wrong, and compared it with the state.json of another 8.2.0 all-TLOG collection that is much older and fully working.

The faulty one still lists, probably from when it was created:
    "nrtReplicas":"2",
    "tlogReplicas":"0",
    "pullReplicas":"0",
    "replicationFactor":"2",

The working collection only has:
    "replicationFactor":"1",
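A minimal sketch of how that comparison could be automated, assuming the Solr 8.x per-collection state.json layout (counts stored as strings, a "shards" map whose replicas each carry a "type" of NRT, TLOG or PULL). The function name replica_type_mismatches is hypothetical. It only flags stale per-shard counts like the ones above; whether those counts are what stops replication is not established here.

```python
import json

# Hypothetical helper: compare the per-shard replica counts a collection
# declares in state.json with the replica types actually listed under each
# shard. Assumes the Solr 8.x layout, where counts are strings like "2".
DECLARED_KEYS = {"NRT": "nrtReplicas", "TLOG": "tlogReplicas", "PULL": "pullReplicas"}


def replica_type_mismatches(state_json: str) -> dict:
    """Return {collection: {shard: {type: (declared, actual)}}} for every mismatch."""
    report = {}
    for coll_name, coll in json.loads(state_json).items():
        for shard_name, shard in coll.get("shards", {}).items():
            actual = {t: 0 for t in DECLARED_KEYS}
            for replica in shard.get("replicas", {}).values():
                # Assumption: a replica without an explicit "type" counts as NRT.
                rtype = replica.get("type", "NRT")
                actual[rtype] = actual.get(rtype, 0) + 1
            diffs = {}
            for rtype, key in DECLARED_KEYS.items():
                declared = int(coll.get(key, 0))  # missing key counts as 0
                if declared != actual[rtype]:
                    diffs[rtype] = (declared, actual[rtype])
            if diffs:
                report.setdefault(coll_name, {})[shard_name] = diffs
    return report


# Made-up example shaped like the faulty collection: declared as 2 NRT per
# shard, but both replicas of shard1 are now TLOG.
example = """{"coll": {"nrtReplicas": "2", "tlogReplicas": "0", "pullReplicas": "0",
  "replicationFactor": "2",
  "shards": {"shard1": {"replicas": {
    "core_node3": {"type": "TLOG", "state": "active", "leader": "true"},
    "core_node5": {"type": "TLOG", "state": "active"}}}}}}"""

print(replica_type_mismatches(example))
# {'coll': {'shard1': {'NRT': (2, 0), 'TLOG': (0, 2)}}}
```

An empty result means the declared counts agree with the replicas actually present; a non-empty one shows exactly which counts were left behind after replicas changed type.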

What could actually cause this new collection to start replicating when I delete the data directory, but then stop replicating at some random time that differs for each shard?

Is there something I should change in state.json, and can it simply be re-uploaded to ZK?

Thanks,
Markus

Re: 8.2.0 After changing replica types, state.json is wrong and replication no longer takes place

Posted by Ere Maijala <er...@helsinki.fi>.
Hi,

We've had PULL replicas stop replicating a couple of times in Solr 7.x.
Restarting Solr has got it going again. No errors in logs, and I've been
unable to reproduce the issue at will. At least once it happened when I
reloaded a collection, but other times that hasn't caused any issues.

I'll make a note to check state.json next time we encounter the
situation to see if I can see what you reported.

Regards,
Ere

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland