Posted to solr-user@lucene.apache.org by "Lewin Joy (TMS)" <le...@toyota.com> on 2016/11/22 02:52:51 UTC

Frequent mismatch in the numDocs between replicas

Hi,

I am having a strange issue with a Solr 6.1 cloud setup on ZooKeeper 3.4.8.

Intermittently, after I run indexing, the replicas end up with different record counts.
Even with this mismatch, the replicas are still marked healthy and are used for queries.
So I now get inconsistent results depending on which replica serves the query.
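
One way to see the mismatch per replica is to query each core directly with distrib=false, which keeps the request on that single core. The sketch below is only an illustration: it assumes you already have the parsed JSON of a Collections API CLUSTERSTATUS response, and the function name replica_count_urls is made up for this example. It builds the per-core query URLs and does not contact Solr itself.

```python
from urllib.parse import urlencode


def replica_count_urls(cluster_status, collection):
    """Map "shard/replica" to a URL that counts docs on that one core.

    cluster_status is assumed to be the parsed JSON body of a
    Collections API CLUSTERSTATUS response.
    """
    urls = {}
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    for shard_name, shard in shards.items():
        for replica_name, replica in shard["replicas"].items():
            # distrib=false stops Solr from fanning the query out to other cores;
            # rows=0 returns only the count (response.numFound).
            params = urlencode(
                {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
            )
            core_url = replica["base_url"] + "/" + replica["core"]
            urls[shard_name + "/" + replica_name] = core_url + "/select?" + params
    return urls
```

Fetching each URL and reading response.numFound gives one count per core. Only replicas of the same shard are expected to agree, since different shards hold different documents.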

This gets resolved after restarting the Solr servers, or if I just run an optimize on the collection.

Any idea what could be wrong? Have any of you faced something similar?
Is there some configuration or setting I should be checking?


Thanks,
Lewin

RE: Frequent mismatch in the numDocs between replicas

Posted by "Lewin Joy (TMS)" <le...@toyota.com>.

Hi,

I tried this. An explicit commit after indexing does not fix it either.
The document count on the leader is also wrong, so it is not just the replicas that have wrong numbers.
Both the leader and the replicas have wrong counts, and the counts also differ between replicas.
Sometimes the indexed data does not show up at all until Solr is restarted.

Usually, after an optimize or a restart, the counts on the leader and the replicas match,
and the counts go up on both leaders and replicas.
It is as if the inserted docs are not reflected anywhere, not even on the leaders.

It may have something to do with our code, because indexing through curl or DIH returns the proper count afterwards.
Did something change in the way indexing works between Solr 5.4 and Solr 6 that could be causing this?
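
For comparison with a custom indexing client, here is a minimal Python sketch of sending documents to the JSON update handler with an explicit commit. It is not the poster's code, which is not shown in this thread. The collection URL and the document fields are placeholders, and the helper names build_update_request and send_update are invented for this example.

```python
import json
from urllib.request import Request, urlopen


def build_update_request(collection_url, docs, commit=True):
    """Build a POST to <collection_url>/update carrying docs as a JSON array."""
    url = collection_url + "/update?wt=json"
    if commit:
        # Ask Solr for a hard commit as part of this update request.
        url += "&commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request, opener=urlopen):
    """Send the request and fail loudly if Solr reports a non-zero status."""
    with opener(request, timeout=30) as resp:
        reply = json.load(resp)
    status = reply["responseHeader"]["status"]
    if status != 0:
        raise RuntimeError("Solr update failed with status %s" % status)
    return reply
```

A call might look like send_update(build_update_request("http://localhost:8983/solr/mycollection", [{"id": "1"}])), where the host and collection name are assumptions. Checking the response status on every batch makes it easier to tell rejected updates apart from updates that were accepted but are not yet visible.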

-Lewin

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, November 22, 2016 8:40 AM
To: solr-user <so...@lucene.apache.org>
Subject: Re: Frequent mismatch in the numDocs between replicas

The autocommit settings on leaders and replicas
can be slightly offset in terms of wall clock time so
docs that have been committed on one node may
not have been committed on the other. Your comment
that you can optimize and fix this is evidence that this
is what you're seeing.

To test this:
1> stop indexing
2> issue a "commit" to the collection.

If that shows all replicas with the same count, then
the above is the explanation.

Best,
Erick


Re: Frequent mismatch in the numDocs between replicas

Posted by Erick Erickson <er...@gmail.com>.
The autocommit settings on leaders and replicas
can be slightly offset in terms of wall clock time so
docs that have been committed on one node may
not have been committed on the other. Your comment
that you can optimize and fix this is evidence that this
is what you're seeing.

To test this:
1> stop indexing
2> issue a "commit" to the collection.

If that shows all replicas with the same count, then
the above is the explanation.
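
The two-step test above can be sketched in Python as follows, assuming indexing has already been stopped. The URLs are placeholders, and commit_and_count and counts_agree are names made up for this example. The fetch parameter is injectable so the logic can be exercised without a running cluster.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and parse the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def commit_and_count(collection_url, core_urls, fetch=fetch_json):
    """Issue a commit to the collection, then count docs on each core.

    core_urls should list the replicas of a single shard, because
    different shards legitimately hold different documents.
    """
    # Step 2 of the test: a commit sent to the collection.
    fetch(collection_url + "/update?commit=true&wt=json")
    counts = {}
    for core_url in core_urls:
        # distrib=false keeps the count local to this core.
        body = fetch(core_url + "/select?q=*:*&rows=0&distrib=false&wt=json")
        counts[core_url] = body["response"]["numFound"]
    return counts


def counts_agree(counts):
    """True when every core reported the same number of documents."""
    return len(set(counts.values())) <= 1
```

If counts_agree returns True after the commit, the earlier mismatch is consistent with the commit-timing explanation. If it returns False, something else is going on.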

Best,
Erick
