Posted to solr-user@lucene.apache.org by Davide Isoardi <is...@ecubecenter.it> on 2016/10/14 09:35:42 UTC

Incongruent results of numdocs

Hi all,
I have indexed more than 1 million docs in a SolrCloud collection with 5 shards and 2 replicas.

After indexing, if I run the query q=id:*&rows=0 several times, I get different results for the number of documents found.

Why is the result not the same for every query?

Thanks in advance
Davide Isoardi
eCube S.r.l.
isoardi@ecubecenter.it
http://www.ecubecenter.it
Tel.  +390113999301
Mobile +393288204915
Fax. +390113999309

Notice pursuant to Legislative Decree no. 196/2003 (Privacy)
ECUBE processes personal data as specified in the "Privacy Policy" page available at http://www.ecubecenter.it/privacy.pdf. The information contained in this message is intended exclusively for the indicated recipient(s). If you have received this message in error, please kindly notify us by e-mail (info@ecubecenter.it) and delete the message received in error; any other use is illegitimate and unlawful.



R: R: R: Incongruent results of numdocs

Posted by Davide Isoardi <is...@ecubecenter.it>.
I found nothing in the logs concerning this issue.

I will try your suggestion.

Thanks
Davide Isoardi

Re: R: R: Incongruent results of numdocs

Posted by Erick Erickson <er...@gmail.com>.
Not quite. When you add replica 3, it will be synchronized to the leader.
So I'd shut down the Solr node with the bad replica, add the new replica
and then delete the old one.

The disturbing bit is that the replicas got out of sync in the first place.
Is there anything in the logs that gives any clue as to why?

Best,
Erick
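
The add-then-delete sequence described above could be driven through the
Solr Collections API. The following is only a minimal sketch: the host,
the collection name and the replica name "core_node2" are placeholders
that do not come from this thread, and the sketch only builds the request
URLs rather than sending them.

```python
from urllib.parse import urlencode

# Assumed values for illustration only; none of these appear in the thread.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, "collection": COLLECTION, "wt": "json"}
    query.update(params)
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


# Step 1: add a fresh replica of shard1; it copies its index from the leader.
add_url = collections_api_url("ADDREPLICA", shard="shard1")

# Step 2: once the new replica is active, delete the out-of-sync one.
# "core_node2" is a placeholder; CLUSTERSTATUS lists the real replica names.
delete_url = collections_api_url(
    "DELETEREPLICA", shard="shard1", replica="core_node2"
)

print(add_url)
print(delete_url)
```

Checking the new replica's state (for example with the CLUSTERSTATUS
action) before issuing the delete keeps at least one good copy of the
shard available at all times.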


R: R: Incongruent results of numdocs

Posted by Davide Isoardi <is...@ecubecenter.it>.
I am sorry for my typos. I compared the numDocs of shard1_replica1 with shard1_replica2.

If I create another replica (replica3) and only afterwards unload replica2, will the new replica be synchronized with replica1?


Re: R: Incongruent results of numdocs

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/14/2016 9:43 AM, Davide Isoardi wrote:
> Thank you very much for the quick answer.
>
> Yes, I am not indexing between requests.
>
> How can I resync two or all replicas?
>
> If I look at the overviews in the shard menu (screenshots attached), I
> see that the numDocs values are mismatched.
>
> Shard1_replica1
>
> Shard2_replica2

I can't see those pictures, the attachments didn't make it.  You seem to
be comparing shard1 and shard2.  That's not a valid comparison.  There's
a very good chance that different shards will have different document
counts even if everything is working correctly.  You need to compare
replicas of shard1 to other replicas of shard1, shard2 to shard2, etc. 
They'll likely be on different servers.

Probably the best way to force a resync is to shut down a Solr instance,
decide which replicas you want to delete on that instance, delete the
data directory for those replicas, and start Solr back up.  Any replica
where you delete the data directory will copy the index from the shard
leader, and they'll be back in sync when the copy finishes.  Before you
do this, make sure that you actually do have multiple replicas of each
shard.

Thanks,
Shawn
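
The like-for-like comparison described above (replicas of shard1 against
other replicas of shard1) can be sketched as follows. This is a rough
illustration, not a tested tool: it assumes each replica core is reachable
at its own URL and relies on the distrib=false request parameter so that
the core answers only from its own index. The counts at the bottom are
made up purely to show the comparison.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_count_url(core_url):
    """URL that counts the documents held by one replica core only."""
    # distrib=false keeps the request on this core instead of fanning out.
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract numFound from a JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_count(core_url):
    """Ask one replica core for its document count (needs a live Solr)."""
    with urlopen(replica_count_url(core_url)) as resp:
        return num_found(resp.read().decode("utf-8"))


def in_sync(counts):
    """True when every replica of the same shard reports the same count."""
    return len(set(counts.values())) <= 1


# Made-up counts, only to show the comparison; with a live cluster they
# would come from fetch_count() on each replica of shard1.
shard1_counts = {"shard1_replica1": 1000, "shard1_replica2": 998}
print(in_sync(shard1_counts))  # False: the two replicas disagree
```

The same check would be repeated per shard; counts from different shards
are never compared with each other.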


R: Incongruent results of numdocs

Posted by Davide Isoardi <is...@ecubecenter.it>.
Thank you very much for the quick answer.

Yes, I am not indexing between requests.

How can I resync two or all replicas?

If I look at the overviews in the shard menu (screenshots attached), I see that the numDocs values are mismatched.

[image: Shard1_replica1]

[image: Shard2_replica2]

Davide Isoardi


Re: Incongruent results of numdocs

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/14/2016 3:35 AM, Davide Isoardi wrote:
> I have indexed more than 1 million docs in a SolrCloud collection with 5 shards and 2 replicas.
>
> After indexing, if I run the query q=id:*&rows=0 several times, I get different results for the number of documents found.
>
> Why is the result not the same for every query?

Assuming that you are not indexing new documents between requests, there
are two likely reasons for this problem:

1) You have documents with the same uniqueKey value in more than one of
your shards.  This typically happens when the router on the collection
is set to "implicit" ... which basically means "manual."
2) Your two replicas are out of sync, which might have any number of causes.

Side note:  "q=id:*" is a very inefficient query.  You would be better
off with a range query -- "q=id:[* TO *]".  That would be faster and use
less memory.  If the id field is your uniqueKey, then an even faster,
100% equivalent query is the one for all docs -- "q=*:*".

Thanks,
Shawn
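
As a small illustration of the side note above, the three query forms can
be written out as count-only requests (rows=0). The collection URL is a
hypothetical placeholder, and the distinct_counts helper is an assumed
convenience for spotting the symptom from the original question, namely
repeated requests returning different numFound values.

```python
from urllib.parse import urlencode

# Hypothetical collection URL; the thread does not give one.
COLLECTION_URL = "http://localhost:8983/solr/mycollection"


def count_url(q):
    """Build a rows=0 request that only returns the hit count for q."""
    params = {"q": q, "rows": 0, "wt": "json"}
    return f"{COLLECTION_URL}/select?{urlencode(params)}"


# The three forms discussed above, ending with the recommended match-all.
for q in ("id:*", "id:[* TO *]", "*:*"):
    print(count_url(q))


def distinct_counts(observed):
    """Collapse repeated numFound readings into the distinct values seen."""
    return sorted(set(observed))
```

If distinct_counts returns more than one value for readings taken while
no indexing is running, the collection shows the inconsistency discussed
in this thread.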