Posted to solr-user@lucene.apache.org by "M. Flatterie" <ni...@yahoo.com> on 2014/02/03 15:57:44 UTC

SolrCloud query results order master vs replica

Greetings,

My setup is:
- SolrCloud V4.3
- One collection
- one shard
- 1 master, 1 replica

so each instance contains the entire index.  The index is rather small, and the replica is there for robustness.  There is no need (IMHO) to split the index into multiple shards yet, at least until it gets bigger.

My question:
- if I do a query on a product name (that is what the index is about) on the master I get a certain number of results and the documents.
- if I do the same query on the replica, I get the same number of results but the docs are in a different order.
- I do not specify a sort parameter in my query, simply a q=<product name>.
- obviously if I force a sort order, everything is ok, same results, same order from both instances.
- am I wrong in expecting the same results, in the SAME order?

Follow up question if the order is not guaranteed:
- should I force the dev. to use an explicit sort order?
- if we force the sort, we then bypass the ranking / score order do we not?
- should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master?

Other useful information:
  - the admin page shows same number of documents in both instances.
  - logs are clean, load and replication and queries worked ok.
  - the web application that queries SOLR round robins between the two instances, so getting results in a different order is bad for consistency.

Thank you for your help!

Nic


Re: SolrCloud query results order master vs replica

Posted by "M. Flatterie" <ni...@yahoo.com>.
Thank you Sir for that confirmation!
Nic

--------------------------------------------
On Wed, 2/5/14, Chris Hostetter <ho...@fucit.org> wrote:

 Subject: Re: SolrCloud query results order master vs replica
 To: solr-user@lucene.apache.org
 Received: Wednesday, February 5, 2014, 11:33 AM
 
 
 : Just to make sure I interpret the results correctly:
 : - they all have a score of 1.7046129
 : - the order they are presented in is therefore not related to the score,
 : it is just the order in which the data is internally stored (like an SQL
 : SELECT statement without ORDER BY clause)
 
 The order they are presented *is* related to the score -- but since the
 scores are all identical, and no secondary sort is specified, the behavior
 is undefined -- and can vary depending on the replica used.
 
 :   - If I want to force a sort operation, I should add a sort parameter
 : in the query.  The first sort will be done by score and then documents
 : with the same score will be sorted by my sort=?? parameter?
 :   - or will my sort parameter overwrite the score sorting?
 
 if you specify a sort param, it should be the full sort you want -- it
 won't be "appended" to the default score sort ... so if, for example, you
 wanted to sort by score, with a secondary fallback sort by your "id"
 field, use something like...
 
     sort=score desc, id asc
 
 
 
 -Hoss
 http://www.lucidworks.com/
 

Re: SolrCloud query results order master vs replica

Posted by Chris Hostetter <ho...@fucit.org>.
: Just to make sure I interpret the results correctly:
: - they all have a score of 1.7046129
: - the order they are presented in is therefore not related to the score, 
: it is just the order in which the data is internally stored (like an SQL 
: SELECT statement without ORDER BY clause)

The order they are presented *is* related to the score -- but since the 
scores are all identical, and no secondary sort is specified, the behavior 
is undefined -- and can vary depending on the replica used.

:   - If I want to force a sort operation, I should add a sort parameter 
: in the query.  The first sort will be done by score and then documents 
: with the same score will be sorted by my sort=?? parameter?
:   - or will my sort parameter overwrite the score sorting?

if you specify a sort param, it should be the full sort you want -- it 
won't be "appended" to the default score sort ... so if, for example, you 
wanted to sort by score, with a secondary fallback sort by your "id" 
field, use something like...

	sort=score desc, id asc
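
A minimal sketch of how a client might build such a request in Python. The host, port, collection name (`products`) and the `build_select_url` helper are hypothetical; `q`, `sort` and `wt` are standard Solr query parameters, and `prod_doc:tylenol` is the query seen in this thread. It assumes `id` is a sortable unique key field.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, query, tie_breaker_field="id"):
    """Build a /select URL that sorts by score, then by a tie-breaker field.

    base_url and tie_breaker_field are illustrative assumptions, not values
    taken from the original poster's installation.
    """
    params = {
        "q": query,
        # The sort param replaces the default sort entirely, so score must be
        # listed explicitly before the fallback field.
        "sort": f"score desc, {tie_breaker_field} asc",
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983/solr/products", "prod_doc:tylenol")
print(url)

# Decode the query string again to confirm what Solr would receive.
decoded = parse_qs(urlsplit(url).query)
print(decoded["sort"][0])
```

Sending the same URL to either instance should then return tied documents in the same order, because the tie is resolved by a stored field value rather than by anything internal to one copy of the index.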



-Hoss
http://www.lucidworks.com/

Re: SolrCloud query results order master vs replica

Posted by "M. Flatterie" <ni...@yahoo.com>.
Good morning. So, based on your answer, there is no guarantee that the results will come back in the same order from one replica to the other.

I ran the queries in debug mode and I see...

MASTER

 "321240": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20206) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 20206, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=20206)\n",
      "432633": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20457) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 20457, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=20457)\n",
      "321166": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23414) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 23414, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=23414)\n",
      "362806": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25531) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 25531, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=25531)\n",
      "684662": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 27656) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 27656, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=27656)\n",
      "425926": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 28662) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 28662, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=28662)\n",
      "718098": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 44509) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 44509, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=44509)\n",
      "527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 53653) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 53653, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=53653)\n",
      "138537": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 56137) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 56137, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=56137)\n",
      "633800": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 67368) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 67368, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=67368)\n"
    
    
REPLICA    
    
      "111294": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 4803) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 4803, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=4803)\n",
      "164137": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 4878) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 4878, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=4878)\n",
      "553503": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 6907) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 6907, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=6907)\n",
      "684621": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 12453) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 12453, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=12453)\n",
      "674028": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15029) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 15029, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=15029)\n",
      "563023": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15698) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 15698, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=15698)\n",
      "894824": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 19256) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 19256, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=19256)\n",
      "540476": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20843) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 20843, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=20843)\n",
      "671271": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23778) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 23778, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=23778)\n",
      "527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25053) [DefaultSimilarity], result of:\n  1.7046129 = fieldWeight in 25053, product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 = termFreq=1.0\n    6.8184514 = idf(docFreq=374, maxDocs=126169)\n    0.25 = fieldNorm(doc=25053)\n"



Just to make sure I interpret the results correctly:
- they all have a score of 1.7046129
- the order they are presented in is therefore not related to the score, it is just the order in which the data is internally stored (like an SQL SELECT statement without ORDER BY clause)
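
The identical scores can be checked by hand from the numbers in the debug output. A rough sketch, assuming the classic Lucene TF-IDF formulas used by DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are made up for illustration. Lucene works in 32-bit floats, so the last digits may differ slightly from what Python prints.

```python
import math


def tf(freq):
    # Term frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight is the product of tf, idf and fieldNorm, as in the explain output.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values copied from the debug output: freq=1.0, docFreq=374,
# maxDocs=126169, fieldNorm=0.25.
idf_value = idf(doc_freq=374, max_docs=126169)
score = field_weight(freq=1.0, doc_freq=374, max_docs=126169, field_norm=0.25)
print(idf_value, score)
```

Every listed document has the same term frequency and the same field norm, and docFreq and maxDocs are the same on both instances, so all the scores tie.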

Follow up question:
  - If I want to force a sort operation, I should add a sort parameter in the query.  The first sort will be done by score and then documents with the same score will be sorted by my sort=?? parameter?
  - or will my sort parameter overwrite the score sorting?

Thank you again for your help,

Nic.



--------------------------------------------
On Mon, 2/3/14, Erick Erickson <er...@gmail.com> wrote:

 Subject: Re: SolrCloud query results order master vs replica
 To: solr-user@lucene.apache.org
 Received: Monday, February 3, 2014, 2:19 PM
 
 This should only be happening if the scores are _exactly_ the same, which
 is actually quite rare. In that case, ties are broken by the internal
 Lucene document ID, and the relative order of the docs on the two machines
 isn't guaranteed to be the same: the internal ID can change during segment
 merging, and merging does NOT happen identically on both machines.
 
 But this should be relatively rare. If you're doing *:* queries or other
 such, then they aren't scored (see ConstantScoreQuery). So in practical
 terms, I suspect you're seeing some kind of test artifact. Try adding
 &debug=all to the query and you'll see how documents are scored.
 
 Best,
 Erick
 

Re: SolrCloud query results order master vs replica

Posted by Erick Erickson <er...@gmail.com>.
This should only be happening if the scores are _exactly_ the same, which
is actually quite rare. In that case, ties are broken by the internal
Lucene document ID, and the relative order of the docs on the two machines
isn't guaranteed to be the same: the internal ID can change during segment
merging, and merging does NOT happen identically on both machines.

But this should be relatively rare. If you're doing *:* queries or other
such, then they aren't scored (see ConstantScoreQuery). So in practical
terms, I suspect you're seeing some kind of test artifact. Try adding
&debug=all to the query and you'll see how documents are scored.
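
The tie-breaking behaviour described above can be illustrated with a small simulation. This is only a sketch of the idea, not Lucene's actual collector code: the `internal_docid` values for document 527929 (53653 on the master, 25053 on the replica) and for 321240 on the master (20206) come from the debug output posted elsewhere in this thread, while the replica's internal ID for 321240 is a made-up value, since that document is not in the replica's posted top ten.

```python
SCORE = 1.7046129  # every matching document in the thread has this score

# The same two documents as each instance might see them.
master_hits = [
    {"id": "321240", "score": SCORE, "internal_docid": 20206},
    {"id": "527929", "score": SCORE, "internal_docid": 53653},
]
replica_hits = [
    {"id": "321240", "score": SCORE, "internal_docid": 30000},  # hypothetical value
    {"id": "527929", "score": SCORE, "internal_docid": 25053},
]


def default_order(hits):
    # No sort param: score descending, ties broken by internal document ID.
    ranked = sorted(hits, key=lambda h: (-h["score"], h["internal_docid"]))
    return [h["id"] for h in ranked]


def explicit_order(hits):
    # sort=score desc, id asc: ties broken by the stored id field instead.
    ranked = sorted(hits, key=lambda h: (-h["score"], h["id"]))
    return [h["id"] for h in ranked]


print(default_order(master_hits), default_order(replica_hits))
print(explicit_order(master_hits), explicit_order(replica_hits))
```

With the default tie-break the two instances disagree on the order; with the explicit fallback on `id` they agree, because that value is the same wherever the document lives.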

Best,
Erick
