Posted to solr-user@lucene.apache.org by "M. Flatterie" <ni...@yahoo.com> on 2014/02/03 15:57:44 UTC
SolrCloud query results order master vs replica
Greetings,
My setup is:
- SolrCloud V4.3
- One collection
- one shard
- 1 master, 1 replica
So each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split the index into shards (yet, until the index gets bigger).
My question:
- if I do a query on a product name (that is what the index is about) on the master I get a certain number of results and the documents.
- if I do the same query on the replica, I get the same number of results but the docs are in a different order.
- I do not specify a sort parameter in my query, simply a q=<product name>.
- obviously if I force a sort order, everything is ok, same results, same order from both instances.
- am I wrong in expecting the same results, in the SAME order?
Follow up question if the order is not guaranteed:
- should I force the dev. to use an explicit sort order?
- if we force the sort, we then bypass the ranking / score order do we not?
- should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master?
Other useful information:
- the admin page shows same number of documents in both instances.
- logs are clean, load and replication and queries worked ok.
- the web application that queries SOLR round robins between the two instances, so getting results in a different order is bad for consistency.
Thank you for your help!
Nic
Re: SolrCloud query results order master vs replica
Posted by "M. Flatterie" <ni...@yahoo.com>.
Thank you Sir for that confirmation!
Nic
--------------------------------------------
On Wed, 2/5/14, Chris Hostetter <ho...@fucit.org> wrote:
Subject: Re: SolrCloud query results order master vs replica
To: solr-user@lucene.apache.org
Received: Wednesday, February 5, 2014, 11:33 AM
: Just to make sure I interpret the results correctly:
: - they all have a score of 1.7046129
: - the order they are presented in is therefore not related to the score,
: it is just the order in which the data is internally stored (like an SQL
: SELECT statement without ORDER BY clause)
The order they are presented *is* related to the score -- but since the
scores are all identical, and no secondary sort is specified, the behavior
is undefined -- and can vary depending on the replica used.
: - If I want to force a sort operation, I should add a sort parameter
: in the query. The first sort will be done by score and then documents
: with the same score will be sorted by my sort=?? parameter?
: - or will my sort parameter overwrite the score sorting?
if you specify a sort param, it should be the full sort you want -- it
won't be "appended" to the default score sort ... so if, for example, you
wanted to sort by score, with a secondary fallback sort by your "id"
field, use something like...
sort=score desc, id asc
-Hoss
http://www.lucidworks.com/
Re: SolrCloud query results order master vs replica
Posted by Chris Hostetter <ho...@fucit.org>.
: Just to make sure I interpret the results correctly:
: - they all have a score of 1.7046129
: - the order they are presented in is therefore not related to the score,
: it is just the order in which the data is internally stored (like an SQL
: SELECT statement without ORDER BY clause)
The order they are presented *is* related to the score -- but since the
scores are all identical, and no secondary sort is specified, the behavior
is undefined -- and can vary depending on the replica used.
: - If I want to force a sort operation, I should add a sort parameter
: in the query. The first sort will be done by score and then documents
: with the same score will be sorted by my sort=?? parameter?
: - or will my sort parameter overwrite the score sorting?
if you specify a sort param, it should be the full sort you want -- it
won't be "appended" to the default score sort ... so if, for example, you
wanted to sort by score, with a secondary fallback sort by your "id"
field, use something like...
sort=score desc, id asc
-Hoss
http://www.lucidworks.com/
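As a minimal sketch of the advice above, a client can send the full sort explicitly on every request. The host, port and collection name below are hypothetical placeholders; the field names prod_doc and id come from this thread, and the sketch assumes id is a unique, sortable field.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and collection name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_query_url(term: str) -> str:
    """Build a select URL whose sort is fully specified, so ties on score
    are broken by the "id" field rather than by internal document order."""
    params = {
        "q": f"prod_doc:{term}",
        # The sort parameter replaces the default sort, so score has to be
        # listed explicitly ahead of the tie-breaker.
        "sort": "score desc, id asc",
        "wt": "json",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(build_query_url("tylenol"))
```

Because score stays the primary key, relevance ranking is preserved; the id clause only decides the order among documents whose scores are identical.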
Re: SolrCloud query results order master vs replica
Posted by "M. Flatterie" <ni...@yahoo.com>.
Good morning. So, based on your answer, there is no guarantee that the results will be in the same order from one replica to the other.
I ran the queries in debug mode and I see...
MASTER
"321240": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20206) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 20206, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=20206)\n",
"432633": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20457) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 20457, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=20457)\n",
"321166": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23414) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 23414, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=23414)\n",
"362806": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25531) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 25531, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=25531)\n",
"684662": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 27656) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 27656, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=27656)\n",
"425926": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 28662) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 28662, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=28662)\n",
"718098": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 44509) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 44509, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=44509)\n",
"527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 53653) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 53653, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=53653)\n",
"138537": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 56137) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 56137, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=56137)\n",
"633800": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 67368) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 67368, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=67368)\n"
REPLICA
"111294": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 4803) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 4803, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=4803)\n",
"164137": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 4878) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 4878, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=4878)\n",
"553503": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 6907) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 6907, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=6907)\n",
"684621": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 12453) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 12453, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=12453)\n",
"674028": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15029) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 15029, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=15029)\n",
"563023": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15698) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 15698, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=15698)\n",
"894824": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 19256) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 19256, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=19256)\n",
"540476": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20843) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 20843, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=20843)\n",
"671271": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23778) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 23778, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=23778)\n",
"527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25053) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 25053, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n 0.25 = fieldNorm(doc=25053)\n"
Just to make sure I interpret the results correctly:
- they all have a score of 1.7046129
- the order they are presented in is therefore not related to the score, it is just the order in which the data is internally stored (like an SQL SELECT statement without ORDER BY clause)
Follow up question:
- If I want to force a sort operation, I should add a sort parameter in the query. The first sort will be done by score and then documents with the same score will be sorted by my sort=?? parameter?
- or will my sort parameter overwrite the score sorting?
Thank you again for your help,
Nic.
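The identical scores in the debug output can be reproduced by hand. The sketch below assumes the classic TF-IDF formulas used by Lucene's DefaultSimilarity in this generation of Solr (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are made up for illustration, and the input values are copied from the explain output above.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic Lucene formula: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    # fieldWeight is the product of tf, idf and fieldNorm, as in the explain.
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the debug output: freq=1.0, docFreq=374,
# maxDocs=126169, fieldNorm=0.25.
score = field_weight(freq=1.0, doc_freq=374, max_docs=126169, field_norm=0.25)
print(score)
```

Every listed document has the same term frequency and the same fieldNorm, and docFreq and maxDocs are index-wide values, so all of them end up with exactly the same score and the tie has to be broken some other way.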
--------------------------------------------
On Mon, 2/3/14, Erick Erickson <er...@gmail.com> wrote:
Subject: Re: SolrCloud query results order master vs replica
To: solr-user@lucene.apache.org
Received: Monday, February 3, 2014, 2:19 PM
This should only be happening if the scores are _exactly_ the same,
which is actually quite rare. In that case, the tied scores are broken
by the internal Lucene document ID, and the relative order of the docs
on the two machines isn't guaranteed to be the same: the internal ID
can change during segment merging, which is NOT the same on both
machines.
But this should be relatively rare. If you're doing *:* queries or
other such, then they aren't scored (see ConstantScoreQuery). So in
practical terms, I suspect you're seeing some kind of test artifact.
Try adding &debug=all to the query and you'll see how documents are
scored.
Best,
Erick
Re: SolrCloud query results order master vs replica
Posted by Erick Erickson <er...@gmail.com>.
This should only be happening if the scores are _exactly_ the same,
which is actually quite rare. In that case, the tied scores are broken
by the internal Lucene document ID, and the relative order of the docs
on the two machines isn't guaranteed to be the same: the internal ID
can change during segment merging, which is NOT the same on both
machines.
But this should be relatively rare. If you're doing *:* queries or
other such, then they aren't scored (see ConstantScoreQuery). So in
practical terms, I suspect you're seeing some kind of test artifact.
Try adding &debug=all to the query and you'll see how documents are
scored.
Best,
Erick
On Mon, Feb 3, 2014 at 6:57 AM, M. Flatterie <ni...@yahoo.com> wrote:
> Greetings,
>
> My setup is:
> - SolrCloud V4.3
> - One collection
> - one shard
> - 1 master, 1 replica
>
> So each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split the index into shards (yet, until the index gets bigger).
>
> My question:
> - if I do a query on a product name (that is what the index is about) on the master I get a certain number of results and the documents.
> - if I do the same query on the replica, I get the same number of results but the docs are in a different order.
> - I do not specify a sort parameter in my query, simply a q=<product name>.
> - obviously if I force a sort order, everything is ok, same results, same order from both instances.
> - am I wrong in expecting the same results, in the SAME order?
>
> Follow up question if the order is not guaranteed:
> - should I force the dev. to use an explicit sort order?
> - if we force the sort, we then bypass the ranking / score order do we not?
> - should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master?
>
> Other useful information:
> - the admin page shows same number of documents in both instances.
> - logs are clean, load and replication and queries worked ok.
> - the web application that queries SOLR round robins between the two instances, so getting results in a different order is bad for consistency.
>
> Thank you for your help!
>
> Nic
>
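The tie-breaking behaviour described in this reply can be illustrated with a small simulation. This is only a sketch of the idea, not Lucene's actual code: the Hit type and both functions are hypothetical, the internal IDs for the master and for document 527929 on the replica are taken from the debug output in this thread, and the other two replica IDs are invented for the example.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    unique_id: str    # value of the document's "id" field
    score: float
    internal_id: int  # internal Lucene doc ID, which differs per replica


def default_order(hits: List[Hit]) -> List[str]:
    # No explicit sort: score descending, ties broken by internal doc ID.
    ranked = sorted(hits, key=lambda h: (-h.score, h.internal_id))
    return [h.unique_id for h in ranked]


def explicit_order(hits: List[Hit]) -> List[str]:
    # Equivalent of sort=score desc, id asc: ties broken by the "id" field.
    ranked = sorted(hits, key=lambda h: (-h.score, h.unique_id))
    return [h.unique_id for h in ranked]


master = [
    Hit("321240", 1.7046129, 20206),
    Hit("432633", 1.7046129, 20457),
    Hit("527929", 1.7046129, 53653),
]
replica = [
    Hit("321240", 1.7046129, 30001),  # invented internal ID
    Hit("432633", 1.7046129, 26000),  # invented internal ID
    Hit("527929", 1.7046129, 25053),
]

print(default_order(master), default_order(replica))
print(explicit_order(master), explicit_order(replica))
```

With the default ordering the two instances disagree because their internal IDs differ, while the explicit secondary sort gives the same order on both.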