Posted to solr-user@lucene.apache.org by Allistair Crossley <al...@roxxor.co.uk> on 2011/03/09 22:22:33 UTC

Same index is ranking differently on 2 machines

Hi,

I am seeing an issue I do not understand and hope that someone can shed some light on this. The issue is that for a particular search we are seeing a particular result rank in position 3 on one machine and position 8 on the production machine. The position 3 is our desired and roughly expected ranking.

I have a local machine with solr and a version deployed on a production server. My local machine's solr and the production version are both checked out from our project's SVN trunk. They are identical files except for the data files (not in SVN) and database connection settings.

The index is populated exclusively via data import handler queries to a database. 

I have exported the production database as-is to my local development machine so that my local machine and production have access to the self same data.

I execute a total full-import on both.

Still, I see a different position for this document that should surely rank in the same location, all else being equal.

I ran debugQuery diff to see how the scores were being computed. See appendix at foot of this email.

As far as I can tell every single query normalisation block of the debug is marginally different, e.g.

-        0.021368012 = queryNorm (local)
+        0.009944122 = queryNorm (production)

Which leads to a final score of roughly 2.29 locally versus 1.07 on production (the leading -/+ below are diff markers, not signs), which is enough to skew the results from correct to incorrect (in terms of what we expect to see).

-2.286596 (local)
+1.0651637 (production)

I cannot explain this difference. The database is the same. The configuration is the same. I have fully imported from scratch on both servers. What am I missing?

Thank you for your time

Allistair

----- snip

APPENDIX - debugQuery=on DIFF 

--- untitled
+++ (clipboard)
@@ -1,51 +1,49 @@
-<str name="L12411p">
+<str name="L12411">
 
-2.286596 = (MATCH) sum of:
-  1.6891675 = (MATCH) sum of:
-    1.3198489 = (MATCH) max plus 0.01 times others of:
-      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551), product of:
-        0.011795795 = queryWeight(text:dubai^0.1), product of:
-          0.1 = boost
+1.0651637 = (MATCH) sum of:
+  0.7871359 = (MATCH) sum of:
+    0.6151879 = (MATCH) max plus 0.01 times others of:
+      0.10713901 = (MATCH) weight(text:dubai in 1551), product of:
+        0.05489459 = queryWeight(text:dubai), product of:
           5.520305 = idf(docFreq=65, maxDocs=6063)
-          0.021368012 = queryNorm
+          0.009944122 = queryNorm
         1.9517226 = (MATCH) fieldWeight(text:dubai in 1551), product of:
           1.4142135 = tf(termFreq(text:dubai)=2)
           5.520305 = idf(docFreq=65, maxDocs=6063)
           0.25 = fieldNorm(field=text, doc=1551)
-      1.3196187 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
-        0.32609802 = queryWeight(profile:dubai^2.0), product of:
+      0.6141165 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
+        0.15175761 = queryWeight(profile:dubai^2.0), product of:
           2.0 = boost
           7.6305184 = idf(docFreq=7, maxDocs=6063)
-          0.021368012 = queryNorm
+          0.009944122 = queryNorm
         4.0466933 = (MATCH) fieldWeight(profile:dubai in 1551), product of:
           1.4142135 = tf(termFreq(profile:dubai)=2)
           7.6305184 = idf(docFreq=7, maxDocs=6063)
           0.375 = fieldNorm(field=profile, doc=1551)
-    0.36931866 = (MATCH) max plus 0.01 times others of:
-      0.0018293816 = (MATCH) weight(text:product^0.1 in 1551), product of:
-        0.003954251 = queryWeight(text:product^0.1), product of:
-          0.1 = boost
+    0.17194802 = (MATCH) max plus 0.01 times others of:
+      0.00851347 = (MATCH) weight(text:product in 1551), product of:
+        0.018402064 = queryWeight(text:product), product of:
           1.8505468 = idf(docFreq=2589, maxDocs=6063)
-          0.021368012 = queryNorm
+          0.009944122 = queryNorm
         0.4626367 = (MATCH) fieldWeight(text:product in 1551), product of:
           1.0 = tf(termFreq(text:product)=1)
           1.8505468 = idf(docFreq=2589, maxDocs=6063)
           0.25 = fieldNorm(field=text, doc=1551)
-      0.36930037 = (MATCH) weight(profile:product^2.0 in 1551), product of:
-        0.1725098 = queryWeight(profile:product^2.0), product of:
+      0.17186289 = (MATCH) weight(profile:product^2.0 in 1551), product of:
+        0.08028162 = queryWeight(profile:product^2.0), product of:
           2.0 = boost
           4.036637 = idf(docFreq=290, maxDocs=6063)
-          0.021368012 = queryNorm
+          0.009944122 = queryNorm
         2.14075 = (MATCH) fieldWeight(profile:product in 1551), product of:
           1.4142135 = tf(termFreq(profile:product)=2)
           4.036637 = idf(docFreq=290, maxDocs=6063)
           0.375 = fieldNorm(field=profile, doc=1551)
-  0.59742856 = (MATCH) max plus 0.01 times others of:
-    0.59742856 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
-      0.12465195 = queryWeight(profile:"dubai product"~10^0.5), product of:
+  0.27802786 = (MATCH) max plus 0.01 times others of:
+    0.27802786 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
+      0.05800981 = queryWeight(profile:"dubai product"~10^0.5), product of:
         0.5 = boost
         11.667155 = idf(profile: dubai=7 product=290)
-        0.021368012 = queryNorm
+        0.009944122 = queryNorm
       4.7927732 = fieldWeight(profile:"dubai product" in 1551), product of:
         1.0954452 = tf(phraseFreq=1.2)
         11.667155 = idf(profile: dubai=7 product=290)
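
The numbers in the diff above can be re-derived by hand, which helps separate what queryNorm does from what the boosts do. The following is a minimal sketch, assuming the classic Lucene TF-IDF explain arithmetic (queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(termFreq) * idf * fieldNorm, and "max plus 0.01 times others" for the DisMax clause). The function names are illustrative and not part of Solr; all input values are copied from the diff, and a missing boost line is taken to mean a boost of 1.0.

```python
import math

def term_weight(boost, idf, query_norm, term_freq, field_norm):
    """Score of one term clause, following the explain output line by line."""
    query_weight = boost * idf * query_norm                  # "queryWeight" line
    field_weight = math.sqrt(term_freq) * idf * field_norm   # "fieldWeight" line
    return query_weight * field_weight

def dismax(clause_scores, tie):
    """'max plus <tie> times others', as printed in the explain output."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)

LOCAL_NORM, PROD_NORM = 0.021368012, 0.009944122

# "dubai" clause, local machine: the text field carries boost 0.1
local_text = term_weight(0.1, 5.520305, LOCAL_NORM, 2, 0.25)
local_profile = term_weight(2.0, 7.6305184, LOCAL_NORM, 2, 0.375)

# "dubai" clause, production: no boost line for text, so boost is 1.0
prod_text = term_weight(1.0, 5.520305, PROD_NORM, 2, 0.25)
prod_profile = term_weight(2.0, 7.6305184, PROD_NORM, 2, 0.375)

local_dubai = dismax([local_text, local_profile], tie=0.01)
prod_dubai = dismax([prod_text, prod_profile], tie=0.01)

# queryNorm cancels in a within-machine ratio; only the boost difference is left
local_ratio = local_text / local_profile
prod_ratio = prod_text / prod_profile

print(local_dubai, prod_dubai, prod_ratio / local_ratio)
```

Comparing the text-to-profile ratio on each machine removes queryNorm from the picture, so any remaining difference has to come from the per-field boosts rather than from normalisation.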




Re: Same index is ranking differently on 2 machines

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Wait, if you don't have identical indexes, then why would you expect 
identical results?

If your indexes are different, one would expect the results for the same 
query to be different -- there are different documents in the index!   
The iDF portion of the TF/iDF type algorithm at the base of Solr's 
relevancy will also be different in different indexes. 
http://en.wikipedia.org/wiki/Tf%E2%80%93idf

Maybe I'm misunderstanding you.  But if you have different indexes -- 
not exactly the same collection of documents indexed using exactly the 
same field definitions and rules -- then one should expect different 
relevance results.

Jonathan
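
As a quick check on the idf point above: the idf lines in the posted diff are unchanged context lines, i.e. identical on both machines. Below is a minimal sketch, assuming the classic Lucene DefaultSimilarity formula idf = 1 + ln(maxDocs / (docFreq + 1)); the function name is illustrative, and the docFreq/maxDocs values are copied from the debug output.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Classic Lucene DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# (docFreq, maxDocs, idf as printed in the debugQuery output)
observed = [
    (65, 6063, 5.520305),     # text:dubai
    (7, 6063, 7.6305184),     # profile:dubai
    (2589, 6063, 1.8505468),  # text:product
    (290, 6063, 4.036637),    # profile:product
]

for doc_freq, max_docs, printed in observed:
    print(doc_freq, max_docs, printed, classic_idf(doc_freq, max_docs))
```

Since docFreq and maxDocs match on both sides of the diff, the idf inputs are the same on both machines, which points away from differing index contents for these terms.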

On 3/9/2011 4:48 PM, Allistair Crossley wrote:
> That's what I think, glad I am not going mad.
>
> I've spent 1/2 a day comparing the config files, checking out from SVN again and ensuring the databases are identical. I cannot see what else I can do to make them equivalent. Both servers check out directly from SVN, I am convinced the files are the same. The database is definitely the same.
>
> Not sure what you mean about having identical indices - that's my problem - I don't - or do you mean something else I've missed? But yes everything else you mention is identical, I am as certain as I can be.
>
> I too think there must be a difference I have missed but I have run out of ideas for what to check!
>
> Frustrating :)

Re: Same index is ranking differently on 2 machines

Posted by Allistair Crossley <al...@roxxor.co.uk>.
That's what I think, glad I am not going mad.

I've spent 1/2 a day comparing the config files, checking out from SVN again and ensuring the databases are identical. I cannot see what else I can do to make them equivalent. Both servers check out directly from SVN, I am convinced the files are the same. The database is definitely the same.

Not sure what you mean about having identical indices - that's my problem - I don't - or do you mean something else I've missed? But yes everything else you mention is identical, I am as certain as I can be. 

I too think there must be a difference I have missed but I have run out of ideas for what to check!

Frustrating :)

On Mar 9, 2011, at 4:38 PM, Jonathan Rochkind wrote:

> Yes, but the identical index with the identical solrconfig.xml and the identical query and the identical version of Solr on two different machines should produce identical results.
> 
> So it's a legitimate question why it's not.  But perhaps queryNorm isn't enough to answer that. Sorry, it's out of my league to try and figure it out.
> 
> But are you absolutely sure you have identical indexes, identical solrconfig.xml, identical queries, and identical versions of Solr and any other installed Java libraries... on both machines?  One of these being different seems more likely than a bug in Solr, although that's possible.


Re: Same index is ranking differently on 2 machines

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Yes, but the identical index with the identical solrconfig.xml and the 
identical query and the identical version of Solr on two different 
machines should produce identical results.

So it's a legitimate question why it's not.  But perhaps queryNorm isn't 
enough to answer that. Sorry, it's out of my league to try and figure 
it out.

But are you absolutely sure you have identical indexes, identical 
solrconfig.xml, identical queries, and identical versions of Solr and 
any other installed Java libraries... on both machines?  One of these 
being different seems more likely than a bug in Solr, although that's 
possible.

On 3/9/2011 4:34 PM, Jayendra Patil wrote:
> queryNorm is just a normalizing factor and is the same value across
> all the results for a query, to just make the scores comparable.
> So even if it varies between environments, you should not worry about it.
>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
> -
> Definition - queryNorm(q) is just a normalizing factor used to make
> scores between queries comparable. This factor does not affect
> document ranking (since all ranked documents are multiplied by the
> same factor), but rather just attempts to make scores from different
> queries (or even different indexes) comparable
>
> Regards,
> Jayendra
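
To illustrate the queryNorm point: multiplying every score by the same constant cannot reorder documents, but the constant itself depends on the query's boosts, so a differing queryNorm can be a hint that the two servers are not running the same query. This is a rough sketch only. It assumes a simplified, flat form of the classic Lucene formula, queryNorm = 1 / sqrt(sum of (idf * boost)^2), and ignores DisMax nesting, the tie parameter and any other clauses, so it will not reproduce the exact values in the debug output. The document names and raw scores are hypothetical.

```python
import math

def query_norm(terms):
    """Simplified classic queryNorm: 1 / sqrt(sum of (idf * boost)^2)."""
    return 1.0 / math.sqrt(sum((idf * boost) ** 2 for idf, boost in terms))

def ranking(scores):
    """Document ids ordered by descending score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical raw scores: scaling all of them by one factor keeps the order
raw = {"docA": 3.0, "docB": 2.0, "docC": 1.0}
scaled = {doc: score * 0.5 for doc, score in raw.items()}
print(ranking(raw) == ranking(scaled))

# idf values for text:dubai and profile:dubai, taken from the debug output.
# Only the boost on the text field changes between the two calls.
norm_text_boost_01 = query_norm([(5.520305, 0.1), (7.6305184, 2.0)])
norm_text_boost_10 = query_norm([(5.520305, 1.0), (7.6305184, 2.0)])
print(norm_text_boost_01, norm_text_boost_10)
```

In this simplified model a larger boost on the text field gives a smaller queryNorm, which is at least the same direction as the difference between the two machines in the posted output.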

Re: Same index is ranking differently on 2 machines

Posted by Allistair Crossley <al...@roxxor.co.uk>.
Oh wow, how did I miss that?

My apologies to anyone who read this post. I should have diffed my custom dismax handler. Looks like my SVN merge didn't work properly.

Embarrassing.

Thanks everyone ;)
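
For anyone hitting the same problem, the echoParams=all suggestion quoted below can be turned into a small comparison. This is a sketch under stated assumptions: the base URL and the qf/pf/ps values are hypothetical placeholders (the real handler settings were not posted), and the helper names are illustrative. echoParams=all and wt=json are standard Solr request parameters, and the echoed values appear under responseHeader.params in the response.

```python
from urllib.parse import urlencode

def echo_params_url(base_url, query):
    """Build a select URL that asks Solr to echo every parameter it acts on."""
    params = {"q": query, "echoParams": "all", "wt": "json", "rows": 0}
    return f"{base_url}/select?{urlencode(params)}"

def diff_params(local, production):
    """Return {name: (local_value, production_value)} for parameters that differ."""
    names = sorted(set(local) | set(production))
    return {
        name: (local.get(name), production.get(name))
        for name in names
        if local.get(name) != production.get(name)
    }

# Hypothetical echoed parameters, shaped like responseHeader.params
local_params = {"qf": "text^0.1 profile^2.0", "pf": "profile^0.5", "ps": "10", "tie": "0.01"}
prod_params = {"qf": "text profile^2.0", "pf": "profile^0.5", "ps": "10", "tie": "0.01"}

print(echo_params_url("http://localhost:8983/solr", "dubai product"))
print(diff_params(local_params, prod_params))
```

Against live servers, the two dictionaries would come from parsing each JSON response and reading its responseHeader params entry, for example with json.load on the result of urllib.request.urlopen.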

On Mar 9, 2011, at 4:51 PM, Yonik Seeley wrote:

> On Wed, Mar 9, 2011 at 4:49 PM, Jayendra Patil
> <ja...@gmail.com> wrote:
>> Are you sure you have the same config ...
>> The boost seems different for the field text - text:dubai^0.1 & text:dubai
> 
> Yep...
> Try adding echoParams=all and see all the parameters solr is acting on.
> http://wiki.apache.org/solr/CoreQueryParameters#echoParams
> 
> -Yonik
> http://lucidimagination.com
> 
> 
>> -2.286596 = (MATCH) sum of:
>> -  1.6891675 = (MATCH) sum of:
>> -    1.3198489 = (MATCH) max plus 0.01 times others of:
>> -      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551), product of:
>> -        0.011795795 = queryWeight(text:dubai^0.1), product of:
>> -          0.1 = boost
>> +1.0651637 = (MATCH) sum of:
>> +  0.7871359 = (MATCH) sum of:
>> +    0.6151879 = (MATCH) max plus 0.01 times others of:
>> +      0.10713901 = (MATCH) weight(text:dubai in 1551), product of:
>> +        0.05489459 = queryWeight(text:dubai), product of:
>> 
>> Regards,
>> Jayendra
>> 
>> On Wed, Mar 9, 2011 at 4:38 PM, Allistair Crossley <al...@roxxor.co.uk> wrote:
>>> Thanks. Good to know, but even so my problem remains - the end score should not be different and is causing a dramatically different ranking of a document (3 versus 7 is dramatic for my client). This must be down to the scoring debug differences - it's the only difference I can find :(
>>> 
>>> On Mar 9, 2011, at 4:34 PM, Jayendra Patil wrote:
>>> 
>>>> queryNorm is just a normalizing factor. It has the same value across
>>>> all the results for a query and only serves to make scores comparable.
>>>> So even if it varies between environments, you should not worry about it.
>>>>
>>>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
>>>> -
>>>> Definition - queryNorm(q) is just a normalizing factor used to make
>>>> scores between queries comparable. This factor does not affect
>>>> document ranking (since all ranked documents are multiplied by the
>>>> same factor), but rather just attempts to make scores from different
>>>> queries (or even different indexes) comparable.
>>>> 
>>>> Regards,
>>>> Jayendra


Re: Same index is ranking differently on 2 machines

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Wed, Mar 9, 2011 at 4:49 PM, Jayendra Patil
<ja...@gmail.com> wrote:
> Are you sure you have the same config ...
> The boost seems different for the field text - text:dubai^0.1 & text:dubai

Yep...
Try adding echoParams=all and see all the parameters solr is acting on.
http://wiki.apache.org/solr/CoreQueryParameters#echoParams
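
A minimal sketch of that check in Python, assuming both Solr instances return JSON (wt=json) and expose the effective request parameters under responseHeader.params when echoParams=all is set. The handler path, the function names and the sample qf values are hypothetical and only there for illustration; substitute your own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, query, handler="/select"):
    """Build a query URL that asks Solr to echo every parameter it acts on."""
    params = {"q": query, "echoParams": "all", "debugQuery": "on", "wt": "json"}
    return f"{base_url}{handler}?{urlencode(params)}"


def fetch_params(url):
    """Return the echoed parameters from responseHeader.params (needs a live Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)["responseHeader"]["params"]


def diff_params(local, production):
    """Map each differing parameter name to its (local, production) values."""
    keys = sorted(set(local) | set(production))
    return {
        k: (local.get(k), production.get(k))
        for k in keys
        if local.get(k) != production.get(k)
    }


# Hypothetical echoed parameters, standing in for two live responses.
local_params = {"defType": "dismax", "qf": "text^0.1 profile^2.0"}
production_params = {"defType": "dismax", "qf": "text profile^2.0"}
print(diff_params(local_params, production_params))
```

Against real servers, diff_params(fetch_params(local_url), fetch_params(production_url)) should surface any handler default, such as a field boost, that differs between the two machines.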

-Yonik
http://lucidimagination.com


> -2.286596 = (MATCH) sum of:
> -  1.6891675 = (MATCH) sum of:
> -    1.3198489 = (MATCH) max plus 0.01 times others of:
> -      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551), product of:
> -        0.011795795 = queryWeight(text:dubai^0.1), product of:
> -          0.1 = boost
> +1.0651637 = (MATCH) sum of:
> +  0.7871359 = (MATCH) sum of:
> +    0.6151879 = (MATCH) max plus 0.01 times others of:
> +      0.10713901 = (MATCH) weight(text:dubai in 1551), product of:
> +        0.05489459 = queryWeight(text:dubai), product of:
>
> Regards,
> Jayendra
>
> On Wed, Mar 9, 2011 at 4:38 PM, Allistair Crossley <al...@roxxor.co.uk> wrote:
>> Thanks. Good to know, but even so my problem remains - the end score should not be different and is causing a dramatically different ranking of a document (3 versus 7 is dramatic for my client). This must be down to the scoring debug differences - it's the only difference I can find :(
>>
>> On Mar 9, 2011, at 4:34 PM, Jayendra Patil wrote:
>>
>>> queryNorm is just a normalizing factor. It has the same value across
>>> all the results for a query and only serves to make scores comparable.
>>> So even if it varies between environments, you should not worry about it.
>>>
>>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
>>> -
>>> Definition - queryNorm(q) is just a normalizing factor used to make
>>> scores between queries comparable. This factor does not affect
>>> document ranking (since all ranked documents are multiplied by the
>>> same factor), but rather just attempts to make scores from different
>>> queries (or even different indexes) comparable.
>>>
>>> Regards,
>>> Jayendra

Re: Same index is ranking differently on 2 machines

Posted by Jayendra Patil <ja...@gmail.com>.
Are you sure you have the same config on both machines?
The boost on the text field differs between the two outputs: text:dubai^0.1 (local) versus text:dubai with no boost (production).

-2.286596 = (MATCH) sum of:
-  1.6891675 = (MATCH) sum of:
-    1.3198489 = (MATCH) max plus 0.01 times others of:
-      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551), product of:
-        0.011795795 = queryWeight(text:dubai^0.1), product of:
-          0.1 = boost
+1.0651637 = (MATCH) sum of:
+  0.7871359 = (MATCH) sum of:
+    0.6151879 = (MATCH) max plus 0.01 times others of:
+      0.10713901 = (MATCH) weight(text:dubai in 1551), product of:
+        0.05489459 = queryWeight(text:dubai), product of:
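
The excerpt above can be checked by hand. A small sketch, assuming the relation queryWeight = boost * idf * queryNorm that the explain output itself spells out; the numbers are copied from the diff, and the helper name is made up for illustration.

```python
def query_weight(idf, query_norm, boost=1.0):
    """Recompute queryWeight as the product shown in the explain output."""
    return boost * idf * query_norm


# Values taken from the debugQuery diff for text:dubai in doc 1551.
IDF_TEXT_DUBAI = 5.520305
FIELD_WEIGHT = 1.9517226

# Local applies boost 0.1; production shows no boost line at all.
local_qw = query_weight(IDF_TEXT_DUBAI, 0.021368012, boost=0.1)
production_qw = query_weight(IDF_TEXT_DUBAI, 0.009944122)

# weight = queryWeight * fieldWeight
print("local:     ", local_qw, local_qw * FIELD_WEIGHT)
print("production:", production_qw, production_qw * FIELD_WEIGHT)
```

The recomputed values line up with the explain output only when the 0.1 boost is applied on the local side and omitted on the production side, which points at a configuration difference rather than an index difference.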

Regards,
Jayendra

On Wed, Mar 9, 2011 at 4:38 PM, Allistair Crossley <al...@roxxor.co.uk> wrote:
> Thanks. Good to know, but even so my problem remains - the end score should not be different and is causing a dramatically different ranking of a document (3 versus 7 is dramatic for my client). This must be down to the scoring debug differences - it's the only difference I can find :(
>
> On Mar 9, 2011, at 4:34 PM, Jayendra Patil wrote:
>
>> queryNorm is just a normalizing factor. It has the same value across
>> all the results for a query and only serves to make scores comparable.
>> So even if it varies between environments, you should not worry about it.
>>
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
>> -
>> Definition - queryNorm(q) is just a normalizing factor used to make
>> scores between queries comparable. This factor does not affect
>> document ranking (since all ranked documents are multiplied by the
>> same factor), but rather just attempts to make scores from different
>> queries (or even different indexes) comparable.
>>
>> Regards,
>> Jayendra

Re: Same index is ranking differently on 2 machines

Posted by Allistair Crossley <al...@roxxor.co.uk>.
Thanks. Good to know, but even so my problem remains - the end score should not be different and is causing a dramatically different ranking of a document (3 versus 7 is dramatic for my client). This must be down to the scoring debug differences - it's the only difference I can find :(

On Mar 9, 2011, at 4:34 PM, Jayendra Patil wrote:

> queryNorm is just a normalizing factor. It has the same value across
> all the results for a query and only serves to make scores comparable.
> So even if it varies between environments, you should not worry about it.
>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
> -
> Definition - queryNorm(q) is just a normalizing factor used to make
> scores between queries comparable. This factor does not affect
> document ranking (since all ranked documents are multiplied by the
> same factor), but rather just attempts to make scores from different
> queries (or even different indexes) comparable.
> 
> Regards,
> Jayendra


Re: Same index is ranking differently on 2 machines

Posted by Jayendra Patil <ja...@gmail.com>.
queryNorm is just a normalizing factor. It has the same value across
all the results for a query and only serves to make scores comparable.
So even if it varies between environments, you should not worry about it.

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm
-
Definition - queryNorm(q) is just a normalizing factor used to make
scores between queries comparable. This factor does not affect
document ranking (since all ranked documents are multiplied by the
same factor), but rather just attempts to make scores from different
queries (or even different indexes) comparable.
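
To illustrate why a constant factor cannot reorder results, here is a small sketch. The document names and raw scores are invented; only the two queryNorm values come from the debug output in this thread. As far as I know, Lucene's default similarity computes queryNorm as 1/sqrt(sumOfSquaredWeights), and those weights include the query boosts, so a different queryNorm for the same query on the same data may itself hint that the boosts differ.

```python
def rank(scores):
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Invented un-normalized scores for three documents.
raw_scores = {"doc_a": 4.2, "doc_b": 1.7, "doc_c": 3.1}

# Apply each machine's queryNorm as a uniform multiplier.
local = {doc: s * 0.021368012 for doc, s in raw_scores.items()}
production = {doc: s * 0.009944122 for doc, s in raw_scores.items()}

print(rank(local))
print(rank(production))
```

Both lists come out in the same order, because multiplying every score by the same positive number preserves the ordering. A ranking change therefore has to come from something that affects clauses unevenly, such as a per-field boost.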

Regards,
Jayendra

On Wed, Mar 9, 2011 at 4:22 PM, Allistair Crossley <al...@roxxor.co.uk> wrote:
> Hi,
>
> I am seeing an issue I do not understand and hope that someone can shed some light on this. The issue is that for a particular search we are seeing a particular result rank in position 3 on one machine and position 8 on the production machine. The position 3 is our desired and roughly expected ranking.
>
> I have a local machine with solr and a version deployed on a production server. My local machine's solr and the production version are both checked out from our project's SVN trunk. They are identical files except for the data files (not in SVN) and database connection settings.
>
> The index is populated exclusively via data import handler queries to a database.
>
> I have exported the production database as-is to my local development machine so that my local machine and production have access to the self same data.
>
> I execute a total full-import on both.
>
> Still, I see a different position for this document that should surely rank in the same location, all else being equal.
>
> I ran debugQuery diff to see how the scores were being computed. See appendix at foot of this email.
>
> As far as I can tell every single query normalisation block of the debug is marginally different, e.g.
>
> -        0.021368012 = queryNorm (local)
> +        0.009944122 = queryNorm (production)
>
> Which leads to a final score of -2 versus +1 which is enough to skew the results from correct to incorrect (in terms of what we expect to see).
>
> - -2.286596 (local)
> +1.0651637 = (production)
>
> I cannot explain this difference. The database is the same. The configuration is the same. I have fully imported from scratch on both servers. What am I missing?
>
> Thank you for your time
>
> Allistair
>
> ----- snip
>
> APPENDIX - debugQuery=on DIFF
>
> --- untitled
> +++ (clipboard)
> @@ -1,51 +1,49 @@
> -<str name="L12411p">
> +<str name="L12411">
>
> -2.286596 = (MATCH) sum of:
> -  1.6891675 = (MATCH) sum of:
> -    1.3198489 = (MATCH) max plus 0.01 times others of:
> -      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551), product of:
> -        0.011795795 = queryWeight(text:dubai^0.1), product of:
> -          0.1 = boost
> +1.0651637 = (MATCH) sum of:
> +  0.7871359 = (MATCH) sum of:
> +    0.6151879 = (MATCH) max plus 0.01 times others of:
> +      0.10713901 = (MATCH) weight(text:dubai in 1551), product of:
> +        0.05489459 = queryWeight(text:dubai), product of:
>           5.520305 = idf(docFreq=65, maxDocs=6063)
> -          0.021368012 = queryNorm
> +          0.009944122 = queryNorm
>         1.9517226 = (MATCH) fieldWeight(text:dubai in 1551), product of:
>           1.4142135 = tf(termFreq(text:dubai)=2)
>           5.520305 = idf(docFreq=65, maxDocs=6063)
>           0.25 = fieldNorm(field=text, doc=1551)
> -      1.3196187 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
> -        0.32609802 = queryWeight(profile:dubai^2.0), product of:
> +      0.6141165 = (MATCH) weight(profile:dubai^2.0 in 1551), product of:
> +        0.15175761 = queryWeight(profile:dubai^2.0), product of:
>           2.0 = boost
>           7.6305184 = idf(docFreq=7, maxDocs=6063)
> -          0.021368012 = queryNorm
> +          0.009944122 = queryNorm
>         4.0466933 = (MATCH) fieldWeight(profile:dubai in 1551), product of:
>           1.4142135 = tf(termFreq(profile:dubai)=2)
>           7.6305184 = idf(docFreq=7, maxDocs=6063)
>           0.375 = fieldNorm(field=profile, doc=1551)
> -    0.36931866 = (MATCH) max plus 0.01 times others of:
> -      0.0018293816 = (MATCH) weight(text:product^0.1 in 1551), product of:
> -        0.003954251 = queryWeight(text:product^0.1), product of:
> -          0.1 = boost
> +    0.17194802 = (MATCH) max plus 0.01 times others of:
> +      0.00851347 = (MATCH) weight(text:product in 1551), product of:
> +        0.018402064 = queryWeight(text:product), product of:
>           1.8505468 = idf(docFreq=2589, maxDocs=6063)
> -          0.021368012 = queryNorm
> +          0.009944122 = queryNorm
>         0.4626367 = (MATCH) fieldWeight(text:product in 1551), product of:
>           1.0 = tf(termFreq(text:product)=1)
>           1.8505468 = idf(docFreq=2589, maxDocs=6063)
>           0.25 = fieldNorm(field=text, doc=1551)
> -      0.36930037 = (MATCH) weight(profile:product^2.0 in 1551), product of:
> -        0.1725098 = queryWeight(profile:product^2.0), product of:
> +      0.17186289 = (MATCH) weight(profile:product^2.0 in 1551), product of:
> +        0.08028162 = queryWeight(profile:product^2.0), product of:
>           2.0 = boost
>           4.036637 = idf(docFreq=290, maxDocs=6063)
> -          0.021368012 = queryNorm
> +          0.009944122 = queryNorm
>         2.14075 = (MATCH) fieldWeight(profile:product in 1551), product of:
>           1.4142135 = tf(termFreq(profile:product)=2)
>           4.036637 = idf(docFreq=290, maxDocs=6063)
>           0.375 = fieldNorm(field=profile, doc=1551)
> -  0.59742856 = (MATCH) max plus 0.01 times others of:
> -    0.59742856 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
> -      0.12465195 = queryWeight(profile:"dubai product"~10^0.5), product of:
> +  0.27802786 = (MATCH) max plus 0.01 times others of:
> +    0.27802786 = weight(profile:"dubai product"~10^0.5 in 1551), product of:
> +      0.05800981 = queryWeight(profile:"dubai product"~10^0.5), product of:
>         0.5 = boost
>         11.667155 = idf(profile: dubai=7 product=290)
> -        0.021368012 = queryNorm
> +        0.009944122 = queryNorm
>       4.7927732 = fieldWeight(profile:"dubai product" in 1551), product of:
>         1.0954452 = tf(phraseFreq=1.2)
>         11.667155 = idf(profile: dubai=7 product=290)
>
>
>
>