You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Philippe <ma...@gmail.com> on 2010/07/21 23:25:55 UTC

Different ranking results

Hi,

I just performed two queries which, in my opinion, should lead to the 
same document rankings. However, the document ranking differ between 
these two queries. For better understanding I prepared  minimal examples 
for both queries. In my understanding both queries perform the same 
task. Namely search for "lucene" in two different fields.

Maybe someone can explain me my misunderstanding?


String[] fields = {"TITLE", "BOOK"};
MultiFieldQueryParser parser = new 
MultiFieldQueryParser(Version.LUCENE_29, fields, new 
StandardAnalyzer(Version.LUCENE_29));

1.)
Query q = parser.parse("lucene");

2.)
Query q = parser.parse(TITLE:lucene OR BOOK:lucene);

Regards,
     philippe

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Different ranking results

Posted by Philippe <ma...@gmail.com>.

Well,

that's difficult at the moment as I can also just reproduce this error 
for some few cases. But I will try to generate such an example..

Cheers,
     Philippe

Am 22.07.2010 12:34, schrieb Ian Lea:
> No, I don't have an explanation.  Perhaps a minimal self-contained
> program or test case would help.
>
>
> --
> Ian.
>
>
> On Thu, Jul 22, 2010 at 10:23 AM, Philippe<ma...@gmail.com>  wrote:
>    
>> Hi Ian,
>>
>> I'm using Version 2.93 of lucene.
>>
>> q.getClass() and q.toString() are exactly equal:
>> org.apache.lucene.search.BooleanQuery
>> TITLE:672 BOOK:672
>>
>> However, the results for searcher.explain(q,n) significantly differ. It
>> seems to me that  "Query q = parser.parse("672");" searches only one the
>> Book field, whereas  "Query q = parser.parse("TITLE:672 BOOK:672");"
>> searches on both fields. Do you have a explanation for this behaviour? I
>> only observed this problem for this field...
>>
>> I appended the result of both explain strings below:
>>
>> "Query q = parser.parse("672");"
>> 9.987344E10 = (MATCH) sum of:
>>   9.987344E10 = (MATCH) weight(BOOK:672 in 7078), product of:
>>     0.6583703 = queryWeight(BOOK:672), product of:
>>       6.085349 = idf(docFreq=109, maxDocs=17780)
>>       0.10818941 = queryNorm
>>     1.51697965E11 = (MATCH) fieldWeight(BOOK:672 in 7078), product of:
>>       3.3166249 = tf(termFreq(BOOK:672)=11)
>>       6.085349 = idf(docFreq=109, maxDocs=17780)
>>       7.5161928E9 = fieldNorm(field=BOOK, doc=7078)
>>
>>
>> "Query q = parser.parse("TITLE:672 BOOK:672");"
>> 9.5225594E10 = (MATCH) sum of:
>>   9.5225594E10 = (MATCH) weight(BOOK:672 in 4979), product of:
>>     0.6583703 = queryWeight(BOOK:672), product of:
>>       6.085349 = idf(docFreq=109, maxDocs=17780)
>>       0.10818941 = queryNorm
>>     1.44638345E11 = (MATCH) fieldWeight(BOOK:672 in 4979), product of:
>>       3.1622777 = tf(termFreq(BOOK:672)=10)
>>       6.085349 = idf(docFreq=109, maxDocs=17780)
>>       7.5161928E9 = fieldNorm(field=BOOK, doc=4979)
>>   52.366344 = (MATCH) weight(TITLE:672 in 4979), product of:
>>     0.7526941 = queryWeight(TITLE:672), product of:
>>       6.957188 = idf(docFreq=45, maxDocs=17780)
>>       0.10818941 = queryNorm
>>     69.571884 = (MATCH) fieldWeight(TITLE:672 in 4979), product of:
>>       1.0 = tf(termFreq(TITLE:672)=1)
>>       6.957188 = idf(docFreq=45, maxDocs=17780)
>>       10.0 = fieldNorm(field=TITLE, doc=4979)
>>
>> Cheers,
>>     P.
>> Am 22.07.2010 10:02, schrieb Ian Lea:
>>      
>>> They look the same to me too.
>>>
>>> What does q.getClass().getName() say in each case? q.toString()?
>>> searcher.explain(q, n)?
>>>
>>> What version of lucene?
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>>
>>>
>>> On Wed, Jul 21, 2010 at 10:25 PM, Philippe<ma...@gmail.com>
>>>   wrote:
>>>
>>>        
>>>> Hi,
>>>>
>>>> I just performed two queries which, in my opinion, should lead to the
>>>> same
>>>> document rankings. However, the document ranking differ between these two
>>>> queries. For better understanding I prepared  minimal examples for both
>>>> queries. In my understanding both queries perform the same task. Namely
>>>> search for "lucene" in two different fields.
>>>>
>>>> Maybe someone can explain me my misunderstanding?
>>>>
>>>>
>>>> String[] fields = {"TITLE", "BOOK"};
>>>> MultiFieldQueryParser parser = new
>>>> MultiFieldQueryParser(Version.LUCENE_29,
>>>> fields, new StandardAnalyzer(Version.LUCENE_29));
>>>>
>>>> 1.)
>>>> Query q = parser.parse("lucene");
>>>>
>>>> 2.)
>>>> Query q = parser.parse(TITLE:lucene OR BOOK:lucene);
>>>>
>>>> Regards,
>>>>     philippe
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>          
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>        
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>      
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>    


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Different ranking results

Posted by Ian Lea <ia...@gmail.com>.

No, I don't have an explanation.  Perhaps a minimal self-contained
program or test case would help.


--
Ian.


On Thu, Jul 22, 2010 at 10:23 AM, Philippe <ma...@gmail.com> wrote:
> Hi Ian,
>
> I'm using Version 2.93 of lucene.
>
> q.getClass() and q.toString() are exactly equal:
> org.apache.lucene.search.BooleanQuery
> TITLE:672 BOOK:672
>
> However, the results for searcher.explain(q,n) significantly differ. It
> seems to me that  "Query q = parser.parse("672");" searches only one the
> Book field, whereas  "Query q = parser.parse("TITLE:672 BOOK:672");"
> searches on both fields. Do you have a explanation for this behaviour? I
> only observed this problem for this field...
>
> I appended the result of both explain strings below:
>
> "Query q = parser.parse("672");"
> 9.987344E10 = (MATCH) sum of:
>  9.987344E10 = (MATCH) weight(BOOK:672 in 7078), product of:
>    0.6583703 = queryWeight(BOOK:672), product of:
>      6.085349 = idf(docFreq=109, maxDocs=17780)
>      0.10818941 = queryNorm
>    1.51697965E11 = (MATCH) fieldWeight(BOOK:672 in 7078), product of:
>      3.3166249 = tf(termFreq(BOOK:672)=11)
>      6.085349 = idf(docFreq=109, maxDocs=17780)
>      7.5161928E9 = fieldNorm(field=BOOK, doc=7078)
>
>
> "Query q = parser.parse("TITLE:672 BOOK:672");"
> 9.5225594E10 = (MATCH) sum of:
>  9.5225594E10 = (MATCH) weight(BOOK:672 in 4979), product of:
>    0.6583703 = queryWeight(BOOK:672), product of:
>      6.085349 = idf(docFreq=109, maxDocs=17780)
>      0.10818941 = queryNorm
>    1.44638345E11 = (MATCH) fieldWeight(BOOK:672 in 4979), product of:
>      3.1622777 = tf(termFreq(BOOK:672)=10)
>      6.085349 = idf(docFreq=109, maxDocs=17780)
>      7.5161928E9 = fieldNorm(field=BOOK, doc=4979)
>  52.366344 = (MATCH) weight(TITLE:672 in 4979), product of:
>    0.7526941 = queryWeight(TITLE:672), product of:
>      6.957188 = idf(docFreq=45, maxDocs=17780)
>      0.10818941 = queryNorm
>    69.571884 = (MATCH) fieldWeight(TITLE:672 in 4979), product of:
>      1.0 = tf(termFreq(TITLE:672)=1)
>      6.957188 = idf(docFreq=45, maxDocs=17780)
>      10.0 = fieldNorm(field=TITLE, doc=4979)
>
> Cheers,
>    P.
> Am 22.07.2010 10:02, schrieb Ian Lea:
>>
>> They look the same to me too.
>>
>> What does q.getClass().getName() say in each case? q.toString()?
>> searcher.explain(q, n)?
>>
>> What version of lucene?
>>
>>
>> --
>> Ian.
>>
>>
>>
>>
>> On Wed, Jul 21, 2010 at 10:25 PM, Philippe<ma...@gmail.com>
>>  wrote:
>>
>>>
>>> Hi,
>>>
>>> I just performed two queries which, in my opinion, should lead to the
>>> same
>>> document rankings. However, the document ranking differ between these two
>>> queries. For better understanding I prepared  minimal examples for both
>>> queries. In my understanding both queries perform the same task. Namely
>>> search for "lucene" in two different fields.
>>>
>>> Maybe someone can explain me my misunderstanding?
>>>
>>>
>>> String[] fields = {"TITLE", "BOOK"};
>>> MultiFieldQueryParser parser = new
>>> MultiFieldQueryParser(Version.LUCENE_29,
>>> fields, new StandardAnalyzer(Version.LUCENE_29));
>>>
>>> 1.)
>>> Query q = parser.parse("lucene");
>>>
>>> 2.)
>>> Query q = parser.parse(TITLE:lucene OR BOOK:lucene);
>>>
>>> Regards,
>>>    philippe
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Different ranking results

Posted by Philippe <ma...@gmail.com>.

Hi Ian,

I'm using Version 2.93 of lucene.

q.getClass() and q.toString() are exactly equal:
org.apache.lucene.search.BooleanQuery
TITLE:672 BOOK:672

However, the results for searcher.explain(q,n) significantly differ. It 
seems to me that  "Query q = parser.parse("672");" searches only one the 
Book field, whereas  "Query q = parser.parse("TITLE:672 BOOK:672");" 
searches on both fields. Do you have a explanation for this behaviour? I 
only observed this problem for this field...

I appended the result of both explain strings below:

"Query q = parser.parse("672");"
9.987344E10 = (MATCH) sum of:
   9.987344E10 = (MATCH) weight(BOOK:672 in 7078), product of:
     0.6583703 = queryWeight(BOOK:672), product of:
       6.085349 = idf(docFreq=109, maxDocs=17780)
       0.10818941 = queryNorm
     1.51697965E11 = (MATCH) fieldWeight(BOOK:672 in 7078), product of:
       3.3166249 = tf(termFreq(BOOK:672)=11)
       6.085349 = idf(docFreq=109, maxDocs=17780)
       7.5161928E9 = fieldNorm(field=BOOK, doc=7078)


"Query q = parser.parse("TITLE:672 BOOK:672");"
9.5225594E10 = (MATCH) sum of:
   9.5225594E10 = (MATCH) weight(BOOK:672 in 4979), product of:
     0.6583703 = queryWeight(BOOK:672), product of:
       6.085349 = idf(docFreq=109, maxDocs=17780)
       0.10818941 = queryNorm
     1.44638345E11 = (MATCH) fieldWeight(BOOK:672 in 4979), product of:
       3.1622777 = tf(termFreq(BOOK:672)=10)
       6.085349 = idf(docFreq=109, maxDocs=17780)
       7.5161928E9 = fieldNorm(field=BOOK, doc=4979)
   52.366344 = (MATCH) weight(TITLE:672 in 4979), product of:
     0.7526941 = queryWeight(TITLE:672), product of:
       6.957188 = idf(docFreq=45, maxDocs=17780)
       0.10818941 = queryNorm
     69.571884 = (MATCH) fieldWeight(TITLE:672 in 4979), product of:
       1.0 = tf(termFreq(TITLE:672)=1)
       6.957188 = idf(docFreq=45, maxDocs=17780)
       10.0 = fieldNorm(field=TITLE, doc=4979)

Cheers,
     P.
Am 22.07.2010 10:02, schrieb Ian Lea:
> They look the same to me too.
>
> What does q.getClass().getName() say in each case? q.toString()?
> searcher.explain(q, n)?
>
> What version of lucene?
>
>
> --
> Ian.
>
>
>
>
> On Wed, Jul 21, 2010 at 10:25 PM, Philippe<ma...@gmail.com>  wrote:
>    
>> Hi,
>>
>> I just performed two queries which, in my opinion, should lead to the same
>> document rankings. However, the document ranking differ between these two
>> queries. For better understanding I prepared  minimal examples for both
>> queries. In my understanding both queries perform the same task. Namely
>> search for "lucene" in two different fields.
>>
>> Maybe someone can explain me my misunderstanding?
>>
>>
>> String[] fields = {"TITLE", "BOOK"};
>> MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_29,
>> fields, new StandardAnalyzer(Version.LUCENE_29));
>>
>> 1.)
>> Query q = parser.parse("lucene");
>>
>> 2.)
>> Query q = parser.parse(TITLE:lucene OR BOOK:lucene);
>>
>> Regards,
>>     philippe
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>      
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>    


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Different ranking results

Posted by Ian Lea <ia...@gmail.com>.

They look the same to me too.

What does q.getClass().getName() say in each case? q.toString()?
searcher.explain(q, n)?

What version of lucene?


--
Ian.




On Wed, Jul 21, 2010 at 10:25 PM, Philippe <ma...@gmail.com> wrote:
> Hi,
>
> I just performed two queries which, in my opinion, should lead to the same
> document rankings. However, the document ranking differ between these two
> queries. For better understanding I prepared  minimal examples for both
> queries. In my understanding both queries perform the same task. Namely
> search for "lucene" in two different fields.
>
> Maybe someone can explain me my misunderstanding?
>
>
> String[] fields = {"TITLE", "BOOK"};
> MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_29,
> fields, new StandardAnalyzer(Version.LUCENE_29));
>
> 1.)
> Query q = parser.parse("lucene");
>
> 2.)
> Query q = parser.parse(TITLE:lucene OR BOOK:lucene);
>
> Regards,
>    philippe
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Different ranking results

Posted by Grant Ingersoll <gs...@apache.org>.

Can you post a full example as a Unit test?

On Jul 21, 2010, at 5:25 PM, Philippe wrote:

> Hi,
> 
> I just performed two queries which, in my opinion, should lead to the same document rankings. However, the document ranking differ between these two queries. For better understanding I prepared  minimal examples for both queries. In my understanding both queries perform the same task. Namely search for "lucene" in two different fields.
> 
> Maybe someone can explain me my misunderstanding?
> 
> 
> String[] fields = {"TITLE", "BOOK"};
> MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_29, fields, new StandardAnalyzer(Version.LUCENE_29));
> 
> 1.)
> Query q = parser.parse("lucene");
> 
> 2.)
> Query q = parser.parse(TITLE:lucene OR BOOK:lucene);
> 
> Regards,
>    philippe
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Holding and changing index wide information

Posted by findbestopensource <fi...@gmail.com>.

 Hi Jan,

I think, you require version number for each commit OR updates. Say
you added 10 docs then it is update 1, then modifed or added some more
then it is update 2.. If it is so then my advice would be to have
field named field-type, version-number and version-date-time as part
of the field in the index. You could set this field as like any other
field. Retrieve the record by filtering the value to the field
field-type.

Regards
Aditya
www.findbestopensource.com

On Thu, Jul 22, 2010 at 5:25 PM,  <ja...@nokia.com> wrote:
> Hi,
>
> When using incremental updating via Solr, we want to know, which update is in the current index. Each update has a number.
> How can we store/change/retrieve this number with the index. We want to store it in the index to replicate it to any slaves as well.
>
> So basically can I store/change/retrieve a number index wide in lucen/Solr?
>
> Jan
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Holding and changing index wide information

Posted by Ian Lea <ia...@gmail.com>.

Just add/update a dedicated document in the index.

k=updatenumber
v=whatever.

Retrieve it with a search for k:updatenumber, update with
iw.updateDocument(whatever).


--
Ian.


On Thu, Jul 22, 2010 at 12:55 PM,  <ja...@nokia.com> wrote:
> Hi,
>
> When using incremental updating via Solr, we want to know, which update is in the current index. Each update has a number.
> How can we store/change/retrieve this number with the index. We want to store it in the index to replicate it to any slaves as well.
>
> So basically can I store/change/retrieve a number index wide in lucen/Solr?
>
> Jan
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Holding and changing index wide information

Posted by ja...@nokia.com.

Hi,

When using incremental updating via Solr, we want to know, which update is in the current index. Each update has a number.
How can we store/change/retrieve this number with the index. We want to store it in the index to replicate it to any slaves as well.

So basically can I store/change/retrieve a number index wide in lucen/Solr?

Jan