Posted to java-user@lucene.apache.org by Michael van Rooyen <mi...@loot.co.za> on 2013/08/20 17:36:10 UTC

Document boosting and native ordering of results

Hello.  We've just upgraded to 4.3.1 from 2.9.2 and are having a problem 
with native ordering of search results.

We always want documents returned in order of "rank", which for us is a
float value that we assign to each document at index time. Rank depends
on whether, for example, the item is in stock and how recent it is.  We
also store the rank as a field in the index. We don't use Lucene's
scoring system for ordering results at all.

In 2.9.2, we used to set the boost on the document (we encoded our rank
to ensure a nice distribution over the float range, which is ultimately
encoded as a 1-byte norm), and all results were returned in rank order
without using a sort.
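
To illustrate why the rank had to be encoded with care: a one-byte norm can hold only 256 distinct values, so boosts that are close together collapse into the same byte. The sketch below is a standalone model of a float-to-byte encoding with 3 mantissa bits and 5 exponent bits. That layout is an assumption about how such norms are packed and is not taken from this thread; NormByteSketch, encode and decode are hypothetical names, and this is not the poster's own rank encoding.

```java
// Standalone model of squeezing a float boost into a single byte
// (3 mantissa bits, 5 exponent bits). All names here are hypothetical.
class NormByteSketch {

    // Offset subtracted from the shifted float bits so the result fits in one byte.
    private static final int ZERO_EXP_OFFSET = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3); // keep sign, exponent and top 3 mantissa bits
        if (small <= ZERO_EXP_OFFSET) {
            // Zero or negative input maps to 0; tiny positive input to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_EXP_OFFSET + 0x100) {
            return (byte) -1; // too large: clamp to the largest code (0xFF)
        }
        return (byte) (small - ZERO_EXP_OFFSET);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Boosts that differ by less than one mantissa step decode to the same value.
        float[] boosts = {1.0f, 1.05f, 1.1f, 1.3f, 2.0f};
        for (float boost : boosts) {
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
    }
}
```

With only three mantissa bits, values between 1.0 and 2.0 fall into steps of 0.125, which is why a rank needs to be spread across the exponent range to stay distinguishable after encoding.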

In 4.3.1, the document level boost is gone and only fields can be 
boosted.  Some queries, like a MatchAllDocsQuery, don't seem to take 
field level boosts into account at all when ordering results.

Is there an easy way in Lucene 4 to set the natural order for results in 
the absence of an explicit sort?

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Document boosting and native ordering of results

Posted by Michael Sokolov <ms...@safaribooksonline.com>.

I had been planning something similar to what Michael was used to:
creating a regular numeric field (call it "weight", say) with a rank
value, applying a field boost to that field that is equal to the rank
value, and then querying with weight:[* TO *] as a term, thinking that
would bring the rank into the scoring calculation.  Is that
not going to work?  Is it necessary or better to use DocValues with a
FunctionQuery?

Thanks

Mike

On 8/26/13 1:37 PM, Uwe Schindler wrote:
> Hi,
>
> This is still possible (in reality it was broken in Lucene versions prior to 4.0 if you refer to Document.setBoost(); see changelog/MIGRATE.txt): you have to add an additional DocValues field (a long or double numeric) and use a FunctionQuery / CustomScoreQuery to modify the score based on this value.

Re: Document boosting and native ordering of results

Posted by Michael van Rooyen <mi...@loot.co.za>.

Thanks Uwe!  I hadn't investigated DocValues fields, but they look like 
an exciting addition to Lucene and definitely what we need. The 
FunctionQuery / CustomScoreQuery would be a great solution, but there 
doesn't seem to be a ValueSource dedicated to DocValues fields and all 
the field-based value-sources I could find are based on access via the 
field cache.  One of the purposes of the DocValues fields (in my 
understanding) is to bypass the need for using the field cache.  Am I 
missing something?

On 2013/08/26 07:37 PM, Uwe Schindler wrote:
> Hi,
>
> This is still possible (in reality it was broken in Lucene versions prior to 4.0 if you refer to Document.setBoost(); see changelog/MIGRATE.txt): you have to add an additional DocValues field (a long or double numeric) and use a FunctionQuery / CustomScoreQuery to modify the score based on this value.

RE: Document boosting and native ordering of results

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi,

This is still possible (in reality it was broken in Lucene versions prior to 4.0 if you refer to Document.setBoost(); see changelog/MIGRATE.txt): you have to add an additional DocValues field (a long or double numeric) and use a FunctionQuery / CustomScoreQuery to modify the score based on this value.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
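
The idea in the reply above, modifying each hit's score with a value stored per document, can be modeled without any Lucene classes. The sketch below is only that: a library-free illustration in which RankScoreSketch, Hit and orderByCombinedScore are hypothetical names, and multiplying the query score by the rank is an assumed way of combining the two. It does not show the actual FunctionQuery / CustomScoreQuery API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Library-free model of "modify the score based on a stored per-document value".
// RankScoreSketch, Hit and orderByCombinedScore are hypothetical names, not Lucene API.
class RankScoreSketch {

    static final class Hit {
        final String id;
        final float queryScore; // what the text query alone would score
        final float rank;       // per-document value assigned at index time

        Hit(String id, float queryScore, float rank) {
            this.id = id;
            this.queryScore = queryScore;
            this.rank = rank;
        }

        // Assumed combination: multiply the two. Other functions are possible.
        float combinedScore() {
            return queryScore * rank;
        }
    }

    // Returns document ids ordered by combined score, highest first.
    static List<String> orderByCombinedScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.combinedScore()).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // A match-all style query gives every document the same query score,
        // so the combined score reduces to the rank.
        List<Hit> hits = Arrays.asList(
                new Hit("a", 1.0f, 0.2f),
                new Hit("b", 1.0f, 0.9f),
                new Hit("c", 1.0f, 0.5f));
        System.out.println(orderByCombinedScore(hits));
    }
}
```

When every document gets the same query score, as with a match-all query, the combined score equals the rank, so the hits come back in rank order without an explicit sort.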


> -----Original Message-----
> From: Michael van Rooyen [mailto:michael@loot.co.za]
> Sent: Monday, August 26, 2013 6:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: Document boosting and native ordering of results
> 
> Not sure if there are any thoughts on this.
> 
> It definitely makes sense to assign a rank to each document in the index, so
> that all else being equal, documents are returned in order of rank.  This is
> exactly what the page rank is in Google's index, and Google would be lost
> without it.  This used to be possible in old versions of Lucene, but no longer.
> Should this be posted as a feature request to the developers?
> 
> Thanks,
> Michael.

Re: Document boosting and native ordering of results

Posted by Michael van Rooyen <mi...@loot.co.za>.

Not sure if there are any thoughts on this.

It definitely makes sense to assign a rank to each document in the 
index, so that all else being equal, documents are returned in order of 
rank.  This is exactly what the page rank is in Google's index, and 
Google would be lost without it.  This used to be possible in old 
versions of Lucene, but no longer.  Should this be posted as a feature 
request to the developers?

Thanks,
Michael.
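
The "all else being equal" behaviour described above amounts to using rank as a tie-breaker behind the relevance score. A minimal library-free sketch of that ordering follows; RankTieBreakSketch, Hit and order are hypothetical names, and this models the desired result order rather than any Lucene sorting API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Models "order by score, and by rank when scores are equal".
// All names are hypothetical; no Lucene classes are involved.
class RankTieBreakSketch {

    static final class Hit {
        final String id;
        final float score; // relevance score from the query
        final float rank;  // per-document rank assigned at index time

        Hit(String id, float score, float rank) {
            this.id = id;
            this.score = score;
            this.rank = rank;
        }
    }

    // Highest score first; among equal scores, highest rank first.
    static final Comparator<Hit> SCORE_THEN_RANK =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingDouble((Hit h) -> h.rank).reversed());

    static List<String> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(SCORE_THEN_RANK);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 2.0f, 0.1f),
                new Hit("b", 1.0f, 0.9f),
                new Hit("c", 1.0f, 0.5f),
                new Hit("d", 2.0f, 0.3f));
        System.out.println(order(hits));
    }
}
```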



Altering field info without building index from scratch

Posted by Michael van Rooyen <mi...@loot.co.za>.

Hello.

We got the error:

java.lang.IllegalStateException: field "xxx" was indexed without 
position data; cannot run PhraseQuery

What I suspect is happening is that field xxx was first indexed as a 
StringField (untokenized), and subsequently changed to TextField 
(tokenized and analyzed).  Even though all the docs containing the field 
have been updated in the index, Lucene still sees this as a raw field.

Is there a way to change the metadata associated with a field without
rebuilding the index from scratch?

Thanks,
Michael.
