Posted to java-user@lucene.apache.org by Michael van Rooyen <mi...@loot.co.za> on 2013/08/20 17:36:10 UTC
Document boosting and native ordering of results
Hello. We've just upgraded to 4.3.1 from 2.9.2 and are having a problem
with native ordering of search results.
We always want documents returned in order of "rank", which for us is a
float value that we assign to each document at index time. Rank depends
on whether, for example, the item is in stock and how recent it is. We
also store the rank as a field in the index. We don't use Lucene's
scoring system for ordering results at all.
In 2.9.2, we used to set the boost on the document (we encoded our rank
to get a good spread over the float range, because the boost is
ultimately stored as a 1-byte norm), and all results were returned in
rank order without using a sort.
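For context, a 1-byte norm is very coarse. The plain-Java sketch below re-implements, as an assumption, the 3-bit-mantissa quantization that we understand Lucene's SmallFloat.floatToByte315 to apply to norms; the class name NormQuantization is hypothetical. It illustrates why a rank had to be spread over the float range before being used as a boost: nearby values such as 1.0, 1.1 and 1.2 collapse into the same byte.

```java
// Hypothetical re-implementation (assumption: mirrors Lucene's SmallFloat.floatToByte315,
// the 3-mantissa-bit, zero-exponent-15 encoding used for norms).
final class NormQuantization {
    private NormQuantization() {}

    /** Quantizes a float into one byte, keeping the exponent and the top 3 mantissa bits. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallFloat = bits >> (24 - 3);            // drop all but 3 mantissa bits
        if (smallFloat <= ((63 - 15) << 3)) {         // underflow: too small to represent
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (smallFloat >= ((63 - 15) << 3) + 0x100) { // overflow: clamp to the largest byte
            return -1;
        }
        return (byte) (smallFloat - ((63 - 15) << 3));
    }

    /** Decodes a byte produced by floatToByte315 back into a float. */
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] ranks = {1.0f, 1.1f, 1.2f, 1.25f, 2.0f, 2.4f};
        for (float rank : ranks) {
            byte norm = floatToByte315(rank);
            System.out.printf("rank %.2f -> norm byte %d -> decoded %.4f%n",
                    rank, norm & 0xff, byte315ToFloat(norm));
        }
    }
}
```

Running main lists each rank next to its byte and decoded value; ranks that share a byte can no longer be told apart once stored in the norm.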
In 4.3.1, the document-level boost is gone and only fields can be
boosted. Some queries, like MatchAllDocsQuery, don't seem to take
field-level boosts into account at all when ordering results.
Is there an easy way in Lucene 4 to set the natural order for results in
the absence of an explicit sort?
Thanks!
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
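Without a score-based signal, rank order can still be obtained with an explicit sort on the stored rank. In Lucene 4.x this would be along the lines of new Sort(new SortField("rank", SortField.Type.FLOAT, true)) passed to IndexSearcher.search(query, n, sort), assuming the rank field is indexed in a sortable form. The sketch below is plain Java, not Lucene: it only models the order such a sort produces (highest rank first, ties broken by ascending doc id, which is our assumption about the tie-breaking), using a hypothetical Hit class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RankOrder {
    private RankOrder() {}

    /** Hypothetical search hit: a document id plus the rank stored for it at index time. */
    static final class Hit {
        final int docId;
        final float rank;

        Hit(int docId, float rank) {
            this.docId = docId;
            this.rank = rank;
        }

        @Override
        public String toString() {
            return "doc" + docId + "(rank=" + rank + ")";
        }
    }

    /** Orders hits by rank, highest first; equal ranks fall back to ascending doc id. */
    static List<Hit> sortByRank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.rank).reversed()
                .thenComparingInt(h -> h.docId));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(0, 0.5f), new Hit(1, 2.0f),
                new Hit(2, 0.5f), new Hit(3, 1.5f));
        // prints [doc1(rank=2.0), doc3(rank=1.5), doc0(rank=0.5), doc2(rank=0.5)]
        System.out.println(sortByRank(hits));
    }
}
```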
Re: Document boosting and native ordering of results
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
I had been planning something similar to what Michael was used to:
creating a regular numeric field (call it "weight", say) with a rank
value, applying a field boost to that field that is equal to the rank
value, and then querying with weight:[* TO *] as a term, thinking that
would end up bringing in the rank to the scoring calculation. Is that
not going to work? Is it necessary or better to use DocValues with a
FunctionQuery?
Thanks
Mike
On 8/26/13 1:37 PM, Uwe Schindler wrote:
> Hi,
>
> This is still possible (in reality it was broken in Lucene versions prior to 4.0, if you refer to Document.setBoost(); see changelog/MIGRATE.txt): you have to add an additional DocValues field (a long or double numeric) and use a FunctionQuery / CustomScoreQuery to modify the score based on this value.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
Re: Document boosting and native ordering of results
Posted by Michael van Rooyen <mi...@loot.co.za>.
Thanks Uwe! I hadn't investigated DocValues fields, but they look like
an exciting addition to Lucene and definitely what we need. The
FunctionQuery / CustomScoreQuery would be a great solution, but there
doesn't seem to be a ValueSource dedicated to DocValues fields and all
the field-based value-sources I could find are based on access via the
field cache. One of the purposes of the DocValues fields (in my
understanding) is to bypass the need for using the field cache. Am I
missing something?
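The distinction raised here can be modelled without Lucene. The toy sketch below (all names hypothetical) contrasts uninverting an inverted index, walking term-to-document postings and scattering values into a per-document array, which is roughly the work the field cache does on first use, with a column written per document at index time, which is the idea behind DocValues. It does not show which Lucene ValueSource reads which structure; our understanding is that from 4.2 on the FieldCache returns values from a numeric DocValues field when the field has one, so field-based sources such as FloatFieldSource may already benefit, but this should be checked against the javadocs of the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UninvertVsColumn {
    private UninvertVsColumn() {}

    /** Index time, inverted form: each distinct rank value points at the docs holding it. */
    static Map<Float, List<Integer>> invert(float[] ranksByDoc) {
        Map<Float, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < ranksByDoc.length; doc++) {
            postings.computeIfAbsent(ranksByDoc[doc], k -> new ArrayList<>()).add(doc);
        }
        return postings;
    }

    /**
     * Search time, field-cache style: walk every term's postings and scatter the
     * value back into a per-document array. This work is repeated per reader.
     */
    static float[] uninvert(Map<Float, List<Integer>> postings, int maxDoc) {
        float[] values = new float[maxDoc]; // docs without a value stay at 0.0f
        for (Map.Entry<Float, List<Integer>> term : postings.entrySet()) {
            for (int doc : term.getValue()) {
                values[doc] = term.getKey();
            }
        }
        return values;
    }

    /** Column form (the DocValues idea): written per document at index time, read by doc id. */
    static float columnValue(float[] columnByDoc, int docId) {
        return columnByDoc[docId];
    }

    public static void main(String[] args) {
        float[] ranks = {0.5f, 2.0f, 0.5f, 1.5f};
        float[] fromFieldCache = uninvert(invert(ranks), ranks.length);
        for (int doc = 0; doc < ranks.length; doc++) {
            System.out.println("doc" + doc + ": uninverted=" + fromFieldCache[doc]
                    + " column=" + columnValue(ranks, doc));
        }
    }
}
```

Both paths yield the same values; the difference is that the column needs no rebuilding work at search time.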
On 2013/08/26 07:37 PM, Uwe Schindler wrote:
> Hi,
>
> This is still possible (in reality it was broken in Lucene versions prior to 4.0, if you refer to Document.setBoost(); see changelog/MIGRATE.txt): you have to add an additional DocValues field (a long or double numeric) and use a FunctionQuery / CustomScoreQuery to modify the score based on this value.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
RE: Document boosting and native ordering of results
Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,
This is still possible (in reality it was broken in Lucene versions prior to 4.0, if you refer to Document.setBoost(); see changelog/MIGRATE.txt): you have to add an additional DocValues field (a long or double numeric) and use a FunctionQuery / CustomScoreQuery to modify the score based on this value.
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
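Uwe's suggestion can be modelled in plain Java. The sketch below assumes that CustomScoreQuery's default combination multiplies each matching document's sub-query score by the value the FunctionQuery reads for that document (our reading of CustomScoreProvider.customScore). With a MatchAllDocsQuery every document gets the same sub-query score, so the product orders results purely by rank. The float arrays stand in for real scores and for a numeric DocValues column; the class and method names are hypothetical, and the convention that a negative sub-score means "no match" exists only in this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CustomScoreModel {
    private CustomScoreModel() {}

    /** Assumed default combination: sub-query score times the per-document value. */
    static float customScore(float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    /**
     * Scores every matching doc and returns doc ids best-first.
     * subScores[doc] < 0 marks a non-matching doc (a convention of this sketch only).
     */
    static List<Integer> rankedDocs(float[] subScores, float[] rankColumn) {
        List<Integer> docs = new ArrayList<>();
        float[] finalScores = new float[subScores.length];
        for (int doc = 0; doc < subScores.length; doc++) {
            if (subScores[doc] < 0) continue; // doc did not match the sub-query
            finalScores[doc] = customScore(subScores[doc], rankColumn[doc]);
            docs.add(doc);
        }
        // Highest combined score first; ties by ascending doc id.
        docs.sort(Comparator.comparingDouble((Integer d) -> finalScores[d]).reversed()
                .thenComparingInt(d -> d));
        return docs;
    }

    public static void main(String[] args) {
        float[] rank = {0.5f, 2.0f, 0.5f, 1.5f}; // stand-in for a per-doc DocValues column
        float[] matchAll = {1f, 1f, 1f, 1f};      // MatchAllDocsQuery: same score for every doc
        System.out.println(rankedDocs(matchAll, rank)); // prints [1, 3, 0, 2]
    }
}
```

Because the combination is a product, a document whose rank is 0 ends up with a score of 0 regardless of how well it matches, so a rank encoding that avoids zero may be preferable.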
> -----Original Message-----
> From: Michael van Rooyen [mailto:michael@loot.co.za]
> Sent: Monday, August 26, 2013 6:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: Document boosting and native ordering of results
>
> Not sure if there are any thoughts on this.
>
> It definitely makes sense to assign a rank to each document in the index, so
> that all else being equal, documents are returned in order of rank. This is
> exactly what the page rank is in Google's index, and Google would be lost
> without it. This used to be possible in old versions of Lucene, but no longer.
> Should this be posted as a feature request to the developers?
>
> Thanks,
> Michael.
Re: Document boosting and native ordering of results
Posted by Michael van Rooyen <mi...@loot.co.za>.
Not sure if there are any thoughts on this.
It definitely makes sense to assign a rank to each document in the
index, so that all else being equal, documents are returned in order of
rank. This is exactly what PageRank is in Google's index, and
Google would be lost without it. This used to be possible in old
versions of Lucene, but no longer. Should this be posted as a feature
request to the developers?
Thanks,
Michael.
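One way to read "all else being equal" is as a secondary sort key: order by relevance score, and let rank decide only between documents that score the same. In Lucene 4.x that would be something like new Sort(SortField.FIELD_SCORE, new SortField("rank", SortField.Type.FLOAT, true)). The plain-Java sketch below only models the resulting comparator, with a hypothetical Hit class and remaining ties broken by doc id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreThenRank {
    private ScoreThenRank() {}

    /** Hypothetical hit: relevance score from the query plus the stored rank. */
    static final class Hit {
        final int docId;
        final float score;
        final float rank;

        Hit(int docId, float score, float rank) {
            this.docId = docId;
            this.score = score;
            this.rank = rank;
        }
    }

    /** Relevance first; only when scores tie does the higher rank win; then doc id. */
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingDouble((Hit h) -> h.rank).reversed())
                    .thenComparingInt(h -> h.docId);

    static List<Integer> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted) ids.add(h.docId);
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(0, 1.0f, 0.2f),  // same score as doc 1, lower rank
                new Hit(1, 1.0f, 0.9f),
                new Hit(2, 3.0f, 0.1f)); // best match wins regardless of rank
        System.out.println(order(hits)); // prints [2, 1, 0]
    }
}
```

This is stricter than blending rank into the score: rank never overrides a higher relevance score, it only breaks ties.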
Altering field info without building index from scratch
Posted by Michael van Rooyen <mi...@loot.co.za>.
Hello.
We got the error:
java.lang.IllegalStateException: field "xxx" was indexed without
position data; cannot run PhraseQuery
What I suspect is happening is that field xxx was first indexed as a
StringField (untokenized), and subsequently changed to TextField
(tokenized and analyzed). Even though all the docs containing the field
have been updated in the index, Lucene still sees this as a raw field.
Is there a way to change the metadata associated with a field without
building the index from scratch?
Thanks,
Michael.
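The suspicion above can be illustrated with a toy model. It assumes, based on our understanding of Lucene 4.x field metadata handling rather than anything stated in this thread, that when settings for the same field from different segments are combined, the weaker index options win. The enum names mirror Lucene 4.x FieldInfo.IndexOptions; everything else is hypothetical. Under that assumption a single StringField-era segment (DOCS_ONLY) is enough to make the whole field report no positions, which is the condition PhraseQuery rejects.

```java
import java.util.List;

final class StickyIndexOptions {
    private StickyIndexOptions() {}

    /** Weakest to strongest; names borrowed from Lucene 4.x FieldInfo.IndexOptions. */
    enum IndexOptions {
        DOCS_ONLY,                               // e.g. StringField
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,            // e.g. TextField
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    /** Assumed rule for this model: combining two settings keeps the weaker one. */
    static IndexOptions combine(IndexOptions a, IndexOptions b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Folds the per-segment settings for one field into the setting the index reports. */
    static IndexOptions effective(List<IndexOptions> perSegment) {
        IndexOptions result = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
        for (IndexOptions o : perSegment) {
            result = combine(result, o);
        }
        return result;
    }

    /** PhraseQuery needs positions. */
    static boolean supportsPhraseQuery(IndexOptions o) {
        return o.compareTo(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS) >= 0;
    }

    public static void main(String[] args) {
        // One old StringField-era segment plus two re-indexed TextField segments.
        List<IndexOptions> segments = List.of(
                IndexOptions.DOCS_ONLY,
                IndexOptions.DOCS_AND_FREQS_AND_POSITIONS,
                IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
        IndexOptions field = effective(segments);
        // prints DOCS_ONLY -> phrase queries allowed: false
        System.out.println(field + " -> phrase queries allowed: " + supportsPhraseQuery(field));
    }
}
```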