Posted to java-user@lucene.apache.org by Nick Vincent <ni...@neoworks.com> on 2006/01/24 18:42:06 UTC

Sorting by calculated custom score at search time

Hi,

I am trying to find a way to create scores with a custom formula based
on the initial score from Lucene and field values from each document,
e.g. for each document:

 finalScore = searchScore * (popularity) * (userRating)

The customer requires this functionality as I have to replace an
existing system that works like this.  User rating and popularity are
already available and will be stored in Lucene.  I've looked through LIA
and the approaches there don't seem to fit the requirement:

5.1.6 Sorting by multiple fields: only sorts by one field, then the
next, whereas I need to combine the scores
6.1 Using a custom sort method: does not take into account the
document's original score

From an earlier thread discussing a calculated score based on the hit
score and the age of document I gather that TSS regenerate their indexes
to alter the document boost based on date.  I need to be able to sort by
either relevance or "popularity rated relevance" depending on user
input, so I don't think adding a precalculated document boost at index
time is an option.

In the worst case scenario I'll need to iterate through the hits and
then sort them in memory myself, but I'm looking to be indexing around
500,000 documents, and in this particular application there are a lot of
common keywords, so a large number of hits for a basic query is common.
I'm trying to avoid this as it's an untidy solution which is likely to
be (relatively) slow.
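
The worst-case approach described above could be sketched roughly as
follows. This is only an illustration: ScoredHit and InMemoryRerank are
hypothetical names, not Lucene classes, and the sketch assumes the hit
score, popularity and user rating have already been read out for every
hit.

```java
import java.util.ArrayList;
import java.util.List;

final class InMemoryRerank {
    // Returns a new list ordered by finalScore, highest first.
    // The input list is left unmodified.
    static List<ScoredHit> rerank(List<ScoredHit> hits) {
        List<ScoredHit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.finalScore(), a.finalScore()));
        return sorted;
    }

    public static void main(String[] args) {
        List<ScoredHit> hits = new ArrayList<>();
        hits.add(new ScoredHit("doc-a", 0.9f, 1.0f, 1.0f));
        hits.add(new ScoredHit("doc-b", 0.5f, 2.0f, 3.0f));
        hits.add(new ScoredHit("doc-c", 0.7f, 2.0f, 1.0f));
        for (ScoredHit hit : rerank(hits)) {
            System.out.println(hit.id + " " + hit.finalScore());
        }
    }
}

// Hypothetical holder for one search hit; this is not a Lucene class.
final class ScoredHit {
    final String id;
    final float searchScore; // relevance score reported by the search
    final float popularity;  // assumed to be loaded from a stored field
    final float userRating;  // assumed to be loaded from a stored field

    ScoredHit(String id, float searchScore, float popularity, float userRating) {
        this.id = id;
        this.searchScore = searchScore;
        this.popularity = popularity;
        this.userRating = userRating;
    }

    // finalScore = searchScore * popularity * userRating
    double finalScore() {
        return (double) searchScore * popularity * userRating;
    }
}
```

Every hit has to be loaded and held in memory before the sort can run,
which is exactly the cost that makes this unattractive for queries with
a large number of hits.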

I notice Erik has commented that "I've not come across a really clean
way to do this sort of age-based boosting other than how TSS does it".
I was wondering if anyone has any experience with dirtier approaches
they could share with me?

Any help is really appreciated,

Thanks,

Nick
 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Sorting by calculated custom score at search time

Posted by Chris Hostetter <ho...@fucit.org>.
: It's not in subversion yet though ;-)
:
: You have to look here:
: http://issues.apache.org/jira/browse/LUCENE-446

Whoops ... sorry about that.  I forget how far out on the bleeding edge
the code I'm using is sometimes :)

It definitely works right now, so you should give it a shot -- but you may
not want to invest *too* heavily in integrating it into your code if the
API is still in flux.


-Hoss




Re: Sorting by calculated custom score at search time

Posted by Yonik Seeley <ys...@gmail.com>.
It's not in subversion yet though ;-)

You have to look here:
http://issues.apache.org/jira/browse/LUCENE-446

I haven't committed it, because we may be able to do better (maybe
removing the difference between Query and ValueSource so you could
freely mix the two and not have to wrap ValueSource in a
FunctionQuery).

-Yonik


On 1/24/06, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> Take a look at the org.apache.lucene.search.function package in SVN.  It
> provides an API that allows you to define "function" classes that can
> compute a score for each document using whatever means you want.  The
> overall FunctionQuery can then be wrapped in a BooleanQuery along with
> whatever other search criteria you have.
>
> 5 basic functions have been included that can be composed in all sorts of
> interesting ways to compute scores based on document values for a
> particular field (or the relative ordinal positions in the FieldCache for
> that field)
>
> -Hoss


Re: Sorting by calculated custom score at search time

Posted by Chris Hostetter <ho...@fucit.org>.

Take a look at the org.apache.lucene.search.function package in SVN.  It
provides an API that allows you to define "function" classes that can
compute a score for each document using whatever means you want.  The
overall FunctionQuery can then be wrapped in a BooleanQuery along with
whatever other search criteria you have.

5 basic functions have been included that can be composed in all sorts of
interesting ways to compute scores based on document values for a
particular field (or the relative ordinal positions in the FieldCache for
that field)
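
The idea of small per-document functions that compose into a score can
be sketched as below. This is a hypothetical illustration only:
ScoreFunctions and its methods (field, linear, product) are made-up
names built on java.util.function, not the classes or signatures of the
Lucene function package, and the arrays stand in for per-field values
cached by document number.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch; these are NOT the classes from the Lucene function package.
final class ScoreFunctions {
    // Reads a per-document value from an array indexed by document number
    // (a stand-in for values cached per field).
    static IntToDoubleFunction field(float[] valuesByDoc) {
        return doc -> valuesByDoc[doc];
    }

    // slope * source(doc) + intercept
    static IntToDoubleFunction linear(IntToDoubleFunction source, double slope, double intercept) {
        return doc -> slope * source.applyAsDouble(doc) + intercept;
    }

    // a(doc) * b(doc)
    static IntToDoubleFunction product(IntToDoubleFunction a, IntToDoubleFunction b) {
        return doc -> a.applyAsDouble(doc) * b.applyAsDouble(doc);
    }

    public static void main(String[] args) {
        float[] popularity = {1.0f, 2.0f, 4.0f};
        float[] userRating = {3.0f, 0.5f, 1.0f};
        // Compose two field readers into a single per-document boost.
        IntToDoubleFunction boost = product(field(popularity), field(userRating));
        for (int doc = 0; doc < popularity.length; doc++) {
            System.out.println("doc " + doc + " boost " + boost.applyAsDouble(doc));
        }
    }
}
```

How such a function value is then combined with the relevance score
depends on the query that wraps it, so that step is left out of the
sketch.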



-Hoss




Re: Sorting by calculated custom score at search time

Posted by gekkokid <me...@gekkokid.org.uk>.
How does TSS boost by date? Does it give a small boost increase, like
0.1 or 0.2 x (ArticlePublishDate - IndexCreationDate)?


----- Original Message ----- 
From: "Nick Vincent" <ni...@neoworks.com>
To: <ja...@lucene.apache.org>
Sent: Tuesday, January 24, 2006 5:42 PM
Subject: Sorting by calculated custom score at search time

I notice Erik has commented that "I've not come across a really clean
way to do this sort of age-based
boosting other than how TSS does it".  I was wondering if anyone has any
experience with dirtier approaches they could share with me?

Any help is really appreciated,

Thanks,

Nick

