Posted to user@lucenenet.apache.org by Robert Pohl <ro...@gmail.com> on 2009/07/21 16:50:35 UTC

Boosting dates

Hi,

I have a lot of articles indexed with title, body and date.
How can I boost the dates so that the most recent articles have a higher
score than the older ones?

Thanks,
Rob


RE: Boosting dates

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
The normal way to do this is to use a RangeQuery to identify a range of
dates that are important and apply a boost to that query term. 

In query parser syntax it might look like

+"your query here" date:[20090101 TO 20090721]^3 

You could even apply several such boosts:

+"your query here" date:[20090101 TO 20090721]^5 date:[20080101 TO 20081231]^2

You will have to experiment with the relative boost levels.
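
If you assemble such a query in code, the following is a minimal sketch in Python (standing in for whatever language your application uses) that builds the same query-parser string from a list of date ranges. The helper name boosted_date_query and its parameters are invented for illustration and are not part of Lucene; the yyyyMMdd date format is assumed from the examples above.

```python
from datetime import date


def boosted_date_query(base_query, ranges):
    """Build a query-parser string with one boosted date range per entry.

    ranges is a list of (start, end, boost) tuples. Dates are rendered as
    yyyyMMdd, which is assumed to match how the date field was indexed.
    Note: base_query is inserted as-is; quotes inside it are not escaped.
    """
    clauses = ['+"%s"' % base_query]
    for start, end, boost in ranges:
        clauses.append(
            "date:[%s TO %s]^%g"
            % (start.strftime("%Y%m%d"), end.strftime("%Y%m%d"), boost)
        )
    return " ".join(clauses)


query = boosted_date_query(
    "your query here",
    [
        (date(2009, 1, 1), date(2009, 7, 21), 5),
        (date(2008, 1, 1), date(2008, 12, 31), 2),
    ],
)
print(query)
```

The resulting string would then be handed to the query parser as usual; the boost values themselves still need to be tuned by experiment.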

Hope this is helpful.

Yours,
Moray
--------------------------------------
Moray McConnachie
Director of IT
Oxford Analytica

+44 1865 261 600 http://www.oxan.com  



Re: Boosting dates

Posted by Sean Carpenter <st...@gmail.com>.
Moray,
We already reindex regularly as it simplifies other index management
tasks for us (document update, removal, etc.).  Our documents are
broken down into a number of smaller indexes for use on our websites,
so reindexing is not a huge overhead for us.  I can see how it would
be a problem with much larger indexes though.

Sean

On Tue, Jul 21, 2009 at 11:37 AM, Moray
McConnachie<mm...@oxford-analytica.com> wrote:
>> When adding the document to the index, you can call Document.SetBoost
>> to set the overall document's boost
>
> Sean, that's interesting, handling boost at the indexing end rather than
> the querying end.
>
> Simplifies querying, but how do you handle the aging of documents? Do
> you need to reindex completely in order to reset the document boosts as
> they age?
>
> Yours,
> Moray

RE: Boosting dates

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
> When adding the document to the index, you can call Document.SetBoost
> to set the overall document's boost

Sean, that's interesting, handling boost at the indexing end rather than
the querying end. 

Simplifies querying, but how do you handle the aging of documents? Do
you need to reindex completely in order to reset the document boosts as
they age?

Yours,
Moray

--------------------------------------
Moray McConnachie
Director of IT
Oxford Analytica

+44 1865 261 600 http://www.oxan.com  



Re: Boosting dates

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
Robert Pohl wrote:
> Sean, I was thinking about something like what you describe:
>
> I index the articles in realtime basically, so the dates are pretty 
> much "right now". But can I compare the dates to a start date such as 
> 2000-01-01 and set the boost to the difference between the dates?
> This will make the boost number higher as time goes on, but will that
> be a problem?
A cunning idea. I don't see why it should be a problem, except that if
you are applying other boosts at index time it would change the relative
importance of the date component. I can't figure out whether this would
also apply if you were applying other boosts at query time. I would
certainly set the per-day boost value to a fairly small fraction. You
would need to experiment to see how much difference you need between
older and newer docs.
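
As a rough illustration of the idea, here is a small Python sketch that turns the number of days since the 2000-01-01 start date into a boost. The function name linear_date_boost, the per_day value of 0.001, and the baseline of 1.0 are all assumptions chosen for the example, not values from this thread or from Lucene.

```python
from datetime import date

EPOCH = date(2000, 1, 1)  # start date suggested in Rob's message


def linear_date_boost(published, per_day=0.001):
    """Return a boost that grows by per_day for each day after EPOCH.

    per_day=0.001 and the 1.0 baseline are hypothetical; tune them by
    experiment so the date does not swamp other boosts.
    """
    days = (published - EPOCH).days
    return 1.0 + per_day * days


newer = linear_date_boost(date(2009, 7, 21))
older = linear_date_boost(date(2008, 7, 21))
print(newer, older, newer / older)
```

The value computed at indexing time is what you would pass to Document.SetBoost. Because the boost is fixed relative to the start date, nothing needs to be reindexed as documents age, but the ratio between a new and a year-old document shrinks slowly as the absolute numbers grow.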
> Moray: Will the range queries affect the performance?
>
Short answer is yes, but how much depends on how many documents you are 
indexing and probably more importantly the size of the result sets your 
basic search terms are retrieving. So if a typical search returns only 
1000 documents out of 1 million then the range queries should make 
little or no difference. If a typical search returns 900 000 out of 1 
million then they will make a substantial difference.

In our case the Lucene return time is so relatively small compared to 
overall time to display results that an increase makes negligible 
difference to end users.

> Let's say that I want to boost all articles that are one week old by
> 5 and all that are one month old by 2, and leave all the rest.
So this example would leave you with two range queries over a relatively 
small range - you should experiment, but I don't think it's a big problem.
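
A sketch of how those two rolling windows might be generated at query time, in Python for illustration. The function name recency_clauses is invented, "one month" is taken to mean 30 days, and the two ranges are deliberately kept disjoint so that a week-old article matches only the ^5 clause; all three are assumptions rather than anything stated in the thread.

```python
from datetime import date, timedelta


def recency_clauses(today):
    """Return two boosted date-range clauses relative to today.

    Last 7 days get ^5; the 23 days before that get ^2. The ranges do not
    overlap, so no document matches both clauses.
    """
    fmt = "%Y%m%d"  # assumed to match the indexed date format
    week_start = today - timedelta(days=7)
    month_start = today - timedelta(days=30)  # assumption: one month = 30 days
    month_end = week_start - timedelta(days=1)
    return "date:[%s TO %s]^5 date:[%s TO %s]^2" % (
        week_start.strftime(fmt),
        today.strftime(fmt),
        month_start.strftime(fmt),
        month_end.strftime(fmt),
    )


clauses = recency_clauses(date(2009, 7, 21))
print(clauses)
```

The clauses are appended to the user's own query, for example +"your query here" followed by the string above. Since the windows are recomputed on every search, nothing in the index has to change as articles age.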
> Thanks for your input!
>
You're welcome.
> //Rob



Re: Boosting dates

Posted by Robert Pohl <ro...@gmail.com>.
Sean, I was thinking about something like what you describe:

I index the articles in realtime basically, so the dates are pretty much 
"right now". But can I compare the dates to a start date such as 
2000-01-01 and set the boost to the difference between the dates? This
will make the boost number higher as time goes on, but will that be a
problem?

Another solution would be to re-index all the documents and calculate a
new date boost each time, but I guess that would be a big job.

Moray: Will the range queries affect the performance?

Let's say that I want to boost all articles that are one week old by 5
and all that are one month old by 2, and leave all the rest.

Thanks for your input!

//Rob






Re: Boosting dates

Posted by Sean Carpenter <st...@gmail.com>.
Rob,
We use the opposite approach and use a lower boost value during
indexing for older documents (which makes newer ones score higher).

When adding the document to the index, you can call Document.SetBoost
(http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/document/Document.html#setBoost(float))
to set the overall document's boost factor.  We use a pre-defined
scale based on the age of the document, something like: less than 3
months = boost 1, 3 - 6 months = boost 0.8, 6 - 12 months = boost 0.4,
etc.
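
A scale like this could be sketched as follows, in Python for illustration. The function name age_boost is invented, months are approximated as 90, 180 and 365 days, and the boost for documents older than 12 months is not given above, so it is left as a parameter the caller must supply.

```python
from datetime import date


def age_boost(published, today, older_boost):
    """Map a document's age to an index-time boost using the tiers above.

    Month boundaries are approximated in days (an assumption). older_boost
    covers the "etc." tier, whose actual value is not stated in the thread.
    """
    age_days = (today - published).days
    if age_days < 90:    # less than 3 months
        return 1.0
    if age_days < 180:   # 3 - 6 months
        return 0.8
    if age_days < 365:   # 6 - 12 months
        return 0.4
    return older_boost


today = date(2009, 7, 21)
# 0.2 is a hypothetical value for documents older than a year.
for published in (date(2009, 7, 1), date(2009, 3, 1), date(2008, 10, 1), date(2007, 1, 1)):
    print(published, age_boost(published, today, older_boost=0.2))
```

The returned value is what gets passed to Document.SetBoost when the document is added. Because the boost depends on the age at indexing time, it only stays accurate if documents are reindexed periodically.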

Sean
