Posted to solr-user@lucene.apache.org by Anil Cherian <ch...@gmail.com> on 2009/11/20 17:34:10 UTC

comparing index-time boost and sort in the case of a date field

Hi,

I have a requirement to get results ordered by the latest date in a field
called approval_dt, i.e. results with the most recent approval date should
appear first in the Solr results XML. Sorting "desc" on approval_dt gives me
this.

Can an index-time boost be used here to improve performance? Could you please
help me with an answer?

Thank You.
Anil.

Re: comparing index-time boost and sort in the case of a date field

Posted by Chris Hostetter <ho...@fucit.org>.
: 
: I have a requirement where I need to display records with more recent values
: for approval_dt to come first when a query is made. I thought of approaching
: this in 2 different ways:-

	...

: 2. INDEX-TIME boosting.
: I sorted the query from database itself in asc order of approval_dt while
: creating my input xml and while creating each *<doc>* gave it a boost
: increment by 0.1 starting from 1.01. Those records which don't have a value

index time boosts are folded into the fieldNorm, which is a float stored 
using a compressed single-"byte" encoding, so many initial values all collapse 
down to the same final value -- which means you aren't going to get the 
granularity you want from index time boosts like 1.01 and 1.02 even if 
you do everything else perfectly.
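
To make the collapse concrete, here is a minimal Python sketch modeled on the one-byte float encoding in Lucene's SmallFloat class (floatToByte315 / byte315ToFloat). It is an illustrative re-implementation, not the library code itself; the function names are hypothetical, and it ignores the length normalization that is multiplied into the same norm before it is encoded.

```python
import struct

# Offset between the float exponent bias and the byte format's zero exponent.
_ZERO_EXP_OFFSET = (63 - 15) << 3  # 384


def encode_norm(value):
    """Squeeze a float into one byte (0..255) by truncating its low bits."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21
    if small <= _ZERO_EXP_OFFSET:
        # Zero and negatives map to 0; positive underflow maps to 1.
        return 0 if bits <= 0 else 1
    if small >= _ZERO_EXP_OFFSET + 0x100:
        return 255  # overflow maps to the largest representable value
    return small - _ZERO_EXP_OFFSET


def decode_norm(byte):
    """Expand the one-byte form back into a float."""
    if byte == 0:
        return 0.0
    bits = ((byte & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_boost(boost):
    """Value that survives an encode/decode round trip."""
    return decode_norm(encode_norm(boost))


if __name__ == "__main__":
    for boost in (1.0, 1.01, 1.11, 1.21, 1.31, 1.51):
        print(boost, "->", stored_boost(boost))
```

Under this model, boosts of 1.01, 1.11 and 1.21 all round-trip to 1.0, the same value an unboosted document gets, so they cannot separate documents by date.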

if you have the luxury of sorting your docs before indexing them, then you 
should sort them by approval_dt *descending* and then iterate over them 
and add them to the index.  then you can use the new "_docid_ asc" sort 
syntax added in Solr 1.4
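
A minimal sketch of that pre-sorting step, assuming each record is a dict with hypothetical field names "id" and "approval_dt", and that dates are ISO 8601 strings (so plain string comparison orders them chronologically). Putting records without an approval_dt last is an assumption; the thread does not say where they should go. The helper name build_add_xml is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_xml(records):
    """Return a Solr <add> message with docs ordered newest approval_dt first."""
    dated = [r for r in records if r.get("approval_dt")]
    undated = [r for r in records if not r.get("approval_dt")]
    # Newest first, so the newest document receives the lowest internal docid.
    dated.sort(key=lambda r: r["approval_dt"], reverse=True)

    add = ET.Element("add")
    for record in dated + undated:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            if value is None:
                continue  # skip empty fields rather than indexing "None"
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"id": "a", "approval_dt": "2009-01-05T00:00:00Z"},
        {"id": "b", "approval_dt": None},
        {"id": "c", "approval_dt": "2009-11-20T00:00:00Z"},
    ]
    print(build_add_xml(sample))
```

Note that this ordering only holds for documents added once in this order: a document that is updated later is deleted and re-added, so it ends up at the end of the docid sequence.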

Ascending sort by internal docid (ie: the order that documents are 
indexed) is essentially free in Lucene/Solr -- so you should find that 
much faster than sorting by an explicit field (or even sorting by score)
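
For comparison, the two request styles can be sketched as below. The base URL is a hypothetical local Solr instance and select_url is a made-up helper; the query parameters mirror the ones used elsewhere in this thread.

```python
from urllib.parse import urlencode


def select_url(base_url, query, sort, rows=10):
    """Build a /select URL with an explicit sort clause."""
    params = {
        "q": query,
        "start": 0,
        "rows": rows,
        "fl": "approval_dt,score",
        "sort": sort,
    }
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # hypothetical host and core path
    # Explicit field sort: works regardless of indexing order.
    print(select_url(base, "water", "approval_dt desc"))
    # Index-order sort: only meaningful if docs were added newest first.
    print(select_url(base, "water", "_docid_ asc"))
```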



-Hoss


Re: comparing index-time boost and sort in the case of a date field

Posted by Anil Cherian <ch...@gmail.com>.
Hi Erick, David,

Thank you for the mail.

I have a requirement where I need to display records with more recent values
for approval_dt to come first when a query is made. I thought of approaching
this in 2 different ways:-
1. SORTING
/select/?q=water%0D%0A&version=2.2&start=0&rows=10&indent=on&fl=approval_dt,score&sort=approval_dt%20desc
This worked fine and I got the records with a recent approval_dt to appear
first among the results.

2. INDEX-TIME boosting.
I sorted the records in the database query itself in ascending order of
approval_dt while creating my input XML, and gave each *<doc>* a boost that
increases by 0.1, starting from 1.01. Records which don't have a value
for approval_dt are just assigned the boost value of 1.0. So I believe
documents with recent values of approval_dt should have received a higher
boost. No field is boosted in particular, only the document as a whole.

Could somebody please help me frame a query which would give me the results in
such a way that the docs with a recent approval_dt come first? I don't want to
use SORT in my second approach. How do I exploit the higher boost values I
gave the docs with a recent approval_dt in my query? I tried a query myself,
which is:
/select?qt=dismaxboost&q=water&bq=water&version=2.2&start=0&rows=10&indent=on&fl=approval_dt,score
...but though the number of results for this matches that in 1 (the sort
query), I am not getting the results with a recent approval_dt first. Just a
random distribution of results :(

Could someone please help?

I am just doing a hands-on trial of these approaches to compare their
performance as well.

Thanks and Rgds,
Anil.


Could some one pls help me frame the corresponding "INDEX-TIME" related
query for the following query:-



On Sat, Nov 21, 2009 at 3:16 PM, Erick Erickson <er...@gmail.com> wrote:

> First, could you state the reason you aren't satisfied? You imply that your
> speed isn't what you want, so some details would help.
>
> How big is your index? How many documents? What query is slow? Is your
> first query slow, or all queries where you sort on date? The latter, as
> David says, may be curable by a warmup query or two. How are you storing
> your dates? In particular, what is their resolution?
>
> The more details, the better answer people can give <G>..
>
> Best
> Erick

Re: comparing index-time boost and sort in the case of a date field

Posted by Erick Erickson <er...@gmail.com>.
First, could you state the reason you aren't satisfied? You imply that your
speed isn't what you want, so some details would help.

How big is your index? How many documents? What query is slow? Is your
first query slow, or all queries where you sort on date? The latter, as
David says, may be curable by a warmup query or two. How are you storing
your dates? In particular, what is their resolution?

The more details, the better answer people can give <G>..

Best
Erick



On Fri, Nov 20, 2009 at 5:31 PM, Smiley, David W. <ds...@mitre.org> wrote:

> Using index time boosting isn't really a substitute for sorting.  It will
> be faster (I'm pretty sure) but isn't the same thing.  The index time boost
> is going to influence the score but not totally become the score... which
> means that in all likelihood there will be documents in search results that
> are out of order with respect to the approval_dt.  You might use high boost
> values as a compromise (ex: 100,200,300,...) but that wouldn't feel right to
> me in any case.
>
> If your sorting result performance isn't fast enough then I'd discuss it
> here with everyone.  You'll want to put fields you sort on (like
> approval_dt) in a warming query so that when the search needs to sort on
> this field, the sort information is already cached.  This cache is
> invalidated when you modify the index, by the way.
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

Re: comparing index-time boost and sort in the case of a date field

Posted by "Smiley, David W." <ds...@mitre.org>.
Using index time boosting isn't really a substitute for sorting.  It will be faster (I'm pretty sure) but isn't the same thing.  The index time boost is going to influence the score but not totally become the score... which means that in all likelihood there will be documents in search results that are out of order with respect to the approval_dt.  You might use high boost values as a compromise (ex: 100,200,300,...) but that wouldn't feel right to me in any case.
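
A toy illustration of why a boost does not amount to a sort. The model below (final score = query relevance x document boost) is a deliberate simplification and not Lucene's actual scoring formula; the documents, relevance values and boosts are invented for the example.

```python
def rank_by_score(docs):
    """Order docs by a toy score: query relevance times document boost."""
    return sorted(docs, key=lambda d: d["relevance"] * d["boost"], reverse=True)


def rank_by_date(docs):
    """Order docs strictly by approval_dt, newest first."""
    return sorted(docs, key=lambda d: d["approval_dt"], reverse=True)


# Invented sample data: the older doc matches the query much better.
DOCS = [
    {"id": "old", "approval_dt": "2009-01-05", "relevance": 2.0, "boost": 1.0},
    {"id": "new", "approval_dt": "2009-11-20", "relevance": 0.5, "boost": 1.5},
]

if __name__ == "__main__":
    print([d["id"] for d in rank_by_score(DOCS)])
    print([d["id"] for d in rank_by_date(DOCS)])
```

In this toy case the older document still wins on score (2.0 against 0.75), so score order and date order disagree even though the newer document carries the higher boost.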

If your sorting result performance isn't fast enough then I'd discuss it here with everyone.  You'll want to put fields you sort on (like approval_dt) in a warming query so that when the search needs to sort on this field, the sort information is already cached.  This cache is invalidated when you modify the index, by the way.
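
A minimal sketch of such a warming request, issued from outside Solr after the index changes. The base URL and helper names are hypothetical. Solr can also fire warming queries itself through listeners configured in solrconfig.xml for the newSearcher and firstSearcher events, which is usually the more convenient place for them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def warmup_url(base_url, sort_field="approval_dt"):
    """Build a cheap match-all query that forces a sort on sort_field."""
    params = {"q": "*:*", "rows": 1, "sort": sort_field + " desc"}
    return base_url + "/select?" + urlencode(params)


def warm_sort_cache(base_url, sort_field="approval_dt"):
    """Send the warming query and return the HTTP status code."""
    with urlopen(warmup_url(base_url, sort_field), timeout=30) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical local instance; run this after each commit.
    print(warmup_url("http://localhost:8983/solr"))
```

Because the cache is invalidated whenever the index is modified, a request like this has to be repeated after every commit for the first real sorted query to stay fast.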

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Nov 20, 2009, at 11:34 AM, Anil Cherian wrote:

> Hi,
> 
> I have a requirement to get results in the order of latest date of a field
> called approval_dt. ie results having the latest approval date should appear
> first in the SOLR results xml. A sorting "desc" on approval_dt gave me this.
> 
> Can index-time boost be of use here to improve performance. Could you please
> help me with an answer.
> 
> Thank You.
> Anil.