Posted to solr-user@lucene.apache.org by Tom <so...@zvents.com> on 2006/12/27 23:35:18 UTC

boosts?

Hi -

I'm having a problem getting boosts to work the way I think they are 
supposed to.

What I want is for documents to be returned in doc boost order, when 
all the queries are constant scoring range queries. (e.g. date:[2006 TO 2007])

I believe (but am not certain) that this is supposed to be what 
happens. If that's not the case, you can probably skip the rest :-)

As an example, I grabbed solr-1.1, and ran it (java -jar start.jar).

Then I modified the hd.xml example doc to add a boost on the first 
document (SP2514N):

<doc boost="100.0">

Then I loaded monitor.xml and hd.xml:

./post.sh monitor.xml
./post.sh hd.xml

I then went to the solr admin interface and queried on

id:[* TO *]

I believe this gets mapped to a ConstantScoreRangeQuery.

So, given

http://fred:8983/solr/select/?q=id%3A%5B*+TO+*%5D&version=2.2&start=0&rows=10&indent=on&debugQuery=1

I get the result below. Note that every result lists "1.0 = boost".

I would expect to see a boost of 100 on SP2514N in the 
explanation. Should I get that? I would also expect it to be at the 
head of the list, but I think I'm seeing the docs in insertion order. 
(If I insert hd.xml before monitor.xml, I get them in insertion order 
in that case as well.)
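The ordering Tom observes can be modeled in a few lines. This is a toy Python sketch, not Solr or Lucene code. It assumes, as Lucene's hit ordering does, that documents with equal scores come back in ascending internal doc id order, which for a freshly built index is insertion order. The internal_docid values come from the debug output in this message. The "boost" key is there only to show that a constant-score query never reads it.

```python
# Toy model (not Solr/Lucene code) of a constant-scoring query such as id:[* TO *].
# Input order is shuffled on purpose: the result order comes from the doc ids.
docs = [
    {"internal_docid": 9, "id": "SP2514N", "boost": 100.0},
    {"internal_docid": 10, "id": "6H500F0", "boost": 1.0},
    {"internal_docid": 8, "id": "3007WFP", "boost": 1.0},
]

def constant_score(doc):
    """Every match gets the same score (query boost 1.0); the doc's boost is ignored."""
    return 1.0

def ranked(docs):
    # Descending score first; equal scores fall back to ascending internal doc id.
    return sorted(docs, key=lambda d: (-constant_score(d), d["internal_docid"]))

print([d["id"] for d in ranked(docs)])  # ['3007WFP', 'SP2514N', '6H500F0']
```

Because every score ties, the boosted SP2514N keeps its doc-id position in the list.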

Please let me know if my assumptions or my methods aren't correct.

Thanks,

Tom


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
   <str name="rows">10</str>
   <str name="start">0</str>
   <str name="indent">on</str>
   <str name="q">id:[* TO *]</str>
   <str name="debugQuery">1</str>
   <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
  <doc>
   <arr name="cat"><str>electronics</str><str>monitor</str></arr>
   <arr name="features"><str>30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</str></arr>
   <str name="id">3007WFP</str>
   <bool name="inStock">true</bool>
   <str name="includes">USB cable</str>
   <str name="manu">Dell, Inc.</str>
   <str name="name">Dell Widescreen UltraSharp 3007WFP</str>
   <int name="popularity">6</int>
   <float name="price">2199.0</float>
   <str name="sku">3007WFP</str>
   <float name="weight">401.6</float>
  </doc>
  <doc>
   <arr name="cat"><str>electronics</str><str>hard drive</str></arr>
   <arr name="features"><str>7200RPM, 8MB cache, IDE Ultra ATA-133</str><str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str></arr>
   <str name="id">SP2514N</str>
   <bool name="inStock">true</bool>
   <str name="manu">Samsung Electronics Co. Ltd.</str>
   <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
   <int name="popularity">6</int>
   <float name="price">92.0</float>
   <str name="sku">SP2514N</str>
  </doc>
  <doc>
   <arr name="cat"><str>electronics</str><str>hard drive</str></arr>
   <arr name="features"><str>SATA 3.0Gb/s, NCQ</str><str>8.5ms seek</str><str>16MB cache</str></arr>
   <str name="id">6H500F0</str>
   <bool name="inStock">true</bool>
   <str name="manu">Maxtor Corp.</str>
   <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
   <int name="popularity">6</int>
   <float name="price">350.0</float>
   <str name="sku">6H500F0</str>
  </doc>
</result>
<lst name="debug">
  <str name="rawquerystring">id:[* TO *]</str>
  <str name="querystring">id:[* TO *]</str>
  <str name="parsedquery">id:[* TO *]</str>
  <str name="parsedquery_toString">id:[* TO *]</str>
  <lst name="explain">
   <str name="id=3007WFP,internal_docid=8">
1.0 = (MATCH) ConstantScoreQuery(id:[-}), product of:
   1.0 = boost
   1.0 = queryNorm
</str>
   <str name="id=SP2514N,internal_docid=9">
1.0 = (MATCH) ConstantScoreQuery(id:[-}), product of:
   1.0 = boost
   1.0 = queryNorm
</str>
   <str name="id=6H500F0,internal_docid=10">
1.0 = (MATCH) ConstantScoreQuery(id:[-}), product of:
   1.0 = boost
   1.0 = queryNorm
</str>
  </lst>
</lst>
</response>




Re: boosts?

Posted by Tom <so...@zvents.com>.
At 06:03 PM 12/28/2006, you wrote:
>maybe i'm missing something, but it sounds like what you want is a simple
>sort on a numeric field -- whatever value you are trying to use as the
>index time boost, you can just set as a field value instead and then sort
>on it right?

Yes.

I had just been thinking about it in terms of how to use the 
info I already had in the index. But making another field works too, 
and is probably simpler.


>: I was looking at how I would write a modified version of
>: MatchAllDocsQuery that would simply return the documents boost as the
>: score. But I haven't really figured out Lucene scoring.
>
>document boosts aren't maintained in the index ... they are multiplied by
>the various field boosts and lengthNorms and stored on a per field basis.

Thanks! I had seen comments that the doc boost wasn't stored, but 
didn't know how it worked.

Tom


Re: boosts?

Posted by Chris Hostetter <ho...@fucit.org>.
: >If not, you can add a field that is present in all documents, and add
: >this as part of the query.  Then you can fiddle with the index-time
: >field boost to alter the results (without skewing queries that have a
: >meaningful relevancy score as using document boosts would do).
:
: That seems to work. Thanks!

maybe i'm missing something, but it sounds like what you want is a simple
sort on a numeric field -- whatever value you are trying to use as the
index time boost, you can just set as a field value instead and then sort
on it right?
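Here is a toy Python sketch of that suggestion, not Solr code. The constant-score range query only decides which documents match. The result order then comes from an ordinary stored numeric field. The field names "date" and "rank" are hypothetical: "rank" stands in for whatever value was going to be used as the index-time boost.

```python
# Hypothetical records; "rank" replaces the index-time boost as a plain field value.
docs = [
    {"id": "a", "date": 2006, "rank": 5},
    {"id": "b", "date": 2006, "rank": 100},
    {"id": "c", "date": 2005, "rank": 50},
    {"id": "d", "date": 2007, "rank": 20},
]

def in_range(value, lo, hi):
    # Inclusive bounds, like date:[2006 TO 2007]
    return lo <= value <= hi

def default_order(docs, lo, hi):
    # The range filter is a yes/no decision and contributes nothing to the order...
    matches = [d for d in docs if in_range(d["date"], lo, hi)]
    # ...so the ordering comes entirely from sorting on the numeric field.
    return sorted(matches, key=lambda d: d["rank"], reverse=True)

print([d["id"] for d in default_order(docs, 2006, 2007)])  # ['b', 'd', 'a']
```

Unlike a boost, the sort value is stored as is, so it also stays visible in the returned documents.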

: I was looking at how I would write a modified version of
: MatchAllDocsQuery that would simply return the documents boost as the
: score. But I haven't really figured out Lucene scoring.

document boosts aren't maintained in the index ... they are multiplied by
the various field boosts and lengthNorms and stored on a per field basis.
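A small Python sketch can show why the document boost cannot be read back out of the index. It assumes DefaultSimilarity-style length normalization (1/sqrt(number of terms in the field)). It also assumes the index keeps only the per-field product of doc boost, field boost, and length norm. Real Lucene additionally compresses that product into a single byte, which makes it lossy; the sketch leaves that step out.

```python
import math

def length_norm(num_terms):
    """DefaultSimilarity-style length normalization: 1/sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)

def field_norm(doc_boost, field_boost, num_terms):
    """The single per-field value kept at index time: the product of all three factors."""
    return doc_boost * field_boost * length_norm(num_terms)

# Two hypothetical documents with different doc boosts and different field lengths.
a = field_norm(doc_boost=2.0, field_boost=1.0, num_terms=4)  # 2 * 1 * 0.5
b = field_norm(doc_boost=1.0, field_boost=1.0, num_terms=1)  # 1 * 1 * 1.0
print(a, b)  # 1.0 1.0 -> identical stored norms, so the doc boost is not recoverable
```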




-Hoss


Re: boosts?

Posted by Tom <so...@zvents.com>.
At 12:03 PM 12/28/2006, you wrote:
>On 12/28/06, Tom <so...@zvents.com> wrote:
>Could you index your documents in the desired order?  This is the
>default sort order.

I don't think I can control document order, as documents may get 
edited after creation.

>If not, you can add a field that is present in all documents, and add
>this as part of the query.  Then you can fiddle with the index-time
>field boost to alter the results (without skewing queries that have a
>meaningful relevancy score as using document boosts would do).

That seems to work. Thanks!

I'll probably do it that way, but... :-)

I was looking at how I would write a modified version of 
MatchAllDocsQuery that would simply return the documents boost as the 
score. But I haven't really figured out Lucene scoring.

Could someone explain how one would do something like this? I'm just 
trying to understand how one might do custom scoring in Lucene, so 
I'm more looking for concepts than code.
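Conceptually, a Lucene Query produces a Weight, and the Weight produces a Scorer. A Scorer is an iterator over matching doc ids: next() advances, doc() returns the current id, and score() returns its score. The toy Python class below is loosely modeled on that iteration pattern and is not the Lucene API. It behaves like a "match all docs" scorer whose score() looks up a hypothetical per-document value. The hard part in practice is where that value comes from, since, as noted elsewhere in this thread, the doc boost itself is not stored.

```python
class ValueScorer:
    """Toy match-all scorer: every doc id matches, and its score is values[doc_id]."""

    def __init__(self, values):
        self.values = values  # hypothetical per-document values, e.g. read from a field
        self.current = -1

    def next(self):
        # Advance to the next document; False when the documents are exhausted.
        self.current += 1
        return self.current < len(self.values)

    def doc(self):
        return self.current

    def score(self):
        return self.values[self.current]

def collect(scorer):
    hits = []
    while scorer.next():
        hits.append((scorer.doc(), scorer.score()))
    # Highest score first; ties broken by lower doc id.
    return sorted(hits, key=lambda h: (-h[1], h[0]))

print(collect(ValueScorer([1.0, 100.0, 1.0])))  # [(1, 100.0), (0, 1.0), (2, 1.0)]
```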

Thanks!

Tom


Re: boosts?

Posted by Mike Klaas <mi...@gmail.com>.
On 12/28/06, Tom <so...@zvents.com> wrote:

> >I'd recommend only using index-time boosting when you can't get the
> >relevance you want with query boosting and scoring.
>
> I'm not sure how I'd do it that way.
>
> What I want (what I _think_ I want :-) is a way to specify a default
> order for results, for the cases where the user has only provided
> exclusion information. In this case, I'm doing a match all docs, with
> filter queries.

Could you index your documents in the desired order?  This is the
default sort order.

If not, you can add a field that is present in all documents, and add
this as part of the query.  Then you can fiddle with the index-time
field boost to alter the results (without skewing queries that have a
meaningful relevancy score as using document boosts would do).
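The idea behind this suggestion can be sketched as a Python toy model, not Lucene's full scoring formula. Suppose every document holds the same single term in a hypothetical field (called "all" here). Then term frequency and idf are identical for every document, and the constant-score filters add the same amount to every match. The only factor that still varies per document is that field's norm, which carries the index-time field boost. The boosts reuse the example documents, with 100.0 on SP2514N.

```python
import math

def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)

def all_field_score(field_boost, num_terms=1, query_weight=1.0):
    tf = 1.0  # the shared term occurs once in every document
    norm = field_boost * length_norm(num_terms)
    # idf is folded into query_weight because it is the same for every document.
    return query_weight * tf * norm

boosts = {"3007WFP": 1.0, "SP2514N": 100.0, "6H500F0": 1.0}
order = sorted(boosts, key=lambda doc_id: all_field_score(boosts[doc_id]), reverse=True)
print(order)  # ['SP2514N', '3007WFP', '6H500F0']
```

Because the boost lives on this one extra field, queries on other fields keep their normal relevancy scores.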

-Mike

Re: boosts?

Posted by Tom <so...@zvents.com>.
Hi Yonik,

Thanks for the quick response.

At 07:45 AM 12/28/2006, you wrote:
>On 12/27/06, Tom <so...@zvents.com> wrote:
>>I'm having a problem getting boosts to work the way I think they are
>>supposed to.
>
>Do you have a specific relevance problem you are trying to solve, or
>just testing things out?

Specific problem.

Frequently our users will start by specifying a facet, such as a date 
range, geo location, etc. At this point I don't have any positive 
query terms, just constant score range queries that are used to 
eliminate things the user is not interested in.  So at this point, 
there's nothing to be relevant to, so I need to pick some ordering. 
Since I have information about which results tend to be more 
interesting in the general case, I've set boosts on the documents. 
I'd like to order by that, until the user gives me more information.

For example, think of Amazon ordering by "best selling" when the 
user asks for books published since Dec. 1st. You don't yet know what 
is relevant to this user's query, since all you have is "since Dec 
1st", but you want to give an order more reasonable than "doc 
number", or "date published".

>>What I want is for documents to be returned in doc boost order, when
>>all the queries are constant scoring range queries. (e.g. 
>>date:[2006 TO 2007])
>
>They are *constant scoring* range queries :-)  Index-time boosts
>currently don't factor in.

Gotcha. I think I misinterpreted an earlier post (which did say 
"query boost"). I was thinking it would include index time boost, too.


>I'd recommend only using index-time boosting when you can't get the
>relevance you want with query boosting and scoring.

I'm not sure how I'd do it that way.

What I want (what I _think_ I want :-) is a way to specify a default 
order for results, for the cases where the user has only provided 
exclusion information. In this case, I'm doing a match all docs, with 
filter queries.

Tom


Re: boosts?

Posted by Yonik Seeley <yo...@apache.org>.
On 12/27/06, Tom <so...@zvents.com> wrote:
> I'm having a problem getting boosts to work the way I think they are
> supposed to.

Do you have a specific relevance problem you are trying to solve, or
just testing things out?

> What I want is for documents to be returned in doc boost order, when
> all the queries are constant scoring range queries. (e.g. date:[2006 TO 2007])

They are *constant scoring* range queries :-)  Index-time boosts
currently don't factor in.

[...]
> I would expect to see a boost of 100 on the SP2514N, in the
> explanation. Should I get that?

Even if index-time boosts were factored in, you wouldn't see an
explicit 'boost'.
The index-time boost is multiplied by the length normalization factor
for the field and the product is the "norm".

> I would also expect it to be at the
> head of the list, but I think I'm seeing the docs in insertion order.
> (if I insert xd.xml before monitor.xml, I get them in insertion order
> in that case as well.)

I'd recommend only using index-time boosting when you can't get the
relevance you want with query boosting and scoring.

-Yonik