Posted to solr-user@lucene.apache.org by Head <tr...@mac.com> on 2008/02/27 01:23:15 UTC

boost ignored with wildcard queries

Using the StandardRequestHandler, it appears that the index boost values are
ignored when the query has a wildcard in it.   For example, if I have 2
<doc>'s and one has a boost of 1.0 and another has a boost of 10.0, then I
do a search for "bob*", both records will be returned with the same score of
1.0.   If I just do a normal search then the <doc> that has the higher boost
has the higher score as expected.

Is this a bug?

~Tom

p.s. Here's what my debug looks like:

<str name="id=24279,internal_docid=2180">
1.0 = (MATCH)
ConstantScoreQuery(org.apache.solr.search.PrefixFilter@5c7da7), product of:
  1.0 = boost
  1.0 = queryNorm
</str>
<str name="id=24317,internal_docid=2217">
1.0 = (MATCH)
ConstantScoreQuery(org.apache.solr.search.PrefixFilter@3d68df), product of:
  1.0 = boost
  1.0 = queryNorm
</str>
-- 
View this message in context: http://www.nabble.com/boost-ignored-with-wildcard-queries-tp15703334p15703334.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: boost ignored with wildcard queries

Posted by Yonik Seeley <yo...@apache.org>.
On Fri, Jun 6, 2008 at 5:16 PM, David Smiley @MITRE.org
<DS...@mitre.org> wrote:
> Curious... Why is ConstantScoreQuery only applied to prefix queries?  Your
> rationale suggests that it is also applicable to wildcard queries and fuzzy
> queries (basically any place an analyzer isn't used).

I think fuzzy queries may have been fixed in Lucene to not exceed the
boolean query clause limit.
Wildcard queries: no good reason... didn't really need it, so I never
got around to it :-)

-Yonik

> ~ David Smiley
>
>
> Yonik Seeley wrote:
>>
>> On Tue, Feb 26, 2008 at 7:23 PM, Head <tr...@mac.com> wrote:
>>>
>>>  Using the StandardRequestHandler, it appears that the index boost values are
>>>  ignored when the query has a wildcard in it.   For example, if I have 2
>>>  <doc>'s and one has a boost of 1.0 and another has a boost of 10.0, then I
>>>  do a search for "bob*", both records will be returned with the same score of
>>>  1.0.   If I just do a normal search then the <doc> that has the higher boost
>>>  has the higher score as expected.
>>>
>>>  Is this a bug?
>>
>> A feature :-)
>> Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to
>> avoid getting exceptions from too many terms.
>>
>> -Yonik
>>
>>
>>>  ~Tom
>>>
>>>  p.s. Here's what my debug looks like:
>>>
>>>  <str name="id=24279,internal_docid=2180">
>>>  1.0 = (MATCH)
>>>  ConstantScoreQuery(org.apache.solr.search.PrefixFilter@5c7da7), product of:
>>>   1.0 = boost
>>>   1.0 = queryNorm
>>>  </str>
>>>  <str name="id=24317,internal_docid=2217">
>>>  1.0 = (MATCH)
>>>  ConstantScoreQuery(org.apache.solr.search.PrefixFilter@3d68df), product of:
>>>   1.0 = boost
>>>   1.0 = queryNorm
>>>  </str>

Re: boost ignored with wildcard queries

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.
Curious... Why is ConstantScoreQuery only applied to prefix queries?  Your
rationale suggests that it is also applicable to wildcard queries and fuzzy
queries (basically any place an analyzer isn't used).

~ David Smiley


Yonik Seeley wrote:
> 
> On Tue, Feb 26, 2008 at 7:23 PM, Head <tr...@mac.com> wrote:
>>
>>  Using the StandardRequestHandler, it appears that the index boost values are
>>  ignored when the query has a wildcard in it.   For example, if I have 2
>>  <doc>'s and one has a boost of 1.0 and another has a boost of 10.0, then I
>>  do a search for "bob*", both records will be returned with the same score of
>>  1.0.   If I just do a normal search then the <doc> that has the higher boost
>>  has the higher score as expected.
>>
>>  Is this a bug?
> 
> A feature :-)
> Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to
> avoid getting exceptions from too many terms.
> 
> -Yonik
> 
> 
>>  ~Tom
>>
>>  p.s. Here's what my debug looks like:
>>
>>  <str name="id=24279,internal_docid=2180">
>>  1.0 = (MATCH)
>>  ConstantScoreQuery(org.apache.solr.search.PrefixFilter@5c7da7), product of:
>>   1.0 = boost
>>   1.0 = queryNorm
>>  </str>
>>  <str name="id=24317,internal_docid=2217">
>>  1.0 = (MATCH)
>>  ConstantScoreQuery(org.apache.solr.search.PrefixFilter@3d68df), product of:
>>   1.0 = boost
>>   1.0 = queryNorm
>>  </str>

-- 
View this message in context: http://www.nabble.com/boost-ignored-with-wildcard-queries-tp15703334p17701306.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: boost ignored with wildcard queries

Posted by Yonik Seeley <yo...@apache.org>.
On Wed, Mar 5, 2008 at 4:27 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
>  : Doug & I talked about this a while ago. At a minimum, it would require
>  : byte[maxDoc()] to store scores in a compressed 8 bit format.  It would
>  : certainly impact performance too.
>
>  Why would you have to store the scores?  why not just add an optional
>  byte[]norms param to ConstantScoreQuery

Ah, I see what you mean.  Good idea.  It doesn't handle score
accumulation when multiple terms hit the same doc (prefix query), and
doesn't balance lengthNorm with tf,  but it's a lot better than
nothing and still serves to pop index-boosted docs to the top.

-Yonik

>, if it's null, things work as they
>  currently do, if it's non null ConstantWeight.scorer returns a new
>  subclass of ConstantScorer where score is implemented as...
>
>     public float score() throws IOException {
>       return theScore * normDecoder[norms[doc] & 0xFF];
>     }
>
>  (where normDecoder is just like in TermScorer)

Re: boost ignored with wildcard queries

Posted by Chris Hostetter <ho...@fucit.org>.
: Doug & I talked about this a while ago. At a minimum, it would require
: byte[maxDoc()] to store scores in a compressed 8 bit format.  It would
: certainly impact performance too.

Why would you have to store the scores?  Why not just add an optional
byte[] norms param to ConstantScoreQuery?  If it's null, things work as they
currently do; if it's non-null, ConstantWeight.scorer returns a new
subclass of ConstantScorer where score is implemented as...

    public float score() throws IOException {
      return theScore * normDecoder[norms[doc] & 0xFF];
    }

(where normDecoder is just like in TermScorer)






-Hoss
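
A minimal, self-contained sketch of the proposal above, not the actual Lucene
code: the class name, the encodeNorm helper, and the linear decode table are
all hypothetical stand-ins (Lucene's real norm table uses a compact float
encoding). It only illustrates how a null norms array keeps the constant
score while a non-null one scales it per document.

```java
/** Toy model: a constant score optionally scaled by a per-document norm byte. */
class NormAwareConstantScore {
    // Hypothetical decode table mapping each of the 256 byte values to a float.
    // This linear table is an assumption for illustration, not Lucene's encoding.
    static final float[] NORM_DECODER = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            NORM_DECODER[i] = i / 16.0f;
        }
    }

    /** Stand-in encoder matching the toy table; clamps to the 0..255 range. */
    static byte encodeNorm(float f) {
        int v = Math.round(f * 16.0f);
        return (byte) Math.max(0, Math.min(255, v));
    }

    /** With norms == null the score stays constant; otherwise it is scaled per doc. */
    static float score(float theScore, byte[] norms, int doc) {
        if (norms == null) {
            return theScore;
        }
        // "& 0xFF" treats the signed Java byte as an unsigned table index.
        return theScore * NORM_DECODER[norms[doc] & 0xFF];
    }

    public static void main(String[] args) {
        // Two docs, index-time boosts of 1.0 and 10.0 folded into the norm byte.
        byte[] norms = { encodeNorm(1.0f), encodeNorm(10.0f) };
        System.out.println(score(1.0f, null, 1));   // constant score, boost ignored
        System.out.println(score(1.0f, norms, 0));  // doc boosted by 1.0
        System.out.println(score(1.0f, norms, 1));  // doc boosted by 10.0
    }
}
```

As noted elsewhere in the thread, a scheme like this would not accumulate
scores when several expanded terms hit the same document; it would only let
index-time boosts influence the ordering.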


Re: boost ignored with wildcard queries

Posted by Yonik Seeley <yo...@apache.org>.
On Wed, Mar 5, 2008 at 4:07 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>  : >  Using the StandardRequestHandler, it appears that the index boost values are
>  : >  ignored when the query has a wildcard in it.   For example, if I have 2
>  : >  <doc>'s and one has a boost of 1.0 and another has a boost of 10.0, then I
>  : >  do a search for "bob*", both records will be returned with the same score of
>  : >  1.0.   If I just do a normal search then the <doc> that has the higher boost
>  : >  has the higher score as expected.
>
>  : A feature :-)
>  : Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to
>  : avoid getting exceptions from too many terms.
>
>  Hmmm... except for the fact that the name would be even more misleading,
>  there's really no performance-related reason why ConstantScoreRangeQuery
>  and ConstantScorePrefixQuery couldn't use the fieldNorms (if they exist)
>  when computing the score.  The "constant score" part of their names
>  referred to not doing term expansion to find tf/idf factors ... but the
>  doc/field/length info encoded into the norms could still be factored into
>  the score fairly efficiently.

Doug & I talked about this a while ago. At a minimum, it would require
byte[maxDoc()] to store scores in a compressed 8 bit format.  It would
certainly impact performance too.

-Yonik

Re: boost ignored with wildcard queries

Posted by Chris Hostetter <ho...@fucit.org>.
: >  Using the StandardRequestHandler, it appears that the index boost values are
: >  ignored when the query has a wildcard in it.   For example, if I have 2
: >  <doc>'s and one has a boost of 1.0 and another has a boost of 10.0, then I
: >  do a search for "bob*", both records will be returned with the same score of
: >  1.0.   If I just do a normal search then the <doc> that has the higher boost
: >  has the higher score as expected.

: A feature :-)
: Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to
: avoid getting exceptions from too many terms.

Hmmm... except for the fact that the name would be even more misleading,
there's really no performance-related reason why ConstantScoreRangeQuery
and ConstantScorePrefixQuery couldn't use the fieldNorms (if they exist)
when computing the score.  The "constant score" part of their names
referred to not doing term expansion to find tf/idf factors ... but the
doc/field/length info encoded into the norms could still be factored into
the score fairly efficiently.

This would be something to submit as a patch to Lucene-Java if anyone is
interested.



-Hoss


Re: boost ignored with wildcard queries

Posted by Yonik Seeley <yo...@apache.org>.
On Tue, Feb 26, 2008 at 7:23 PM, Head <tr...@mac.com> wrote:
>
>  Using the StandardRequestHandler, it appears that the index boost values are
>  ignored when the query has a wildcard in it.   For example, if I have 2
>  <doc>'s and one has a boost of 1.0 and another has a boost of 10.0, then I
>  do a search for "bob*", both records will be returned with the same score of
>  1.0.   If I just do a normal search then the <doc> that has the higher boost
>  has the higher score as expected.
>
>  Is this a bug?

A feature :-)
Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to
avoid getting exceptions from too many terms.

-Yonik
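
To illustrate the trade-off described above, here is a toy sketch that does
not use Lucene itself. The class, the method names, and the in-memory "index"
are hypothetical; the 1024 limit is assumed to mirror the default maximum
clause count of Lucene's BooleanQuery. Expanding a prefix into one clause per
matching term can exceed that limit, while a filter-style match only marks
documents in a bit set and so gives every hit the same score.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Toy comparison of term expansion versus filter-style prefix matching. */
class PrefixExpansionSketch {
    // Assumed limit, mirroring the default clause limit of Lucene's BooleanQuery.
    static final int MAX_CLAUSES = 1024;

    /** Expansion: one clause per term with the prefix; fails once past the limit. */
    static int expandToClauses(TreeMap<String, int[]> index, String prefix) {
        int clauses = 0;
        for (String term : index.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so matches are contiguous
            }
            if (++clauses > MAX_CLAUSES) {
                throw new IllegalStateException("too many clauses");
            }
        }
        return clauses;
    }

    /** Filter style: set a bit per matching doc; no per-term clauses, no tf/idf. */
    static BitSet matchDocs(TreeMap<String, int[]> index, String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Term -> doc ids; 2000 distinct terms share the prefix "bob".
        TreeMap<String, int[]> index = new TreeMap<>();
        for (int i = 0; i < 2000; i++) {
            index.put("bob" + i, new int[] { i % 50 });
        }
        index.put("alice", new int[] { 99 });

        System.out.println(matchDocs(index, "bob", 100).cardinality());
        try {
            expandToClauses(index, "bob");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```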


>  ~Tom
>
>  p.s. Here's what my debug looks like:
>
>  <str name="id=24279,internal_docid=2180">
>  1.0 = (MATCH)
>  ConstantScoreQuery(org.apache.solr.search.PrefixFilter@5c7da7), product of:
>   1.0 = boost
>   1.0 = queryNorm
>  </str>
>  <str name="id=24317,internal_docid=2217">
>  1.0 = (MATCH)
>  ConstantScoreQuery(org.apache.solr.search.PrefixFilter@3d68df), product of:
>   1.0 = boost
>   1.0 = queryNorm
>  </str>