Posted to solr-user@lucene.apache.org by Larry He <sh...@gmail.com> on 2009/08/18 16:01:33 UTC

Is negative boost possible?

Hi all,

I am looking for a way to assign negative boost to a term in Solr query.
Our use case is that we want to boost matching documents that were
updated recently and penalize those that have not been updated for a long
time. Other terms in the query affect the scores as well. For example, we
construct a query similar to this:

*:* field1:value1^2  field2:value2^2 lastUpdateTime:[NOW/DAY-90DAYS TO *]^5
lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3

I notice it's not possible to simply use a negative boosting factor in the
query.  Is there any way to achieve such result?

Regards,
Shi Quan He

Re: Is negative boost possible?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Yonik Seeley wrote:
> Hmmm, you're actually seeing this with Lucene 2.9?
> The HitQueue (subclass of PriorityQueue) is pre-populated with
> sentinel objects with scores of -Inf, not zero.

Uhh, sorry, you are right: an early 2.9-dev version of the jar sneaked
onto my classpath. I have now verified that 2.9.0 returns both positive
and negative scores with the default TopScoreDocCollector.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Is negative boost possible?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Oct 12, 2009 at 12:03 PM, Andrzej Bialecki <ab...@getopt.org> wrote:
>> Solr never discarded non-positive hits, and now Lucene 2.9 no longer
>> does either.
>
> Hmm ... The code that I pasted in my previous email uses
> Searcher.search(Query, int), which in turn uses search(Query, Filter, int),
> and it doesn't return any results if only the first clause is present (the
> one with negative boost) even though it's a matching clause.
>
> I think this is related to the fact that in TopScoreDocCollector:48 the
> pqTop.score is initialized to 0, and then all results that have a lower score
> than this are discarded. Perhaps this should be initialized to
> Float.MIN_VALUE?

Hmmm, you're actually seeing this with Lucene 2.9?
The HitQueue (subclass of PriorityQueue) is pre-populated with
sentinel objects with scores of -Inf, not zero.

-Yonik
http://www.lucidimagination.com
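The difference between a zero threshold and -Inf sentinels can be seen in a toy top-k collector. This is a simplified sketch with hypothetical class names, not Lucene's HitQueue or TopScoreDocCollector. It also shows why the Float.MIN_VALUE idea above would not have helped: in Java, Float.MIN_VALUE is the smallest positive float (about 1.4e-45), so it still rejects negative scores, whereas Float.NEGATIVE_INFINITY admits them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy top-k collector (hypothetical; not Lucene's HitQueue/TopScoreDocCollector).
// A hit is kept only if its score is strictly greater than the current bottom score.
final class ToyTopK {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int k;
    private final float initialBottom;
    // Min-heap on score: the head is the weakest hit currently kept.
    private final PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    ToyTopK(int k, float initialBottom) {
        this.k = k;
        this.initialBottom = initialBottom;
    }

    void collect(int doc, float score) {
        // Until the queue is full, the bottom is the initial value:
        // 0 mimics the early 2.9-dev behaviour, -Infinity mimics sentinel hits.
        float bottom = heap.size() < k ? initialBottom : heap.peek().score;
        if (score <= bottom) {
            return; // rejected, never reaches the results
        }
        if (heap.size() == k) {
            heap.poll(); // evict the current weakest hit
        }
        heap.add(new Hit(doc, score));
    }

    int size() {
        return heap.size();
    }

    // Doc ids ordered from best to worst score.
    List<Integer> sortedDocs() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }
}
```

With initialBottom = 0, a document whose only matching clause carries a negative boost is silently dropped, which matches the symptom Andrzej saw with the early 2.9-dev jar.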

Re: Is negative boost possible?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Yonik Seeley wrote:
> On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>> BTW, standard Collectors collect only results
>> with positive scores, so if you want to collect results with negative scores
>> as well then you need to use a custom Collector.
> 
> Solr never discarded non-positive hits, and now Lucene 2.9 no longer
> does either.

Hmm ... The code that I pasted in my previous email uses 
Searcher.search(Query, int), which in turn uses search(Query, Filter, 
int), and it doesn't return any results if only the first clause is 
present (the one with negative boost) even though it's a matching clause.

I think this is related to the fact that in TopScoreDocCollector:48 the 
pqTop.score is initialized to 0, and then all results that have a lower 
score than this are discarded. Perhaps this should be initialized to 
Float.MIN_VALUE?


-- 
Best regards,
Andrzej Bialecki     <><


Re: Is negative boost possible?

Posted by Yonik Seeley <ys...@gmail.com>.
On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
> BTW, standard Collectors collect only results
> with positive scores, so if you want to collect results with negative scores
> as well then you need to use a custom Collector.

Solr never discarded non-positive hits, and now Lucene 2.9 no longer
does either.

-Yonik

Re: Is negative boost possible?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Yonik Seeley wrote:
> On Sun, Oct 11, 2009 at 6:04 PM, Lance Norskog <go...@gmail.com> wrote:
>> And the other important
>> thing to know about boost values is that the dynamic range is about
>> 6-8 bits
> 
> That's an index-time boost - an 8 bit float with 5 bits of mantissa
> and 3 bits of exponent.
> Query time boosts are normal 32 bit floats.

To be more specific: index-time float encoding does not permit negative 
numbers (see SmallFloat), but query-time boosts can be negative, and 
they DO affect the score - see below. BTW, standard Collectors collect 
only results with positive scores, so if you want to collect results 
with negative scores as well then you need to use a custom Collector.

-----------------------------------------------
BeanShell 2.0b4 - by Pat Niemeyer (pat@pat.net)
bsh % import org.apache.lucene.search.*;
bsh % import org.apache.lucene.index.*;
bsh % import org.apache.lucene.store.*;
bsh % import org.apache.lucene.document.*;
bsh % import org.apache.lucene.analysis.*;
bsh % tq = new TermQuery(new Term("a", "b"));
bsh % print(tq);
a:b
bsh % tq.setBoost(-1);
bsh % print(tq);
a:b^-1.0
bsh % q = new BooleanQuery();
bsh % tq1 = new TermQuery(new Term("a", "c"));
bsh % tq1.setBoost(10);
bsh % q.add(tq1, BooleanClause.Occur.SHOULD);
bsh % q.add(tq, BooleanClause.Occur.SHOULD);
bsh % print(q);
a:c^10.0 a:b^-1.0
bsh % dir = new RAMDirectory();
bsh % w = new IndexWriter(dir, new WhitespaceAnalyzer());
bsh % doc = new Document();
bsh % doc.add(new Field("a", "b c d", Field.Store.YES, Field.Index.ANALYZED));
bsh % w.addDocument(doc);
bsh % w.close();
bsh % r = IndexReader.open(dir);
bsh % is = new IndexSearcher(r);
bsh % td = is.search(q, 10);
bsh % sd = td.scoreDocs;
bsh % print(sd.length);
1
bsh % print(is.explain(q, 0));
0.1373985 = (MATCH) sum of:
   0.15266499 = (MATCH) weight(a:c^10.0 in 0), product of:
     0.99503726 = queryWeight(a:c^10.0), product of:
       10.0 = boost
       0.30685282 = idf(docFreq=1, numDocs=1)
       0.32427183 = queryNorm
     0.15342641 = (MATCH) fieldWeight(a:c in 0), product of:
       1.0 = tf(termFreq(a:c)=1)
       0.30685282 = idf(docFreq=1, numDocs=1)
       0.5 = fieldNorm(field=a, doc=0)
   -0.0152664995 = (MATCH) weight(a:b^-1.0 in 0), product of:
     -0.099503726 = queryWeight(a:b^-1.0), product of:
       -1.0 = boost
       0.30685282 = idf(docFreq=1, numDocs=1)
       0.32427183 = queryNorm
     0.15342641 = (MATCH) fieldWeight(a:b in 0), product of:
       1.0 = tf(termFreq(a:b)=1)
       0.30685282 = idf(docFreq=1, numDocs=1)
       0.5 = fieldNorm(field=a, doc=0)

bsh %
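The explain output above can be reproduced by hand. The sketch below assumes the Lucene 2.9 DefaultSimilarity formulas as I understand them (idf = 1 + ln(numDocs/(docFreq+1)), queryNorm = 1/sqrt(sum of (boost*idf)^2), tf = sqrt(termFreq)) and a document in which every clause matches, so coord is 1; the class name is hypothetical. Because the boost is squared inside queryNorm, the negative boost only changes the sign of its own clause, not the normalization.

```java
// Hypothetical helper that recomputes Andrzej's explain() output by hand,
// assuming Lucene 2.9 DefaultSimilarity formulas and a document in which
// every clause matches once (so coord is 1 and tf = sqrt(1) = 1).
final class ExplainSketch {
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Returns one weight per clause; the document score is their sum.
    static double[] clauseWeights(double[] boosts, double[] idfs, double fieldNorm) {
        // queryNorm = 1 / sqrt(sum of (boost * idf)^2). Squaring means a boost
        // of -1 contributes to the normalization exactly like a boost of +1.
        double sumOfSquares = 0.0;
        for (int i = 0; i < boosts.length; i++) {
            double w = boosts[i] * idfs[i];
            sumOfSquares += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquares);

        double tf = 1.0; // sqrt(termFreq) with termFreq = 1
        double[] weights = new double[boosts.length];
        for (int i = 0; i < boosts.length; i++) {
            double queryWeight = boosts[i] * idfs[i] * queryNorm; // carries the sign
            double fieldWeight = tf * idfs[i] * fieldNorm;
            weights[i] = queryWeight * fieldWeight;
        }
        return weights;
    }
}
```

With idf(1, 1) for both terms, boosts {10, -1} and fieldNorm 0.5, the two weights come out at about 0.152665 and -0.0152665, and their sum matches the 0.1373985 at the top of the explain output.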


-- 
Best regards,
Andrzej Bialecki     <><


Re: Is negative boost possible?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, Oct 11, 2009 at 6:04 PM, Lance Norskog <go...@gmail.com> wrote:
> And the other important
> thing to know about boost values is that the dynamic range is about
> 6-8 bits

That's an index-time boost - an 8 bit float with 5 bits of mantissa
and 3 bits of exponent.
Query time boosts are normal 32 bit floats.

-Yonik
http://www.lucidimagination.com
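The index-time limits discussed here (no negative values, coarse precision) come from Lucene's one-byte float encoding. Below is a re-implementation of SmallFloat.floatToByte315 and byte315ToFloat (mantissaBits = 3, zeroExponent = 15) as I recall them from the Lucene source; treat it as a sketch and compare it with the SmallFloat class in your Lucene version. It applies to index-time boosts and norms only; query-time boosts are ordinary 32-bit floats. Negative inputs collapse to 0, and values are truncated: in Andrzej's example the field "b c d" has three tokens, so the length norm 1/sqrt(3) (about 0.577) is stored as 0.5, the fieldNorm his explain output reports.

```java
// Sketch of Lucene's one-byte norm/boost encoding (SmallFloat "315" variant),
// reconstructed from memory; verify against the SmallFloat source you use.
final class SmallFloat315Sketch {
    private static final int ZERO_POINT = (63 - 15) << 3; // zeroExponent = 15, mantissaBits = 3

    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3); // keep sign, exponent and top mantissa bits
        if (smallfloat <= ZERO_POINT) {
            // Zero, negative numbers (sign bit set) and tiny positives land here:
            // non-positive inputs map to 0, tiny positive ones to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ZERO_POINT + 0x100) {
            return (byte) -1; // too large: clamp to the largest representable value
        }
        return (byte) (smallfloat - ZERO_POINT); // truncates (rounds down)
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Round-trip a value through the one-byte encoding.
    static float roundTrip(float f) {
        return byte315ToFloat(floatToByte315(f));
    }
}
```

For Lance's example values, roundTrip(100f) and roundTrip(1200f) come back as 96.0 and 1024.0, so nearby index-time boosts can collapse onto the same byte.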

Re: Is negative boost possible?

Posted by Lance Norskog <go...@gmail.com>.
I've been told over and over what Koji said - the convention is that
1.0 is the default center of the boost axis. And the other important
thing to know about boost values is that the dynamic range is about
6-8 bits, so use a range of "2.0 4.0 12.0" instead of "100.0 200.0
1200.0".

Lance

-- 
Lance Norskog
goksron@gmail.com

Re: Is negative boost possible?

Posted by ragi <ra...@gmail.com>.
If you don't want to do a pure negative query and just want to boost a few
documents down based on matching criteria, try using the linear function (one
of the functions available for boost functions) with a negative m (slope).
We were able to solve our problem this way.

We wanted to negatively boost some documents based on certain keywords
while 

-- 
View this message in context: http://www.nabble.com/Is-negative-boost-possible--tp25025775p25840621.html
Sent from the Solr - User mailing list archive at Nabble.com.
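A minimal sketch of why a negative slope demotes documents, assuming a dismax boost function along the lines of bf=linear(daysSinceUpdate,-0.001,0), where daysSinceUpdate is a hypothetical numeric field. Solr's linear(x,m,c) computes m*x+c; the Java below models only that additive contribution, not Solr's full scoring (queryNorm, coord and so on are ignored).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of a boost function: final score = text score + m*x + c.
// "daysSinceUpdate" is a hypothetical per-document value, not a Solr field.
final class LinearPenaltySketch {
    static final class Doc {
        final String id;
        final double textScore;
        final double daysSinceUpdate;

        Doc(String id, double textScore, double daysSinceUpdate) {
            this.id = id;
            this.textScore = textScore;
            this.daysSinceUpdate = daysSinceUpdate;
        }
    }

    // Same formula as Solr's linear(x,m,c) function query: m*x + c.
    static double linear(double x, double m, double c) {
        return m * x + c;
    }

    // Ranks documents by text score plus linear(daysSinceUpdate, slope, 0).
    static List<String> rank(List<Doc> docs, double slope) {
        List<Doc> sorted = new ArrayList<>(docs);
        Comparator<Doc> byScore = Comparator.comparingDouble(
                d -> d.textScore + linear(d.daysSinceUpdate, slope, 0.0));
        sorted.sort(byScore.reversed()); // highest combined score first
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

A negative slope can push a combined score below zero; per Yonik's reply in this thread, Solr never discarded non-positive hits and Lucene 2.9 no longer does either.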


Re: Is negative boost possible?

Posted by Marc Sturlese <ma...@gmail.com>.

:>the only way to "negative boost" is to "positively boost" the inverse...
:>
:>	(*:* -field1:value_to_penalize)^10

This will do the job as well, since bq supports pure negative queries (at least
in trunk):
bq=-field1:value_to_penalize^10

http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53db8c5fd31133dc3566318d1aad2bb23e07e


-- 
View this message in context: http://www.nabble.com/Is-negative-boost-possible--tp25025775p25039059.html
Sent from the Solr - User mailing list archive at Nabble.com.
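For reference, a sketch of passing Hoss's inverse query, or Marc's pure-negative variant, as the bq parameter of a dismax request from Java. The base URL http://localhost:8983/solr/select, the field name and the user query are placeholders; q, defType and bq are dismax request parameters, but whether a pure negative bq is accepted depends on your Solr version, as Marc notes. The code needs Java 10+ for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a Solr select URL carrying a dismax boost query (bq).
// The base URL and the field/value names are placeholders.
final class BoostQueryUrl {
    // Hoss's "positively boost the inverse" form: (*:* -clause)^boost
    static String inverse(String clause, int boost) {
        return "(*:* -" + clause + ")^" + boost;
    }

    static Map<String, String> dismaxParams(String userQuery, String boostQuery) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("defType", "dismax");
        params.put("q", userQuery);
        params.put("bq", boostQuery);
        return params;
    }

    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode names and values (URLEncoder turns spaces into '+').
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }
}
```

Passing "-field1:value_to_penalize^10" instead of the inverse() result gives Marc's pure-negative form.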


Re: Is negative boost possible?

Posted by Chris Hostetter <ho...@fucit.org>.
: Use a decimal figure less than 1, e.g. 0.5, to express less importance.

but that's still a positive boost ... it still increases the scores of 
documents that match.

the only way to "negative boost" is to "positively boost" the inverse...

	(*:* -field1:value_to_penalize)^10

-Hoss
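Hoss's point can be checked with a deliberately simplified additive score that ignores queryNorm, coord and tf/idf; document ids and numbers are hypothetical. A clause boosted by 0.5 still adds to every document it matches, while (*:* -field1:value_to_penalize)^10 adds to every document that does not match, which pushes the matching ones down relative to the rest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy additive scoring model (hypothetical): score = base + boost if a clause matches.
final class InverseBoostSketch {
    static final class Doc {
        final String id;
        final double base;          // score from the rest of the query
        final boolean hasPenalized; // matches field1:value_to_penalize

        Doc(String id, double base, boolean hasPenalized) {
            this.id = id;
            this.base = base;
            this.hasPenalized = hasPenalized;
        }
    }

    // Koji's suggestion: field1:value_to_penalize^0.5 (still adds to matches).
    static double lowBoost(Doc d) {
        return d.base + (d.hasPenalized ? 0.5 : 0.0);
    }

    // Hoss's suggestion: (*:* -field1:value_to_penalize)^10 (adds to non-matches).
    static double inverseBoost(Doc d) {
        return d.base + (d.hasPenalized ? 0.0 : 10.0);
    }

    static List<String> rank(List<Doc> docs, boolean inverse) {
        List<Doc> sorted = new ArrayList<>(docs);
        Comparator<Doc> byScore = Comparator.comparingDouble(
                d -> inverse ? inverseBoost(d) : lowBoost(d));
        sorted.sort(byScore.reversed()); // highest score first
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

With two documents that are otherwise tied, the 0.5 boost ranks the penalized document first, and the inverse boost ranks it last.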


Re: Is negative boost possible?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Hi,

Use a decimal figure less than 1, e.g. 0.5, to express less importance.

Koji
