Posted to user@lucy.apache.org by goran kent <go...@gmail.com> on 2011/12/01 11:07:47 UTC

[lucy-user] $boost importance in weighting

Hi,

The page at http://incubator.apache.org/lucy/docs/perl/Lucy/Plan/FieldType.html
is a bit sparse on detail about the boost property.

I'd like to get a better understanding of how and by how much its
value influences score (rank) in search results - what's the formula
used when boost is applied to a document's score?

Finally, what are reasonable values (upper/lower) for boost when, in
my case eg, I'd like to influence the score based on an external value
(page rank), but not have my page rank completely skew the scores -
just enough to promote pages which have an organic page rank value
which should be considered to some degree (a very broad subject, I
know).

My tests so far show that a boost value with a small variance in the
mantissa has an almost zero influence on score/ranking.  My thinking
is to boost with something akin to $boost+=LogN(PR) - ie between 0-10
(log scale).  So this boils down to:  is using a scale of 1-10 a good
idea w.r.t. the Lucy boost property to influence ranking, or 10x that
value?

Any thoughts?

-- 
Regards,
gk

Re: [lucy-user] $boost importance in weighting

Posted by goran kent <go...@gmail.com>.
On Thu, Dec 1, 2011 at 7:35 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> I'd try 1-100.  If that's too much, scale it back.

thanks - it's a dreadfully complicated topic any which way you look at it.

-- 
Regards,
gk

Re: [lucy-user] $boost importance in weighting

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Dec 01, 2011 at 12:07:47PM +0200, goran kent wrote:
> The page at http://incubator.apache.org/lucy/docs/perl/Lucy/Plan/FieldType.html
> is a bit sparse on detail about the boost property.
 
> I'd like to get a better understanding of how and by how much its
> value influences score (rank) in search results - what's the formula
> used when boost is applied to a document's score?

It's pretty complicated.  Field boost, document boost, and field length
normalization are all combined into one value, which is then compressed into
a single 8-bit float with a 3-bit mantissa and a 5-bit exponent.  Because
this lossy compression is so coarse, small changes to boost may not even
move the needle.

I wouldn't bother with a field or document boost multiplier that doesn't
change things by at least a factor of 2.
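To see why small boosts vanish, here is a minimal Python sketch of an 8-bit float with a 5-bit exponent and a 3-bit mantissa. It is modeled on Lucene's `SmallFloat.floatToByte315` and its inverse (zero exponent 15); Lucy's own norm encoder is assumed to use an equivalent layout but may differ in details. The names `encode_norm` and `decode_norm` are hypothetical, and the input is taken to be the already-combined boost and length-norm value.

```python
import struct

# Parameters modeled on Lucene's SmallFloat.floatToByte315 (assumption:
# Lucy's norm encoder uses an equivalent 3-bit mantissa / 5-bit exponent layout).
MANTISSA_BITS = 3
ZERO_EXP = 15
# Bias shift between the IEEE exponent and the byte's exponent, pre-shifted.
FZERO = (63 - ZERO_EXP) << MANTISSA_BITS


def _float_to_bits(f: float) -> int:
    """IEEE 754 single-precision bit pattern of f, as a signed 32-bit int."""
    return struct.unpack(">i", struct.pack(">f", f))[0]


def _bits_to_float(bits: int) -> float:
    """Inverse of _float_to_bits."""
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def encode_norm(f: float) -> int:
    """Lossily squeeze a combined boost * length-norm value into one byte (0-255)."""
    bits = _float_to_bits(f)
    # Keep sign, exponent, and the top two stored mantissa bits
    # (three significant bits counting the implicit leading 1).
    small = bits >> (24 - MANTISSA_BITS)
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> 1
    if small >= FZERO + 0x100:
        return 255  # overflow clamps to the largest code
    return small - FZERO


def decode_norm(b: int) -> float:
    """Expand a byte code back to the float it stands for."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - MANTISSA_BITS)
    bits += (63 - ZERO_EXP) << 24
    return _bits_to_float(bits)


if __name__ == "__main__":
    for boost in (1.0, 1.1, 1.2, 1.25, 1.5, 2.0, 4.0):
        code = encode_norm(boost)
        print(f"{boost:5.2f} -> byte {code:3d} -> decodes to {decode_norm(code)}")
```

With this layout each doubling of the value gets only four distinct byte codes (1.0, 1.25, 1.5, 1.75, then 2.0), so 1.1 and 1.2 collapse to the same byte as 1.0, while a factor of 2 always moves the code by four steps within the representable range.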

It's theoretically possible to calculate ceiling and floor values for boost,
but I don't know what the answers are.
 
> Finally, what are reasonable values (upper/lower) for boost when, in
> my case eg, I'd like to influence the score based on an external value
> (page rank), but not have my page rank completely skew the scores -
> just enough to promote pages which have an organic page rank value
> which should be considered to some degree (a very broad subject, I
> know).
 
Subtle rerankings are problematic because search engines are noisy.  Even the
best ones give you a bunch of junk you don't need.  We don't really care about
fine distinctions, because if you sample a handful of documents with identical
scores, odds are that they are *wildly* divergent in terms of what the user
wants.  We only care about big differences.

> My tests so far show that a boost value with a small variance in the
> mantissa has an almost zero influence on score/ranking.  My thinking
> is to boost with something akin to $boost+=LogN(PR) - ie between 0-10
> (log scale).  So this boils down to:  is using a scale of 1-10 a good
> idea w.r.t. the Lucy boost property to influence ranking, or 10x that
> value?

I'd try 1-100.  If that's too much, scale it back.
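One way to act on that suggestion, sketched in Python: map PageRank onto a boost between 1 and 100 on a log scale, in the spirit of the `$boost += LogN(PR)` idea above but multiplicative rather than additive. The helper name `pagerank_to_boost`, the clamping bounds, and the choice of geometric interpolation are assumptions for illustration, not anything Lucy provides. The result would be supplied as the document boost at index time (in the Perl bindings, presumably the `boost` argument to `Lucy::Index::Indexer`'s `add_doc`; check the Indexer docs).

```python
import math


def pagerank_to_boost(pr: float, pr_min: float, pr_max: float,
                      max_boost: float = 100.0) -> float:
    """Hypothetical helper: map an external PageRank-style score onto a
    document boost in [1, max_boost] on a logarithmic scale.

    pr_min and pr_max bound the scores you expect to see; values outside
    that range are clamped.  max_boost defaults to 100, following the
    "try 1-100, then scale back" suggestion.
    """
    if not 0 < pr_min < pr_max:
        raise ValueError("need 0 < pr_min < pr_max")
    pr = min(max(pr, pr_min), pr_max)
    # Position of pr between pr_min and pr_max, measured in log space (0..1).
    t = math.log(pr / pr_min) / math.log(pr_max / pr_min)
    # Geometric interpolation: 1 at pr_min, max_boost at pr_max, so every
    # tenfold increase in PageRank multiplies the boost by the same factor.
    return max_boost ** t


if __name__ == "__main__":
    for pr in (1e-10, 1e-8, 1e-5, 1e-2, 1.0):
        print(f"PR {pr:.0e} -> boost {pagerank_to_boost(pr, 1e-10, 1.0):.2f}")
```

With PageRank spread over ten decades (the 1e-10 to 1.0 range in the demo is arbitrary), each decade multiplies the boost by 100**0.1, about 1.58, so pages need to be roughly two decades apart before their boosts differ by the factor of 2 mentioned earlier in the thread. Lowering `max_boost` is the "scale it back" knob.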

Marvin Humphrey