Posted to user@lucy.apache.org by goran kent <go...@gmail.com> on 2011/12/01 11:07:47 UTC
[lucy-user] $boost importance in weighting
Hi,
The page at http://incubator.apache.org/lucy/docs/perl/Lucy/Plan/FieldType.html
is a bit sparse on detail about the boost property.
I'd like to get a better understanding of how, and by how much, its
value influences score (rank) in search results - what's the formula
used when boost is applied to a document's score?
Finally, what are reasonable values (upper/lower) for boost when, as in
my case, I'd like to influence the score based on an external value
(page rank), but not have my page rank completely skew the scores -
just enough to promote pages which have an organic page rank value
which should be considered to some degree (a very broad subject, I
know).
My tests so far show that a boost value with a small variance in the
mantissa has almost zero influence on score/ranking. My thinking
is to boost with something akin to $boost+=LogN(PR) - i.e. between 0 and 10
(log scale). So this boils down to: is using a scale of 1-10 a good
idea w.r.t. the Lucy boost property to influence ranking, or 10x that
value?
Any thoughts?
--
Regards,
gk
Re: [lucy-user] $boost importance in weighting
Posted by goran kent <go...@gmail.com>.
On Thu, Dec 1, 2011 at 7:35 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> I'd try 1-100. If that's too much, scale it back.
Thanks - it's a dreadfully complicated topic any which way you look at it.
--
Regards,
gk
Re: [lucy-user] $boost importance in weighting
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Dec 01, 2011 at 12:07:47PM +0200, goran kent wrote:
> The page at http://incubator.apache.org/lucy/docs/perl/Lucy/Plan/FieldType.html
> is a bit sparse on detail about the boost property.
> I'd like to get a better understanding of how, and by how much, its
> value influences score (rank) in search results - what's the formula
> used when boost is applied to a document's score?
It's pretty complicated. Field boost, document boost, and field length
normalization are all consolidated, then they are reduced down to a single
8-bit float with a 3-bit mantissa and a 5-bit exponent. Because of the
coarseness of the lossy data compression, small changes to boost may not even
move the needle.
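To make the coarseness concrete, here is a minimal Python sketch of what keeping only 3 mantissa bits does to a value. It is an illustration, not Lucy's actual encoder: the function name `quantize_3bit_mantissa` is hypothetical, truncation (rather than rounding) is an assumption, and the 5-bit exponent range is not modeled.

```python
import math

def quantize_3bit_mantissa(x: float) -> float:
    """Truncate a positive float to 3 explicit mantissa bits.

    Illustrative only: not Lucy's actual encoder. The 5-bit exponent range
    is not modeled here, so very large or very small values are not clamped.
    """
    if x <= 0.0:
        return 0.0
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= m < 1
    steps = math.floor(m * 16)    # 8..15: implicit leading 1 plus 3 fraction bits
    return math.ldexp(steps / 16.0, e)

for boost in (1.0, 1.05, 1.1, 1.125, 2.0, 10.0, 10.9, 11.0):
    print(f"{boost:6.3f} -> {quantize_3bit_mantissa(boost):6.3f}")
```

Under these assumptions, 1.0, 1.05 and 1.1 all collapse to 1.0, and 10.0 and 10.9 both collapse to 10.0. With only eight steps per power of two, adjacent representable values sit roughly 6% to 12.5% apart, so any smaller change to the combined value can disappear entirely.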
I wouldn't bother with a field or document boost multiplier that doesn't
change things by at least a factor of 2.
It's theoretically possible to calculate ceiling and floor values for boost,
but I don't know what the answers are.
> Finally, what are reasonable values (upper/lower) for boost when, as in
> my case, I'd like to influence the score based on an external value
> (page rank), but not have my page rank completely skew the scores -
> just enough to promote pages which have an organic page rank value
> which should be considered to some degree (a very broad subject, I
> know).
Subtle rerankings are problematic because search engines are noisy. Even the
best ones give you a bunch of junk you don't need. We don't really care about
fine distinctions, because if you sample a handful of documents with identical
scores, odds are that they are *wildly* divergent in terms of what the user
wants. We only care about big differences.
> My tests so far show that a boost value with a small variance in the
> mantissa has almost zero influence on score/ranking. My thinking
> is to boost with something akin to $boost+=LogN(PR) - i.e. between 0 and 10
> (log scale). So this boils down to: is using a scale of 1-10 a good
> idea w.r.t. the Lucy boost property to influence ranking, or 10x that
> value?
I'd try 1-100. If that's too much, scale it back.
Marvin Humphrey
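One way to combine the two suggestions in this thread (a boost range of roughly 1-100, and steps of at least a factor of 2) is sketched below in Python. The helper `pagerank_to_boost`, the 0-10 page rank input range, and the choice to snap to powers of two are all assumptions made for illustration; nothing here is part of Lucy's API, and the numbers would need tuning against real queries.

```python
import math

def pagerank_to_boost(pr: float, pr_max: float = 10.0, max_boost: float = 100.0) -> float:
    """Map a page rank in [0, pr_max] to a power-of-two boost in [1, max_boost].

    Hypothetical helper: the 0-10 input range and the power-of-two steps are
    assumptions for illustration, not something Lucy prescribes.
    """
    pr = min(max(pr, 0.0), pr_max)                   # clamp out-of-range input
    exponent = (pr / pr_max) * math.log2(max_boost)  # 0 .. log2(max_boost)
    return min(2.0 ** round(exponent), max_boost)    # snap to 1, 2, 4, ... and cap

for pr in (0, 3, 5, 8, 10):
    print(pr, pagerank_to_boost(pr))
```

Snapping to powers of two keeps every distinct boost level at least a factor of 2 away from its neighbours, so each level should survive the lossy 8-bit encoding. If the effect on ranking turns out to be too strong, lowering `max_boost` scales it back.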