Posted to dev@lucene.apache.org by Clemens Marschner <cm...@lanlab.de> on 2002/08/12 15:25:55 UTC

Re: document & field boosting

Hi,

Doug, do you think the ranking function as stated in the FAQ
(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31)
is still correct after the recent changes?


Clemens


----- Original Message -----
From: "Doug Cutting" <cu...@lucene.com>
To: <ti...@ecliptictech.com>; <so...@business.com>;
<sc...@evendi.de>; <fo...@welho.com>; <me...@yahoo.com>
Cc: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Monday, July 29, 2002 9:31 PM
Subject: document & field boosting


> FYI, I just added document and field boosting to Lucene.  It should be
> in tonight's nightly build.
>
> This lets one, e.g., implement Google-like ranking, where a factor in a
> document's score is determined independently from the text of the
> document.
>
> Longer term, I'd still like to open up document scoring, so that a user
> can alter any part of the formula without altering Lucene's core code.
>
> Enjoy!
>
> Doug



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: document & field boosting

Posted by Doug Cutting <cu...@lucene.com>.
Clemens Marschner wrote:
> Doug, do you think the ranking function as stated in the FAQ
> (http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31)
> is still correct after the recent changes?

Yes, this equation is still correct, although it's now incomplete. 
There is now another factor, the boost of the field containing the term, 
specified when that field was indexed.

As I mentioned before, I would eventually like to make it possible for 
folks to easily modify the scoring function.  My idea is to generalize 
the formula to something like:

   sum_t( term_factor(df) * term_doc_factor(tf) * field_factor(length) *
          query_boost * field_boost )

where term_factor(), term_doc_factor() and field_factor() correspond to 
methods that folks can easily override.
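
To make the shape of that formula concrete, here is a minimal Java sketch of a scorer whose three factors are overridable methods. Everything in it is an assumption for illustration: the class names PluggableScorer and TermMatch, the numDocs constructor argument, and the default factor definitions (a smoothed log for term_factor, a square root for term_doc_factor, an inverse square root for field_factor) are hypothetical and are not taken from Lucene's code.

```java
// Hypothetical sketch of the generalized formula; not Lucene's actual API.
class PluggableScorer {

    /** Statistics for one query term matched in one document field. */
    static final class TermMatch {
        final int df;            // number of documents containing the term
        final int tf;            // occurrences of the term in this field
        final int fieldLength;   // number of terms in this field
        final double queryBoost; // boost given to the term in the query
        final double fieldBoost; // boost given to the field at index time

        TermMatch(int df, int tf, int fieldLength,
                  double queryBoost, double fieldBoost) {
            this.df = df;
            this.tf = tf;
            this.fieldLength = fieldLength;
            this.queryBoost = queryBoost;
            this.fieldBoost = fieldBoost;
        }
    }

    private final int numDocs; // total documents in the (assumed) index

    PluggableScorer(int numDocs) {
        this.numDocs = numDocs;
    }

    // term_factor(df): assumed default, a smoothed inverse document frequency.
    protected double termFactor(int df) {
        return Math.log(numDocs / (double) (df + 1)) + 1.0;
    }

    // term_doc_factor(tf): assumed default, dampens repeated occurrences.
    protected double termDocFactor(int tf) {
        return Math.sqrt(tf);
    }

    // field_factor(length): assumed default, favors shorter fields.
    protected double fieldFactor(int length) {
        return 1.0 / Math.sqrt(length);
    }

    /** sum_t(term_factor * term_doc_factor * field_factor * query_boost * field_boost) */
    public final double score(TermMatch... matches) {
        double sum = 0.0;
        for (TermMatch m : matches) {
            sum += termFactor(m.df) * termDocFactor(m.tf)
                 * fieldFactor(m.fieldLength) * m.queryBoost * m.fieldBoost;
        }
        return sum;
    }

    public static void main(String[] args) {
        TermMatch match = new TermMatch(9, 4, 16, 1.0, 2.0);

        PluggableScorer defaults = new PluggableScorer(100);

        // Override a single factor: ignore how often the term occurs.
        PluggableScorer binaryTf = new PluggableScorer(100) {
            @Override
            protected double termDocFactor(int tf) {
                return tf > 0 ? 1.0 : 0.0;
            }
        };

        System.out.println("default score:   " + defaults.score(match));
        System.out.println("binary-tf score: " + binaryTf.score(match));
    }
}
```

The point of the sketch is only that score() stays fixed while each factor can be swapped by overriding one method, so a change to one part of the formula does not require touching the summation.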

Currently all of the scoring functions are static methods in a single 
class, Similarity.java, so one can in fact already modify scoring by 
re-defining this class, but this is not well documented and only for the 
brave.

Doug

