Posted to dev@lucene.apache.org by Doug Cutting <cu...@lucene.com> on 2002/07/29 21:31:49 UTC

document & field boosting

FYI, I just added document and field boosting to Lucene.  It should be 
in tonight's nightly build.

This lets one, e.g., implement Google-like ranking, where a factor in a 
document's score is determined independently from the text of the document.
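To make the idea concrete, here is a minimal Java sketch, not Lucene code. Each hit's text-relevance score is multiplied by a per-document boost that is computed without looking at the text, for instance a link-popularity value as in Google-like ranking. The classes `RankedDoc` and `StaticRanking` and their fields are hypothetical, invented only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical search hit: a text-based score plus a text-independent boost. */
final class RankedDoc {
    final String id;
    final float textScore;   // relevance computed from the query and the document text
    final float staticBoost; // e.g. a link-popularity value, computed without the text

    RankedDoc(String id, float textScore, float staticBoost) {
        this.id = id;
        this.textScore = textScore;
        this.staticBoost = staticBoost;
    }

    /** Final score: the text score scaled by the document's boost. */
    float score() {
        return textScore * staticBoost;
    }
}

final class StaticRanking {
    /** Returns a copy of the documents ordered by descending combined score. */
    static List<RankedDoc> rank(List<RankedDoc> docs) {
        List<RankedDoc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((RankedDoc d) -> d.score()).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<RankedDoc> docs = List.of(
            new RankedDoc("obscure-page", 0.9f, 1.0f),
            new RankedDoc("popular-page", 0.6f, 2.0f));
        for (RankedDoc d : rank(docs)) {
            System.out.println(d.id + " " + d.score());
        }
    }
}
```

With these inputs, `popular-page` (0.6 × 2.0 = 1.2) outranks `obscure-page` (0.9 × 1.0 = 0.9) even though its text score is lower.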

Longer term, I'd still like to open up document scoring, so that a user 
can alter any part of the formula without altering Lucene's core code.

Enjoy!

Doug

-------- Original Message --------
Subject: Re: cvs commit: jakarta-lucene/src/test/org/apache/lucene/search TestDocBoost.java
Date: Mon, 29 Jul 2002 12:14:22 -0700
From: Doug Cutting <cu...@lucene.com>
Reply-To: "Lucene Developers List" <lu...@jakarta.apache.org>
To: Lucene Developers List <lu...@jakarta.apache.org>
References: <20...@icarus.apache.org>

cutting@apache.org wrote:
 >   Log:
 >   msg.txt

Oops.  That log entry was supposed to read:

    Added support for boosting the score of documents and fields via the
    new methods Document.setBoost(float) and Field.setBoost(float).

    Note: This changes the encoding of an indexed value.  Indexes should
    be re-created from scratch in order for search scores to be correct.
    With the new code and an old index, searches will yield very large
    scores for shorter fields, and very small scores for longer fields.
    Once the index is re-created, scores will be as before.
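One way to picture what an index-time boost does is to fold it into a per-field value computed when the document is indexed. The Java sketch below assumes, purely for illustration, that this value is docBoost × fieldBoost × 1/√(number of terms in the field). `FieldNorm`, `lengthNorm` and `computeNorm` are hypothetical names, not Lucene's API, and the actual encoding of the value into the index is not modeled.

```java
/** Hypothetical model of a per-field normalization value that includes index-time boosts. */
final class FieldNorm {
    private FieldNorm() {}

    /** Assumed length normalization: shorter fields get larger values. */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * Multiplies the document boost (as set via Document.setBoost) and the field
     * boost (as set via Field.setBoost) into the length normalization, giving one
     * value per field to store at indexing time.
     */
    static float computeNorm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Same field length (4 terms), without and with boosts.
        System.out.println(computeNorm(1.0f, 1.0f, 4)); // prints 0.5
        System.out.println(computeNorm(2.0f, 1.5f, 4)); // prints 1.5
    }
}
```

Under this model the boosts are baked in when the document is indexed, so changing a boost means re-indexing that document.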

Doug


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>



Re: document & field boosting

Posted by Doug Cutting <cu...@lucene.com>.
Clemens Marschner wrote:
> Doug, do you think the ranking function as stated in the FAQ
> (http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31)
> is still correct after the recent changes?

Yes, this equation is still correct, although it's now incomplete. 
There is now another factor, the boost of the field containing the term, 
specified when that field was indexed.

As I mentioned before, I would eventually like to make it possible for 
folks to easily modify the scoring function.  My idea is to generalize 
the formula to something like:

   sum_t( term_factor(df) * term_doc_factor(tf) * field_factor(length) *
          query_boost * field_boost )

where term_factor(), term_doc_factor() and field_factor() correspond to 
methods that folks can easily override.
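A minimal Java sketch of the generalized formula, assuming each factor becomes an overridable instance method. The method names follow the formula in camelCase; the class `PluggableScorer`, the helper `TermMatch`, and the default bodies (a log-based term factor, a square-root term-frequency factor and an inverse-square-root length factor) are assumptions for illustration, not Lucene's code.

```java
import java.util.List;

/** Hypothetical statistics for one query term matched in one field of a document. */
final class TermMatch {
    final int df;           // number of documents containing the term
    final int tf;           // occurrences of the term in this field
    final int fieldLength;  // number of terms in this field
    final float queryBoost; // boost given to the term in the query
    final float fieldBoost; // boost given to the field at indexing time

    TermMatch(int df, int tf, int fieldLength, float queryBoost, float fieldBoost) {
        this.df = df;
        this.tf = tf;
        this.fieldLength = fieldLength;
        this.queryBoost = queryBoost;
        this.fieldBoost = fieldBoost;
    }
}

/** Sketch of a scorer whose factors can be overridden one at a time. */
class PluggableScorer {
    private final int numDocs; // total documents in the collection

    PluggableScorer(int numDocs) {
        this.numDocs = numDocs;
    }

    /** term_factor(df): rarer terms count more (assumed idf-style default). */
    protected float termFactor(int df) {
        return (float) (Math.log((double) numDocs / (df + 1)) + 1.0);
    }

    /** term_doc_factor(tf): more occurrences count more, with diminishing returns. */
    protected float termDocFactor(int tf) {
        return (float) Math.sqrt(tf);
    }

    /** field_factor(length): matches in shorter fields count more. */
    protected float fieldFactor(int length) {
        return (float) (1.0 / Math.sqrt(length));
    }

    /** sum_t( term_factor * term_doc_factor * field_factor * query_boost * field_boost ) */
    public float score(List<TermMatch> matches) {
        float sum = 0.0f;
        for (TermMatch m : matches) {
            sum += termFactor(m.df) * termDocFactor(m.tf) * fieldFactor(m.fieldLength)
                   * m.queryBoost * m.fieldBoost;
        }
        return sum;
    }
}

/** Example customization: ignore field length, keep every other factor. */
class NoLengthNormScorer extends PluggableScorer {
    NoLengthNormScorer(int numDocs) {
        super(numDocs);
    }

    @Override
    protected float fieldFactor(int length) {
        return 1.0f;
    }
}
```

Overriding only `fieldFactor`, as the hypothetical `NoLengthNormScorer` does, changes one factor while the rest of the sum stays as it was.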

Currently all of the scoring functions are static methods in a single 
class, Similarity.java, so one can already modify scoring by 
redefining this class, but doing so is not well documented and only for 
the brave.
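This sketch shows why, with static methods, redefining the class is the only route. Java static methods are bound at compile time and cannot be overridden; a subclass that declares a static method with the same signature merely hides it, and code that calls the base class's method keeps using the original. `StaticScoring`, `FlatScoring` and `HidingDemo` are hypothetical stand-ins, not Lucene's Similarity.

```java
/** Stand-in for a class whose scoring functions are all static. */
class StaticScoring {
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** "Library" code: this call is bound to StaticScoring.lengthNorm at compile time. */
    static float scoreField(int numTerms) {
        return lengthNorm(numTerms);
    }
}

/** A subclass can only hide, not override, the static method. */
class FlatScoring extends StaticScoring {
    static float lengthNorm(int numTerms) {
        return 1.0f; // intended replacement: ignore field length
    }
}

class HidingDemo {
    public static void main(String[] args) {
        System.out.println(StaticScoring.scoreField(4)); // prints 0.5
        // scoreField is inherited and still calls StaticScoring.lengthNorm,
        // so the subclass's "replacement" has no effect here.
        System.out.println(FlatScoring.scoreField(4));   // prints 0.5
        System.out.println(FlatScoring.lengthNorm(4));   // prints 1.0, only when called directly
    }
}
```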

Doug


Re: document & field boosting

Posted by Clemens Marschner <cm...@lanlab.de>.
Hi,

Doug, do you think the ranking function as stated in the FAQ
(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31)
is still correct after the recent changes?


Clemens


----- Original Message -----
From: "Doug Cutting" <cu...@lucene.com>
To: <ti...@ecliptictech.com>; <so...@business.com>;
<sc...@evendi.de>; <fo...@welho.com>; <me...@yahoo.com>
Cc: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Monday, July 29, 2002 9:31 PM
Subject: document & field boosting


> FYI, I just added document and field boosting to Lucene.  It should be
> in tonight's nightly build.
>
> This lets one, e.g., implement Google-like ranking, where a factor in a
> document's score is determined independently from the text of the
> document.
>
> Longer term, I'd still like to open up document scoring, so that a user
> can alter any part of the formula without altering Lucene's core code.
>
> Enjoy!
>
> Doug


