Posted to java-user@lucene.apache.org by Chris Sibert <ch...@attbi.com> on 2002/09/02 09:11:08 UTC

Scoring

I am dissatisfied with the document scores that I'm getting. If a document is short and has one occurrence of the search term, it is ranked higher than a longer document with two occurrences of the term. This makes little sense to me, and I'd like the longer document with more occurrences to be ranked higher. I figured I would have to override the scoring method, but I can't find where Lucene actually does the scoring. This is not an uncommon problem for me: I find the API confusing to peruse because it lacks comprehensive Javadoc documentation. (Something that even Sun doesn't spend much time on.) I try to read the code, but the variable names are terse and comments are scarce, which makes it fairly unfathomable.

This is the code that I'm using. Am I doing the right thing in using the Query object, or should I be using a different one, such as TermQuery? Does TermQuery score differently, so that I might be happier with its behavior? If not, where might I find the method that actually computes the Document's score, so that I may modify it?


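    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;

    // Searches the "DocumentText" field of the index at string_indexPath.
    // I/O and query-syntax errors are passed on to the caller.
    Hits find ( String string_searchString, String string_indexPath )
        throws IOException, ParseException
    {
        Searcher indexSearcher = new IndexSearcher ( string_indexPath ) ;
        Analyzer analyzer      = new SimpleAnalyzer () ;

        // The static QueryParser.parse builds the Query, so no QueryParser instance is needed.
        Query query = QueryParser.parse ( string_searchString, "DocumentText", analyzer ) ;

        // Hits come back ordered by score, best match first.
        return indexSearcher.search ( query ) ;
    }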

Re: Scoring

Posted by Alex Murzaku <mu...@yahoo.com>.

Lucene uses a variation of TF-IDF for the similarity score, where the
document length is one of the factors. In the future you will be able
to modify scoring to your needs: 
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html
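
To see why the short document can win, here is a minimal sketch of
length-normalised term scoring. It is not Lucene's code: the class and
method names are made up for illustration, and it assumes the commonly
described factors tf = sqrt(freq) and a length norm of 1/sqrt(numTerms).
The exact factors in your Lucene version may differ.

```java
// Simplified sketch of length-normalised term scoring; not Lucene's actual code.
public class LengthNormSketch {

    // Assumed term-frequency factor: grows with the square root of the count.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed length normalisation: shrinks with the square root of the field length.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // idf is identical for both documents when they match the same term,
    // so it cannot change their relative order and is left out here.
    static double score(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        double shortDoc = score(1, 10);   // one occurrence in a 10-term document
        double longDoc  = score(2, 100);  // two occurrences in a 100-term document
        System.out.println("short: " + shortDoc + ", long: " + longDoc);
        System.out.println("short document ranks first: " + (shortDoc > longDoc));
    }
}
```

Under these assumptions, doubling the occurrences raises the score by a
factor of sqrt(2), while a tenfold increase in length divides it by
sqrt(10), so the longer document ends up lower.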

There might be more in the FAQ or in the mail archive. This is
something that has been discussed quite often. For example, you could
learn about the scoring mechanism in:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31

=====
__________________________________
alex@lissus.com -- http://www.lissus.com

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Scoring

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Is this the class you are looking for?

/** Internal class used for scoring.
 * <p>Public only so that the indexing code can compute and store the
 * normalization byte for each document. */
public final class Similarity {



But you are right, the comments are pretty scarce, and the Javadocs
could be improved.  If you have the time and the will, please contribute.
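
To make the idea of changing the scoring concrete, here is a rough
sketch of what a replaceable scoring hook could look like. Everything
in it is hypothetical: the Scoring interface and both implementations
are invented for illustration and are not part of Lucene, and the
formulas are assumptions rather than a copy of Similarity.

```java
// Hypothetical sketch: none of these types exist in Lucene.
public class PluggableScoringSketch {

    // Hypothetical hook exposing the two factors discussed in this thread.
    interface Scoring {
        double tf(int freq);
        double lengthNorm(int numTerms);
    }

    // Assumed default-style behaviour: longer documents are penalised.
    static final Scoring LENGTH_NORMALISED = new Scoring() {
        public double tf(int freq) { return Math.sqrt(freq); }
        public double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }
    };

    // Variant that ignores document length, so the occurrence count decides the order.
    static final Scoring LENGTH_BLIND = new Scoring() {
        public double tf(int freq) { return Math.sqrt(freq); }
        public double lengthNorm(int numTerms) { return 1.0; }
    };

    // Combines the two factors for a single query term.
    static double score(Scoring scoring, int freq, int numTerms) {
        return scoring.tf(freq) * scoring.lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // One occurrence in 10 terms versus two occurrences in 100 terms.
        boolean shortWinsByDefault =
            score(LENGTH_NORMALISED, 1, 10) > score(LENGTH_NORMALISED, 2, 100);
        boolean longWinsWhenBlind =
            score(LENGTH_BLIND, 2, 100) > score(LENGTH_BLIND, 1, 10);
        System.out.println("short wins with length norm: " + shortWinsByDefault);
        System.out.println("long wins without length norm: " + longWinsWhenBlind);
    }
}
```

With the length factor fixed at 1.0, the document with two occurrences
ranks above the one with a single occurrence, which is the ordering
Chris asked for. The trade-off is that very long documents then tend to
dominate the results.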

Otis
