Posted to java-user@lucene.apache.org by Rajiv Roopan <ra...@gmail.com> on 2006/07/07 04:46:52 UTC

Lucene search formula

Hello,
   I was recently looking through the Lucene in Action book and came across the
scoring formula. I was wondering if the formula has changed since the book
was written?

  Also, could someone briefly explain what the IDF(t) term in
the formula means? The book says that it's the inverse document
frequency of the term but doesn't explain beyond that.

thanks,
rajiv

Re: Lucene search formula

Posted by "Aleksander M. Stensby" <al...@integrasco.no>.
I have written a paper about Topic Detection and Tracking, where I also
explain the TF-IDF scheme. If you like, I can send you the paper.

Aleksander


-- 
Aleksander M. Stensby
Software Developer
Integrasco A/S
aleksander.stensby@integrasco.no
Tlf.: +47 41 22 82 72

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene search formula

Posted by Chris Hostetter <ho...@fucit.org>.
:    I was recently looking through the Lucene in Action book and came across the
: scoring formula. I was wondering if the formula has changed since the book
: was written?

No, but the book has some mistakes, and the scoring formula is one of
them:
http://lucenebook.com/blog/errata/
http://lucenebook.com/blog/errata/2005/01/24/scoring_formula_omission.html

:   Also, could someone briefly explain what the IDF(t) term in
: the formula means? The book says that it's the inverse document
: frequency of the term but doesn't explain beyond that.

1) Google is your friend.
2) It's pretty much exactly what it sounds like: the inverse of the
document frequency for that term. The more documents the term appears
in, the lower the value.


-Hoss



RE: Lucene search formula

Posted by zheng <zh...@dcs.bbk.ac.uk>.
Hi,

Can somebody explain lengthNorm, queryNorm and coord in Lucene?
Is lengthNorm (term freq)/(total number of terms), or (term freq)/(max term
freq), or something else? Is queryNorm (squared term
weight)/(sumOfSquaredWeights)? Why do we still need queryNorm when it does not
affect the ranking for a given query? How is the coord value calculated?
Thanks.

ZZ
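
For reference, the three factors asked about above are small enough to sketch in plain Java. This is only a sketch, assuming the DefaultSimilarity behaviour of Lucene releases from around this time: lengthNorm = 1/sqrt(numTerms in the field), queryNorm = 1/sqrt(sumOfSquaredWeights), coord = overlap/maxOverlap. The class name SimilaritySketch is hypothetical, the field-name argument of the real lengthNorm is left out, and nothing here calls Lucene itself, so check the Similarity javadoc for the version you run.

```java
// Plain-Java sketch of three scoring factors; assumes DefaultSimilarity-style
// definitions and does not depend on Lucene. SimilaritySketch is a made-up name.
class SimilaritySketch {

    // lengthNorm: shorter fields score higher. numTerms is the total number
    // of terms in the field (not the term frequency).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // queryNorm: one constant per query, computed from the sum of the
    // squared term weights. It scales every hit by the same amount.
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // coord: fraction of the query terms that the document matched.
    // overlap = query terms found in the document, maxOverlap = query terms in total.
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println("lengthNorm(4 terms)   = " + lengthNorm(4));
        System.out.println("lengthNorm(100 terms) = " + lengthNorm(100));
        System.out.println("queryNorm(4.0)        = " + queryNorm(4.0));
        System.out.println("coord(2 of 3)         = " + coord(2, 3));
    }
}
```

Under these assumptions queryNorm multiplies every document's score for a query by the same constant, so it leaves the ranking unchanged; as far as I know its purpose is only to make scores from different queries more comparable.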



Re: Lucene search formula

Posted by Otis Gospodnetic <ot...@yahoo.com>.
The formula hasn't changed, but the first printing of the book had a portion of it missing. Check the javadoc for (Default?)Similarity for the real and current formula.

Here is a simple IDF example, or at least how I "visualize" IDF.
You have an index with a bunch of documents and terms in it.  A term T can appear in some number of documents in this index, say N.  You can think of the IDF of the term T as "1/N" (not really 1/N, but log(numDocs/(docFreq+1)) + 1, where docFreq is that N and numDocs is the number of documents in the index).  The more frequent the term in the index, the smaller its weight (the less important it is during scoring).

Otis
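
To make the formula above concrete, here is a minimal sketch in plain Java. It assumes that "log" is the natural logarithm, that docFreq counts the documents containing the term, and that numDocs is the number of documents in the index. The class name IdfSketch and the index size of 1000 are made up for illustration, and the code does not call Lucene.

```java
// Plain-Java sketch of idf = log(numDocs / (docFreq + 1)) + 1.
// IdfSketch is a hypothetical name; this does not depend on Lucene.
class IdfSketch {

    // docFreq: number of documents that contain the term.
    // numDocs: total number of documents in the index.
    // Math.log is the natural logarithm (an assumption about "log" above).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size
        int[] docFreqs = {1, 10, 100, 999};
        // The printed idf shrinks as the term appears in more documents.
        for (int docFreq : docFreqs) {
            System.out.printf("docFreq=%d idf=%.3f%n", docFreq, idf(docFreq, numDocs));
        }
    }
}
```

A term found in nearly every document ends up with an idf close to 1, while a rare term gets a noticeably larger value, which matches the "more frequent means smaller weight" description.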
