Posted to java-user@lucene.apache.org by Klaus <kl...@vommond.de> on 2006/01/11 15:35:24 UTC

RF and IDF

Hi all,

do you know how the tf and idf values are computed by the default
similarity? I mean the exact mathematical equation.

Thx,

Klaus




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: RF and IDF

Posted by Yonik Seeley <ys...@gmail.com>.
Click on "Source Repository" off of the main Lucene page.

Here is a pointer to the search package containing TermQuery/Weight/Scorer
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/?sortby=file#dirlist

Look in TermQuery for TermWeight (it's an inner class).

-Yonik

On 1/11/06, Klaus <kl...@vommond.de> wrote:
> Thx, but where can I find these classes?
>
> >If you really want to understand how scoring works, I'd suggest also
> >looking at TermWeight/TermScorer.



Re: RF and IDF

Posted by Klaus <kl...@vommond.de>.
Thx, but where can I find these classes?

>If you really want to understand how scoring works, I'd suggest also
>looking at TermWeight/TermScorer.




Re: RF and IDF

Posted by Yonik Seeley <ys...@gmail.com>.
On 1/11/06, Klaus <kl...@vommond.de> wrote:
> Hi all,
>
> do you know how the tf and idf values are computed by the default
> similarity? I mean the exact mathematical equation.

Well, here is the default Similarity:

/** Expert: Default scoring implementation. */
public class DefaultSimilarity extends Similarity {
  /** Implemented as <code>1/sqrt(numTerms)</code>. */
  public float lengthNorm(String fieldName, int numTerms) {
    return (float)(1.0 / Math.sqrt(numTerms));
  }

  /** Implemented as <code>1/sqrt(sumOfSquaredWeights)</code>. */
  public float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
  }

  /** Implemented as <code>sqrt(freq)</code>. */
  public float tf(float freq) {
    return (float)Math.sqrt(freq);
  }

  /** Implemented as <code>1 / (distance + 1)</code>. */
  public float sloppyFreq(int distance) {
    return 1.0f / (distance + 1);
  }

  /** Implemented as <code>log(numDocs/(docFreq+1)) + 1</code>. */
  public float idf(int docFreq, int numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
  }

  /** Implemented as <code>overlap / maxOverlap</code>. */
  public float coord(int overlap, int maxOverlap) {
    return overlap / (float)maxOverlap;
  }
}
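
Written out as equations, the two methods asked about are tf(freq) = sqrt(freq) and idf(docFreq, numDocs) = ln(numDocs / (docFreq + 1)) + 1, where ln is the natural logarithm (Math.log). A minimal sketch that evaluates them on their own, without Lucene on the classpath; the class name TfIdfSketch and the sample numbers are made up for illustration:

```java
class TfIdfSketch {
    /** sqrt(freq), copied from the tf() method quoted above. */
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** ln(numDocs / (docFreq + 1)) + 1, copied from the idf() method quoted above. */
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a term that occurs 4 times in one document
        // and appears in 9 of the 10 documents in the index.
        float tfFactor = tf(4f);       // sqrt(4) = 2.0
        float idfFactor = idf(9, 10);  // ln(10 / 10) + 1 = 1.0
        System.out.println("tf = " + tfFactor + ", idf = " + idfFactor);
    }
}
```

These are only the two building blocks; how they are combined into a document score is what TermWeight/TermScorer do.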


If you really want to understand how scoring works, I'd suggest also
looking at TermWeight/TermScorer.

-Yonik



Re: Boolean Query

Posted by Chris Hostetter <ho...@fucit.org>.
: BooleanQuery query = new BooleanQuery();
: for(Term t: terms)
: {
: 	query = new TermQuery(t);
: 	query.add(t, false, false); // is this wrong?
: }
:
: If I construct the query as a string like "A a OR B b OR C" I get much more
: results. I assume that the Boolean query uses an AND operator. How can I
: change that?

The "false, false" arguments you pass when adding the subclauses should give
you the "OR" behavior, but more than likely the problem you are running into
has to do with the analyzer being used by your QueryParser when it parses your
string -- when you build the query up by hand, no analyzer is used, so if
the analyzer used at indexing time did any lowercasing or stemming you'll
miss a lot of matches.
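
To make the analyzer point concrete, here is a toy sketch that does not use Lucene at all: analyze(), index() and searchAny() are hypothetical stand-ins for an analyzer, an inverted index and an OR query. The index only contains lowercased terms, so raw hand-built terms miss while analyzed terms match.

```java
import java.util.*;

class AnalyzerMismatchSketch {
    /** Toy stand-in for an analyzer: split on whitespace and lowercase. */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Toy inverted index: analyzed term -> ids of the documents containing it. */
    public static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    /** OR semantics: a document matches if it contains any of the terms. */
    public static Set<Integer> searchAny(Map<String, Set<Integer>> postings, List<String> terms) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings =
            index(Arrays.asList("Apache Lucene", "lucene in action", "Search Engines"));
        // Terms built by hand, not analyzed: nothing in the index is capitalized.
        System.out.println(searchAny(postings, Arrays.asList("Lucene", "Search"))); // prints []
        // The same terms run through the index-time analysis.
        System.out.println(searchAny(postings, analyze("Lucene Search"))); // prints [0, 1, 2]
    }
}
```

With a real index the same idea applies: the terms you put into a hand-built TermQuery have to look exactly like the terms the indexing analyzer produced.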

A quick thing you should try is comparing the toString output from each of the
queries you are comparing (the one QueryParser built, and the one you
built by hand).  You should also look at this wiki entry, and pick up a
copy of Lucene in Action and read chapter 4.

: And I'm wondering what happens if I boost a TermQuery with a value smaller
: than one. I'm asking because I would like to boost each TermQuery with the
: tf*idf value of the term in the original document. From my point of view,
: this should lead to better precision, but at first glance the results
: are worse.

Before you try this, make sure you understand the existing score
calculation ... look at the explain info for each document against your
query and see what it's already doing.
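
On boosts smaller than one, a rough sketch under an assumption that is not confirmed in this thread: each term's query weight is idf * boost, and all weights are then multiplied by the queryNorm() from the DefaultSimilarity posted earlier. If that holds, only the relative sizes of the boosts matter, because scaling every boost by the same constant cancels out in the normalization. The class name BoostScalingSketch and the sample idf and boost values are hypothetical.

```java
class BoostScalingSketch {
    /** 1/sqrt(sumOfSquaredWeights), as in the queryNorm() quoted earlier in the thread. */
    public static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /**
     * Assumption for this sketch: raw weight of term i is idf[i] * boost[i],
     * and every raw weight is then multiplied by the same queryNorm.
     */
    public static float[] normalizedWeights(float[] idf, float[] boost) {
        float[] weights = new float[idf.length];
        float sumOfSquares = 0f;
        for (int i = 0; i < idf.length; i++) {
            weights[i] = idf[i] * boost[i];
            sumOfSquares += weights[i] * weights[i];
        }
        float norm = queryNorm(sumOfSquares);
        for (int i = 0; i < weights.length; i++) {
            weights[i] *= norm;
        }
        return weights;
    }

    public static void main(String[] args) {
        float[] idf = {2.0f, 1.0f};
        // Same relative boosts, once below one and once ten times larger.
        float[] small = normalizedWeights(idf, new float[] {0.8f, 0.4f});
        float[] large = normalizedWeights(idf, new float[] {8.0f, 4.0f});
        System.out.println(small[0] + " " + small[1]);
        System.out.println(large[0] + " " + large[1]);
    }
}
```

Both lines should print the same pair of weights up to float rounding. Whether this matches what your Lucene version actually does is exactly what the explain output will tell you.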


-Hoss




Boolean Query

Posted by Klaus <kl...@vommond.de>.
Hi,

I have another question... How do I construct a BooleanQuery where the
terms within the query are connected with OR?

I have a list of terms representing the highest-scoring terms in a document.
Here is my code:

BooleanQuery query = new BooleanQuery();
for(Term t: terms)
{
	query = new TermQuery(t);
	query.add(t, false, false); // is this wrong?
}

If I construct the query as a string like "A a OR B b OR C" I get much more
results. I assume that the Boolean query uses an AND operator. How can I
change that?

And I'm wondering what happens if I boost a TermQuery with a value smaller
than one. I'm asking because I would like to boost each TermQuery with the
tf*idf value of the term in the original document. From my point of view,
this should lead to better precision, but at first glance the results
are worse.

THX,

Klaus


