Posted to java-user@lucene.apache.org by Sethu_424 <se...@gmail.com> on 2010/05/26 16:15:31 UTC

Re: Getting DF & IDF

Hi,
I am not sure whether you are still looking for the answer to your question.
If so, please read on...

You can get the DF and IDF of each query term as follows:

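import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

// Open the index read-only (second argument = true)
IndexReader reader = IndexReader.open(FSDirectory.open(new File(indexDir)), true);

// Wrap it in a FilterIndexReader. This is optional: the reader returned by
// open() is concrete, so docFreq() and numDocs() can be called on it directly.
FilterIndexReader filterIndexReader = new FilterIndexReader(reader);
try {
    // Number of (non-deleted) documents in the index
    int numDocs = filterIndexReader.numDocs();

    // Iterate over each of the query words
    for (String queryWord : queryWords) {
        // Lowercase to match terms indexed by a lowercasing analyzer
        Term term = new Term(searchField, queryWord.toLowerCase());

        // DF: number of documents that contain the term
        int docFreq = 0;
        try {
            docFreq = filterIndexReader.docFreq(term);
        } catch (IOException e) {
            logger.log(Level.SEVERE, null, e);
        }

        // IDF: log10(N / DF); left at 0 when the term does not occur
        double idf = 0.0;
        if (docFreq > 0) {
            idf = Math.log10((double) numDocs / docFreq);
        }

        System.out.println(queryWord + "\tDF - " + docFreq + "\tIDF - " + idf);
    }
} finally {
    // Closing the wrapper also closes the underlying reader
    filterIndexReader.close();
}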
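To see what the two numbers mean without an index at hand, here is a toy, stdlib-only sketch. It assumes documents are plain strings held in memory and tokenized by lowercasing and splitting on whitespace (a hypothetical stand-in for an analyzer). docFreq counts the documents whose token set contains the term, and idf applies the same log10(N / DF) used above. The class and method names (DfIdfSketch, tokenize, docFreq, idf) are illustrative only, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DfIdfSketch {
    // Hypothetical analyzer stand-in: lowercase, split on whitespace, keep unique tokens.
    static Set<String> tokenize(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // Document frequency: number of documents whose token set contains the term.
    static int docFreq(List<Set<String>> docs, String term) {
        int df = 0;
        for (Set<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    // Classic IDF as in the post: log10(N / DF), or 0 when the term never occurs.
    static double idf(int numDocs, int docFreq) {
        return docFreq > 0 ? Math.log10((double) numDocs / docFreq) : 0.0;
    }

    public static void main(String[] args) {
        // A tiny in-memory "index" of four made-up documents.
        List<Set<String>> docs = new ArrayList<>();
        docs.add(tokenize("lucene is a search library"));
        docs.add(tokenize("solr is built on lucene"));
        docs.add(tokenize("java search engines"));
        docs.add(tokenize("ranking with tf idf"));

        int numDocs = docs.size();
        for (String queryWord : new String[] {"Lucene", "search", "ranking", "python"}) {
            String term = queryWord.toLowerCase();
            int df = docFreq(docs, term);
            System.out.println(queryWord + "\tDF - " + df + "\tIDF - " + idf(numDocs, df));
        }
    }
}
```

Rare terms ("ranking", in one of four documents) get a higher IDF than common ones ("lucene", in two of four), and a term that never occurs is reported with DF 0 and IDF 0 instead of dividing by zero.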

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Getting-DF-IDF-tp547386p844962.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Getting DF & IDF

Posted by "yura.minsk" <yu...@gmail.com>.
Sethu_424 wrote:
> int numDocs = filterIndexReader.numDocs();
> ...
> idf = Math.log10((double) numDocs / docFreq);

Wrong formula. numDocs should not be the count of documents in the index,
but the count of documents containing the search term.
We need something like IndexReader.docFreq(term);
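For comparison, a small sketch of two common IDF variants. Both take N as the total number of documents and DF as the per-term count that docFreq(term) returns. The first is the textbook log10(N / DF) from the quoted code; the second mirrors the formula of Lucene 3.x DefaultSimilarity.idf(docFreq, numDocs), 1 + ln(N / (DF + 1)), computed here in double rather than Lucene's float. The class name IdfVariants and the sample numbers are hypothetical.

```java
public class IdfVariants {
    // Textbook IDF: N = total documents in the index, DF = documents containing the term.
    // Returns 0 for a term that occurs nowhere, to avoid dividing by zero.
    static double classicIdf(int numDocs, int docFreq) {
        return docFreq > 0 ? Math.log10((double) numDocs / docFreq) : 0.0;
    }

    // Same shape as Lucene 3.x DefaultSimilarity: natural log, DF + 1 in the
    // denominator (so DF = 0 is safe), plus a constant offset of 1.
    static double luceneDefaultIdf(int numDocs, int docFreq) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        for (int df : new int[] {1, 10, 100, 1000}) {
            System.out.printf("df=%d\tclassic=%.4f\tlucene=%.4f%n",
                    df, classicIdf(numDocs, df), luceneDefaultIdf(numDocs, df));
        }
    }
}
```

In both variants N stays fixed for the whole index and only DF changes per term, so the IDF falls as a term appears in more documents.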

--
View this message in context: http://lucene.472066.n3.nabble.com/Getting-DF-IDF-tp547386p3984938.html