You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Rehan Abdulaziz <ab...@gmail.com> on 2009/02/01 18:06:42 UTC
tfIdf weights
Hi,
Is it possible to retrieve the tfidf weights (or relative weight calculated
from any formula) instead of simple term frequencies through
getTermFreqVector()? I am more interested in knowing the relative weight of
each term in each field rather than just the frequency of terms.
Thank you very much
--
Rehan Abdul Aziz
Software Engineer
Scrybe Inc. (www.iscrybe.com)
Cell: +92-344-5514583
Re: tfIdf weights
Posted by dipesh <di...@gmail.com>.
hi,
i used lucene-2.4.0 to get tf-idf. i'm not sure if the newer versions have
direct methods to get tf-idfs as well.
this is lengthy but might help.
// Get Term Enum that contains all the terms in the index using
FilterIndexReader
TermEnum e = freader.terms();
// find total number of docs
int noOfDocs = freader.numDocs();
// Get TermDocs Enum
TermDocs td = freader.termDocs();
// seeking through all the terms
while(e.next()){
// get the term
Term t = e.term();
// only search the contents field
Term term = new Term("contents",t.text());
// Move to the <document, frequency> pairs for term from the TermDocs
Enum containing the term t
td.seek(term);
// loop through each document containing the term.
while(td.next()) {
double weight = td.freq()*Math.log(noOfDocs/e.docFreq());
// do something with the weight
//......
}
}
regards,
Dipesh
On Mon, Feb 2, 2009 at 2:06 AM, Rehan Abdulaziz <ab...@gmail.com>wrote:
> Hi,
> Is it possible to retrieve the tfidf weights (or relative weight calculated
> from any formula) instead of simple term frequencies through
> getTermFreqVector()? I am more interested in knowing the relative weight of
> each term in each field rather than just the frequency of terms.
>
> Thank you very much
>
> --
> Rehan Abdul Aziz
> Software Engineer
> Scrybe Inc. (www.iscrybe.com)
> Cell: +92-344-5514583
>
--
----------------------------------------
"Help Ever Hurt Never"- Baba