Posted to java-user@lucene.apache.org by Serkan Oktar <so...@sirius-group.com> on 2004/08/24 12:09:26 UTC
term frequency data of terms of all documents
I want to build a list of terms of all documents and their frequency data.
It seems the information I need is in the "tis" and "tii" files. However, I haven't found a way to read them so far.
How can I get the term frequency data?
Thanks,
Serkan
Re: term frequency data of terms of all documents
Posted by Bernhard Messer <Be...@intrafind.de>.
Serkan,
It's easier to use the IndexReader class to get the information you need.
If you just need the document frequency of each term (the number of documents that contain it), you could use the following sample.
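// Needs java.io.IOException and, from org.apache.lucene.index,
// IndexReader, Term and TermEnum.
IndexReader ir = null;
try {
    // Check and open the same path.
    if (!IndexReader.indexExists("/tmp/index"))
        return;
    ir = IndexReader.open("/tmp/index");
    // Walk every term in the index and print its document frequency.
    TermEnum termEnum = ir.terms();
    while (termEnum.next()) {
        Term t = termEnum.term();
        System.out.println(t.text() + " --> " + ir.docFreq(t));
    }
}
catch (IOException e) {
    System.out.println(e.toString());
}
finally {
    // Always release the reader, even if enumeration failed.
    if (ir != null) {
        try {
            ir.close();
        } catch (IOException e) {
            System.err.println("IOException, opened IndexReader can't be closed: "
                    + e.toString());
        }
    }
}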
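Keep in mind that a document frequency counts documents, not occurrences: a term that appears three times in one document still has a document frequency of 1. To make the difference concrete, here is a minimal plain-Java sketch on a tiny in-memory corpus. It is not Lucene code; the class TermStats, its methods and the whitespace tokenizer are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Lucene.
public class TermStats {

    // Lowercase and split on whitespace; a crude stand-in for a real analyzer.
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Document frequency: in how many documents does each term occur?
    static Map<String, Integer> docFreq(List<String> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (String doc : docs) {
            // A set ensures each term is counted at most once per document.
            Set<String> seen = new HashSet<>(Arrays.asList(tokenize(doc)));
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Total term frequency: how often does each term occur over all documents?
    static Map<String, Integer> totalTermFreq(List<String> docs) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String doc : docs) {
            for (String term : tokenize(doc)) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("lucene index lucene", "index reader");
        Map<String, Integer> df = docFreq(docs);
        Map<String, Integer> tf = totalTermFreq(docs);
        for (String term : df.keySet()) {
            System.out.println(term + " --> docFreq=" + df.get(term)
                    + ", totalFreq=" + tf.get(term));
        }
    }
}
```

If you need the second kind of number from a real index, the per-document counts have to be read and summed for each term; the sketch only shows which quantity is which.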
Hope this helps,
Bernhard
Serkan Oktar wrote:
>I want to build a list of terms of all documents and their frequency data.
>It seems the information I need is in "tis" and "tii" files. However I haven't found a way to handle them till now.
>
>How can I get the term frequency data?
>
>Thanks ,
>Serkan
>
>