Posted to java-user@lucene.apache.org by Serkan Oktar <so...@sirius-group.com> on 2004/08/24 12:09:26 UTC

term frequency data of terms of all documents

I want to build a list of the terms in all documents, together with their frequency data.
It seems the information I need is in the "tis" and "tii" files. However, I haven't found a way to read them so far.

How can I get the term frequency data?

Thanks,
Serkan

Re: term frequency data of terms of all documents

Posted by Bernhard Messer <Be...@intrafind.de>.
Serkan,

it's easier to use the IndexReader class to get the information you need.
If you only need the document frequency of each term, you could use the following sample.

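import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

// The statements below belong inside a method.
IndexReader ir = null;
try {
    // Check the same path that is opened below.
    if (!IndexReader.indexExists("/tmp/index"))
        return;
    ir = IndexReader.open("/tmp/index");
    // Enumerate every term stored in the index.
    TermEnum termEnum = ir.terms();
    while (termEnum.next()) {
        Term t = termEnum.term();
        // docFreq(t) is the number of documents that contain the term.
        System.out.println(t.text() + " --> " + ir.docFreq(t));
    }
    termEnum.close();
}
catch (IOException e) {
    System.out.println(e.toString());
}
finally {
    if (ir != null) {
        try {
            ir.close();
        } catch (IOException e) {
            System.err.println("IOException, opened IndexReader can't be closed: " + e.toString());
        }
    }
}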
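
Note that docFreq(t) is the number of documents containing the term, not the number of times the term occurs. If occurrence counts are what you are after, per-document counts are exposed through ir.termDocs(t) and TermDocs.freq(). To make the difference concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class TermStats, its methods and the three sample documents are made up for illustration, and the whitespace tokenizer is only a stand-in for a real analyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: contrasts document frequency
// with total term frequency on a tiny in-memory corpus.
public class TermStats {

    // Lowercases and splits on whitespace; a stand-in for a real analyzer.
    public static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Number of documents that contain each term at least once.
    public static Map<String, Integer> docFreq(List<String> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (String doc : docs) {
            // A set ensures each term is counted once per document.
            Set<String> unique = new HashSet<>(tokenize(doc));
            for (String term : unique) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Total number of occurrences of each term across all documents.
    public static Map<String, Integer> totalTermFreq(List<String> docs) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String doc : docs) {
            for (String term : tokenize(doc)) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        // Made-up sample documents.
        List<String> docs = Arrays.asList(
            "lucene index lucene",
            "index reader",
            "lucene term");
        Map<String, Integer> df = docFreq(docs);
        Map<String, Integer> tf = totalTermFreq(docs);
        for (String term : df.keySet()) {
            System.out.println(term + " --> docFreq=" + df.get(term)
                + ", totalFreq=" + tf.get(term));
        }
    }
}
```

In this sample, "lucene" appears in two documents but three times in total, so its document frequency is 2 while its total term frequency is 3.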

hope this helps,
Bernhard
