Posted to java-user@lucene.apache.org by Matt Galloway <ma...@thebasement.com> on 2004/07/28 22:51:02 UTC

TermFreqVector Beginner Question

Howdy,

I am new to Lucene and thus far I am very impressed.  Thanks to all who have
worked on this project!

I am working on a project where I want to do the following:

1.) Index a bunch of documents.
2.) Pluck out one of the documents by Lucene document number
3.) Get a term frequency for that document
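
To make step 3 concrete, here is a minimal sketch of what a per-document term
frequency is: a map from each term to the number of times it occurs in that one
document. It uses only the JDK (modern Java syntax) and is not Lucene's
implementation. The class name TermFrequencySketch, the method termFrequencies
and the simple split-on-non-alphanumerics tokenizer are assumptions made for
illustration; a real Lucene Analyzer may tokenize differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: counts how often each term
// occurs in a single piece of text.
class TermFrequencySketch {

    // Lower-cases the text, splits on runs of characters that are neither
    // letters nor digits, and counts each resulting token.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by term
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("the cat sat on the mat. The end"));
    }
}
```

The terms and counts in such a map are the kind of information step 3 is after,
here computed from raw text rather than read back from an index.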

After some digging and playing I came across this method...

   IndexReader.getTermFreqVector(int docNumber, String field)

This is exactly what I want.  So I ran the IndexFiles demo program with some
test documents and started poking at the index with an IndexReader. But when I
called

   IndexReader.getTermFreqVector(someDocNumber,"contents")

I get NULL back.  After a little more digging I found that for a TermVector to
exist the Field has to have the TermVector flag set.  So I changed some lines
in the demo FileDocument.Document method to:

    FileInputStream is = new FileInputStream(f);
    Reader reader = new BufferedReader(new InputStreamReader(is));
    doc.add(Field.Text("contents", reader.toString(),true));

with the "true" parameter causing the new Field to turn on the storeTermVector
flag, right? So then I reindex and get the same results - getTermFreqVector
returns NULL.  So I inspect the field list of the Document from the index:

   Document d = ir.document(td.doc());
   System.out.println("  Path: "+d.get("path"));
   for (Enumeration e = d.fields() ; e.hasMoreElements() ;) 
   {
      System.out.println(((Field)e.nextElement()).toString());
   }

and I discover that there is now NO "contents" Field.  If I change the parameter
in Field.Text to false, I get a "contents" Field but no TermVector.  To date I
haven't been able to figure out how to get a TermFreqVector at all.

What am I missing?
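
One thing that may be worth checking in the snippet that builds the "contents"
field (a guess, not a confirmed diagnosis): reader.toString() does not read the
file. BufferedReader inherits toString() from Object, so the call returns the
class name plus a hash code rather than the text. The sketch below uses only
the JDK, with a StringReader standing in for the file; the class name
ReaderToStringSketch and the helper readAll are hypothetical. Whether this
accounts for the missing term vector is not established here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class: contrasts Reader.toString() with actually
// reading the characters from the Reader.
class ReaderToStringSketch {

    // Hypothetical helper: drains a Reader into a String.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Reader reader = new BufferedReader(new StringReader("hello term vectors"));
        // Object.toString(): class name and hash code, not the content.
        System.out.println(reader.toString());
        // Reading the stream yields the text itself.
        System.out.println(readAll(reader));
    }
}
```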

I have looked at the documentation - all the tutorials I have found just cover
the basics.

I have read the news group postings related to "TermVectors" and
"TermFreqVectors" and everybody says stuff like "the new 1.4 Vector stuff is
great".  So how do they know?  Where can I learn about this?  Are there any more
complete user tutorials/references that cover TermVector features?

Oh, I am using the 1.4 Lucene release in case it matters.

Thanks in advance,

Matt Galloway
Tulsa, Oklahoma


(BTW, I also tried Field.UnStored with the same results.)



-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org