You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Matt Galloway <ma...@thebasement.com> on 2004/07/28 22:51:02 UTC
TermFreqVector Beginner Question
Howdy,
I am new to Lucene and thus far I am very impressed. Thanks to all who have
worked on this project!
I am working on a project where I want to do the following:
1.) Index a bunch of document.
2.) Pluck out one of the doucments by Lucene document number
3.) Get a term frequency for that document
After some digging and playing I came across this method...
IndexReader.getTermFreqVector(int docNumber, String field)
This is exactly what I want. So I ran the IndexFiles demo program with some
test documents and started poking at the index with an IndexReader. But when I
called
IndexReader.getTermFreqVector(someDocNumber,"contents")
I get NULL back. After a little more digging I find that for a TermVector to
exist the Field has to have the TermVector flag set. So I changes some lines
in the demo FileDocument.Document method to:
FileInputStream is = new FileInputStream(f);
Reader reader = new BufferedReader(new InputStreamReader(is));
doc.add(Field.Text("contents", reader.toString(),true));
with the "true" parameter causing the new Field to turn on the storeTermVector
flag, right? So then I reindex and get the same results - getTermFreqVector
returns NULL. So I inspect the field list of the Document from the index:
Document d = ir.document(td.doc());
System.out.println(" Path: "+d.get("path"));
for (Enumeration e = d.fields() ; e.hasMoreElements() ;)
{
System.out.println(((Field)e.nextElement()).toString());
}
and I discover that there is now NO "contents" Field. If I change the paramter
in Field.Text to false, I get a "contents" Field but no TermVector. To date I
haven't been able to figure out how to get a TermFreqVector at all.
What am I missing?
I have looked at the documents - all the tutorials I have found just cover the
basics.
I have read the news group postings related to "TermVectors" and
"TermFreqVectors" and everybody says stuff like "the new 1.4 Vector stuff is
great". So how do they know? Where can I learn about this? Are there any more
complete user tutorials/references that cover TermVector features?
Oh, I am using the 1.4 Lucene release in case it matters.
Thanks in advance,
Matt Galloway
Tulsa, Oklahoma
(BTW, I also tired Field.UnStored with the same results.)
-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/
----- End forwarded message -----
-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org