Posted to dev@nutch.apache.org by Siva Bandhamravuri <sb...@umich.edu> on 2005/04/07 07:42:37 UTC
getTermFreqVector
Hi all,
I am desperately trying to get TermFreqVectors using the IndexReader, but
it always returns null.
Here is the snippet of code I am using.
for (int i = 0; i < reader.numDocs(); ++i) {
    TermFreqVector dtfv = reader.getTermFreqVector(i, "url");
    if (dtfv != null) {
        out.write("\n Document " + i + " has some content in it");
        ++termDocCount;
    }
}
Is something wrong in the code, or is there something I missed when creating the
index? I created the index for an intranet using the following command.
bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
Please help
thanks
Siva
Re: getTermFreqVector
Posted by Doug Cutting <cu...@nutch.org>.
Siva Bandhamravuri wrote:
> Is something wrong in the code, or is there something I missed when creating the
> index? I created the index for an intranet using the following command.
>
> bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
Nutch does not create term vector indexes. This is a Lucene option that
Nutch does not currently specify. One could patch the index-basic
plugin (BasicIndexingFilter.java) to index vectors for various fields if
indicated in the config file. For example, one could have an
indexer.basic.vector.content property that, if true, would cause vectors
to be created for the content field.
Doug
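For illustration, the property suggested above might be declared in a Nutch configuration file (for example conf/nutch-site.xml) roughly as follows. This is only a sketch: indexer.basic.vector.content is the hypothetical name from the reply, not an existing Nutch setting, and it would have no effect unless BasicIndexingFilter.java were patched to read it and ask Lucene to store term vectors for the content field.

```xml
<!-- Hypothetical property: not defined by Nutch itself. It only takes
     effect if the index-basic plugin is patched to read it. -->
<property>
  <name>indexer.basic.vector.content</name>
  <value>true</value>
  <description>If true, a patched BasicIndexingFilter would store term
  vectors for the content field. (Assumed semantics.)</description>
</property>
```

After such a patch the index would need to be rebuilt, since term vectors are written at indexing time. Calls like reader.getTermFreqVector(i, "content") would then be expected to return non-null for documents that have that field.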