Posted to dev@nutch.apache.org by Siva Bandhamravuri <sb...@umich.edu> on 2005/04/07 07:42:37 UTC

getTermFreqVector


Hi all,
  I am desperately trying to get the TermFreqVectors using the IndexReader, but
it always returns null.
Here is the snippet of code I am using.

for(int i = 0; i < reader.numDocs(); ++i){
     TermFreqVector dtfv = reader.getTermFreqVector(i,"url");
     if(dtfv != null){
        out.write("\n Document " + i + " has some content in it");
        ++termDocCount;
     }
}

Is something wrong in the code, or is there something that I missed when creating
the index? I created the index for an intranet using the following command.

bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log

Please help
thanks
Siva

Re: getTermFreqVector

Posted by Doug Cutting <cu...@nutch.org>.

Siva Bandhamravuri wrote:
> Is something wrong in the code, or is there something that I missed when creating
> the index? I created the index for an intranet using the following command.
> 
> bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log

Nutch does not create term vector indexes.  This is a Lucene option that 
Nutch does not currently specify.  One could patch the index-basic 
plugin (BasicIndexingFilter.java) to index vectors for various fields if 
indicated in the config file.  For example, one could have an
indexer.basic.vector.content property that, if true, would cause vectors
to be created for the content field.

Doug
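
Editor's sketch of the config-driven switch suggested above. This is only a rough illustration under stated assumptions, not the actual BasicIndexingFilter patch: java.util.Properties stands in for the Nutch configuration, the TermVectorMode enum stands in for Lucene's per-field term vector setting, and the class and helper names (VectorConfigSketch, propertyFor, modeFor) are hypothetical. Only the indexer.basic.vector.content property name comes from the reply.

```java
import java.util.Properties;

public class VectorConfigSketch {

    // Hypothetical stand-in for Lucene's per-field term vector setting.
    enum TermVectorMode { NO, YES }

    // Hypothetical helper: builds the property name for a field, following
    // the pattern suggested in the reply (indexer.basic.vector.<field>).
    static String propertyFor(String field) {
        return "indexer.basic.vector." + field;
    }

    // Returns YES only when the property for this field is set to "true";
    // a missing property means no vectors, which matches the current default.
    static TermVectorMode modeFor(Properties conf, String field) {
        String value = conf.getProperty(propertyFor(field), "false");
        return Boolean.parseBoolean(value) ? TermVectorMode.YES : TermVectorMode.NO;
    }

    public static void main(String[] args) {
        // Assumed configuration: vectors enabled for "content" only.
        Properties conf = new Properties();
        conf.setProperty("indexer.basic.vector.content", "true");

        for (String field : new String[] {"content", "url"}) {
            System.out.println(field + " -> " + modeFor(conf, field));
        }
    }
}
```

With this assumed configuration, the content field would get vectors and the url field would not, so a call such as getTermFreqVector(i, "url") would still return null unless a matching property were also enabled for url. In a real patch, the result of modeFor would decide the term vector option passed when the indexing filter constructs each Lucene field, and the index would need to be rebuilt afterwards.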