Posted to user@nutch.apache.org by Filip Radlinski <fr...@gmail.com> on 2005/09/13 00:29:41 UTC

Crash when calling new NutchSearcher using nutch-0.7

Hi,

I'm trying to talk to Nutch from Java. I first run a Nutch crawl:

./nutch-0.7/bin/nutch crawl root_urls.dat -dir myIndex -depth 8 >& crawl.log

This creates an index in ~/myIndex with subdirectories ~/myIndex/db, 
~/myIndex/index and ~/myIndex/segments.

Then I want to create a NutchSearcher, calling

Searcher s = new NutchSearcher("~/myIndex/index");
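
Note that the ~ above is shorthand: Java does not expand ~ in path strings (the shell does that), so the path has to be resolved before it is passed in. A minimal sketch of that step, where PathUtil and expandHome are hypothetical names and not part of Nutch:

```java
import java.io.File;

public class PathUtil {
    // Hypothetical helper (not part of Nutch): replaces a leading "~" with
    // the user's home directory, because the JVM does not do this itself.
    public static String expandHome(String path) {
        String home = System.getProperty("user.home");
        if (path.equals("~")) {
            return home;
        }
        if (path.startsWith("~/")) {
            return new File(home, path.substring(2)).getPath();
        }
        return path; // any other path is returned unchanged
    }

    public static void main(String[] args) {
        // Prints the absolute form of the index path used in this post.
        System.out.println(expandHome("~/myIndex/index"));
    }
}
```

The expanded string is what would then be handed to the searcher constructor.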

This crashes with the following stack trace:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.get(ArrayList.java:358)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151)
at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:86)
at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:45)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:112)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:109)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
at net.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:56)
at NutchDemo.main(NutchDemo.java:348)

The out-of-bounds exception comes from readTerm() (line 141) in
SegmentTermEnum.java in Lucene (version 1.4.3). After initially opening the
index and reading the correct version number (-2) and a realistic index size
(3549), readTerm on its first call reads a start of 0 and a length of 0.
Then index.readChars(buffer, start, length) reads nothing (since length is
0), and the input.readVInt() call on the next line returns -1. This is
passed to fieldInfos.fieldName, which crashes.
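
To illustrate how a readVInt() call can come back with -1 at all, here is a minimal sketch of VInt decoding as described in the Lucene file-format documentation (seven data bits per byte, low-order group first, high bit set on every byte but the last). The class name VIntDemo, the method decodeVInt and the sample bytes are assumptions made for illustration; they are not Lucene's code and were not read from the actual _0.tii file.

```java
public class VIntDemo {
    // Decodes a single variable-length int from the start of the array.
    // Each byte carries seven data bits; a set high bit means more bytes follow.
    public static int decodeVInt(byte[] data) {
        int value = 0;
        int shift = 0;
        for (int i = 0; i < data.length; i++) {
            value |= (data[i] & 0x7F) << shift;
            if ((data[i] & 0x80) == 0) {
                return value; // last byte of this VInt
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated VInt");
    }

    public static void main(String[] args) {
        // Five bytes whose 32 data bits are all ones decode to -1 as a Java int.
        byte[] allOnes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x0F};
        System.out.println(decodeVInt(allOnes));
    }
}
```

So a -1 is a value the encoding can represent, which suggests the field number may have been stored that way rather than produced by a short read.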

The index files should be correct (since they are generated by Nutch), and I
have no idea what else I can try to fix the problem. Any suggestions would be
greatly appreciated. The "_0.tis" file is being read correctly; the crash
occurs when reading "_0.tii".

Am I calling NutchSearcher() correctly? I also tried
new NutchSearcher("~/myIndex"), but then I get a FileNotFoundException:

Exception in thread "main" java.io.FileNotFoundException: /usr/u/filip/myIndex/segments (Is a directory)

Any other suggestions?

Thanks in advance,
Filip