Posted to user@nutch.apache.org by Filip Radlinski <fr...@gmail.com> on 2005/09/13 00:29:41 UTC
Crash when calling new NutchSearcher using nutch-0.7
Hi,
I'm trying to talk to Nutch from Java. I first run a Nutch crawl:
./nutch-0.7/bin/nutch crawl root_urls.dat -dir myIndex -depth 8 >& crawl.log
This creates an index in ~/myIndex with subdirectories ~/myIndex/db,
~/myIndex/index and ~/myIndex/segments.
Then I want to create a NutchSearcher, calling
Searcher s = new NutchSearcher("~/myIndex/index");
This crashes with the following stack trace:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.get(ArrayList.java:358)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151)
at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:86)
at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:45)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:112)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:109)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
at net.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:56)
at NutchDemo.main(NutchDemo.java:348)
The out-of-bounds exception results from a problem in Lucene (version
1.4.3), in readTerm() in SegmentTermEnum.java (line 141). After opening the
index and reading the correct version number (-2) and a realistic index size
(3549), readTerm reads a start of 0 and a length of 0 on its first call.
Then index.readChars(buffer, start, length) reads nothing (since length is
0), and the input.readVInt() call on the next line returns -1. This value is
passed to fieldInfos.fieldName, which crashes.
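The failure mode described above can be reproduced in isolation. The following is only a minimal sketch, not Lucene's code: FieldLookupSketch, FIELD_NAMES, decodeVInt and describeLookup are hypothetical names, the field list is made up, and the byte sequence is simply one input that decodes to -1 under the 7-bits-per-byte variable-length integer layout (low-order bits first, high bit as continuation flag). It does not claim to show what "_0.tii" actually contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldLookupSketch {
    // Hypothetical stand-in for a field table: names indexed by field number.
    static final List<String> FIELD_NAMES =
            new ArrayList<>(Arrays.asList("url", "content"));

    // Decodes one variable-length int: 7 payload bits per byte, low-order
    // group first, high bit set on every byte except the last.
    static int decodeVInt(byte[] bytes) {
        int pos = 0;
        byte b = bytes[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Unchecked lookup, analogous to passing the decoded number straight on.
    static String fieldName(int fieldNumber) {
        return FIELD_NAMES.get(fieldNumber);
    }

    // Same lookup, but reports the failure instead of propagating it.
    static String describeLookup(int fieldNumber) {
        try {
            return fieldName(fieldNumber);
        } catch (IndexOutOfBoundsException e) {
            return "lookup failed for field number " + fieldNumber;
        }
    }

    public static void main(String[] args) {
        // Five bytes whose payload bits are all ones decode to -1.
        byte[] encoded = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x0F};
        int fieldNumber = decodeVInt(encoded);
        System.out.println("decoded field number: " + fieldNumber);
        System.out.println(describeLookup(fieldNumber));
    }
}
```

Any negative field number reaching an unchecked list lookup fails this way, so the crash by itself does not say whether the file is corrupt or whether the reader is misinterpreting a well-formed file.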
The index files look correct (since they are generated by Nutch), and I have
no idea what else I can try to fix the problem. Any suggestions would be
greatly appreciated. The "_0.tis" file is being read correctly; the crash
occurs while reading "_0.tii".
Am I calling NutchSearcher() correctly? I also tried "new
NutchSearcher("~/myIndex")" but then I get a "file not found" exception:
Exception in thread "main" java.io.FileNotFoundException:
/usr/u/filip/myIndex/segments (Is a directory)
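A note on the paths: the calls above are written with a leading "~", which a shell expands but java.io.File does not. The exception message shows an absolute path, so the real program presumably already passes one. If it does not, the home directory can be resolved explicitly. This is a minimal sketch; IndexPathSketch and expandHome are hypothetical names, not part of Nutch or Lucene.

```java
import java.io.File;

public class IndexPathSketch {
    // Replaces a leading "~" with the user.home system property;
    // any other path is returned unchanged.
    static File expandHome(String path) {
        String home = System.getProperty("user.home");
        if (path.equals("~")) {
            return new File(home);
        }
        if (path.startsWith("~/")) {
            return new File(home, path.substring(2));
        }
        return new File(path);
    }

    public static void main(String[] args) {
        File indexDir = expandHome("~/myIndex/index");
        // Checking the directory before opening it separates a wrong path
        // from a problem inside the index files themselves.
        System.out.println(indexDir.getAbsolutePath()
                + " isDirectory=" + indexDir.isDirectory());
    }
}
```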
Any other suggestions?
Thanks in advance,
Filip