Posted to java-user@lucene.apache.org by Adam Ratcliffe <ad...@prema.co.nz> on 2002/04/14 01:39:07 UTC

XMLDirectory for index storage as XML fails during segment merge

I'm using lucene-1.2-rc4 to index XML elements stored in a native XML 
database. An Index class is used to generate an index for a set of nodes that 
is then persisted to the DOM tree.

Following the pattern of the RAMDirectory class I've created an XMLDirectory 
that has operations for writing index files as elements and others for 
reading index content given an element representing an index file.

This class works fine on test datasets but fails after approx. 3000 documents 
have been indexed.

An EOF error is thrown by the refill() operation of the InputStream while 
the segments are being merged. I've attached the relevant stack trace below.

[Index.java:78] java.io.IOException: read past EOF
java.io.IOException: read past EOF
	at org.apache.lucene.store.InputStream.refill(Unknown Source)
	at org.apache.lucene.store.InputStream.readByte(Unknown Source)
	at org.apache.lucene.store.InputStream.readChars(Unknown Source)
	at org.apache.lucene.store.InputStream.readString(Unknown Source)
	at org.apache.lucene.index.FieldsReader.doc(Unknown Source)
	at org.apache.lucene.index.SegmentReader.document(Unknown Source)
	at org.apache.lucene.index.SegmentMerger.mergeFields(Unknown Source)
	at org.apache.lucene.index.SegmentMerger.merge(Unknown Source)
	at org.apache.lucene.index.IndexWriter.mergeSegments(Unknown Source)
	at org.apache.lucene.index.IndexWriter.maybeMergeSegments(Unknown Source)
	at org.apache.lucene.index.IndexWriter.addDocument(Unknown Source)
	at com.parochus.search.Index.create(Index.java:72)
	at com.parochus.search.TstIndex.main(TstIndex.java:43)

I've reviewed the Lucene and JGuru FAQs and conclude that Lucene should be 
comfortable indexing millions of documents within a single index, so this 
error doesn't appear to be caused by reaching any upper limit on what Lucene 
can handle.

I'm wondering if this indicates that the length count for the segment does 
not match the number of bytes actually written for it.

Anybody got any ideas for tests that I could run that would determine if this 
is the cause of the problem?
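
One check along these lines, as a rough sketch only: push one byte of every 
possible value through the same bytes-to-text-and-back conversion the 
directory uses, then compare the length and content read back with what was 
written. The Codec interface and both implementations below are hypothetical 
stand-ins (the real XMLDirectory encoding may differ), and the code uses the 
modern Java standard library rather than anything from Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class LengthCheck {

    /** Hypothetical stand-in for how index bytes become element text and back. */
    interface Codec {
        String encode(byte[] raw);
        byte[] decode(String stored);
    }

    /** Base64 maps every byte value to plain ASCII text, so nothing is lost. */
    static final Codec BASE64 = new Codec() {
        public String encode(byte[] raw) { return Base64.getEncoder().encodeToString(raw); }
        public byte[] decode(String stored) { return Base64.getDecoder().decode(stored); }
    };

    /**
     * Treats raw bytes as UTF-8 text. Byte sequences that are not valid UTF-8
     * are replaced with U+FFFD when the String is built, so they cannot be
     * recovered afterwards.
     */
    static final Codec NAIVE_TEXT = new Codec() {
        public String encode(byte[] raw) { return new String(raw, StandardCharsets.UTF_8); }
        public byte[] decode(String stored) { return stored.getBytes(StandardCharsets.UTF_8); }
    };

    /** One byte of every possible value, 0x00 through 0xFF. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    /** Number of bytes that come back after an encode/decode round trip. */
    static int storedLength(Codec codec, byte[] raw) {
        return codec.decode(codec.encode(raw)).length;
    }

    /** True only if the round trip returns exactly the bytes that went in. */
    static boolean roundTrips(Codec codec, byte[] raw) {
        return Arrays.equals(raw, codec.decode(codec.encode(raw)));
    }

    public static void main(String[] args) {
        byte[] raw = allByteValues();
        for (Codec codec : new Codec[] { BASE64, NAIVE_TEXT }) {
            System.out.println("wrote " + raw.length + " bytes, read back "
                    + storedLength(codec, raw) + ", identical: " + roundTrips(codec, raw));
        }
    }
}
```

If a conversion like this changes the byte count, then a length recorded at 
write time would no longer agree with what a later read sees. The same 
comparison could then be run against XMLDirectory itself for each index file, 
checking the bytes written against the bytes read back and against the length 
the directory reports.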

Regards,
Adam Ratcliffe
adam@prema.co.nz


