Posted to dev@lucene.apache.org by Stas Chetvertkov <sc...@oilspace.com> on 2002/11/06 10:22:56 UTC

Lucene index got corrupted

Hi All,

We are using Lucene to index realtime news, and everything was working
fine until now. I have found that one of our Lucene indexes is corrupted:
every attempt to search it, optimize it, or merge it into another index
results in a 'read past EOF' exception.

My investigation showed that one of the segments in the index seems invalid.
Its field index file, '_4lvd.fdx', is 24 bytes long, while all of the field
normalization factor files (_4lvd.f?) are 2 bytes long. Since the number of
documents is determined as length('_4lvd.fdx')/8, which gives 3 here, Lucene
tries to read a 3rd byte from the normalization factor files and fails.
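
The size check described above can be sketched as a small standalone
program. This is only a rough illustration, not part of Lucene: the class and
method names (SegmentSizeCheck, docCountFromFdx, mismatchedNormFiles) are
made up for this example, and it assumes, as described above, 8 bytes per
document in the .fdx file and one byte per document in each norm file. It
only compares file lengths and does not read any index data.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: compares file lengths only.
class SegmentSizeCheck {

    // Assumption from the message above: 8 bytes per document in the .fdx file.
    static long docCountFromFdx(long fdxLength) {
        return fdxLength / 8;
    }

    // Returns the names of norm files (<segment>.f<digit>...) whose length
    // differs from the document count implied by <segment>.fdx.
    // Assumption: each norm file holds one byte per document.
    static List<String> mismatchedNormFiles(Path indexDir, String segment)
            throws IOException {
        long docCount = docCountFromFdx(Files.size(indexDir.resolve(segment + ".fdx")));
        List<String> mismatched = new ArrayList<>();
        // The glob requires a digit right after ".f", so .fdx, .fdt, .fnm
        // and .frq files are not picked up.
        try (DirectoryStream<Path> normFiles =
                 Files.newDirectoryStream(indexDir, segment + ".f[0-9]*")) {
            for (Path normFile : normFiles) {
                if (Files.size(normFile) != docCount) {
                    mismatched.add(normFile.getFileName().toString());
                }
            }
        }
        return mismatched;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SegmentSizeCheck <indexDir> <segment>");
            return;
        }
        for (String name : mismatchedNormFiles(Paths.get(args[0]), args[1])) {
            System.out.println("length mismatch: " + name);
        }
    }
}
```

Run against the segment above (for example with the arguments
"/path/to/index _4lvd", where the path is a placeholder), this would list
every 2-byte norm file, since the 24-byte .fdx file implies 3 documents.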

Does anyone have any idea how this index corruption could have occurred and
how I can fix it? Any advice would be extremely helpful.

Here is the exception I get when trying to search this index:
Exception in thread "main" java.io.IOException: read past EOF
        at org.apache.lucene.store.InputStream.refill(Unknown Source)
        at org.apache.lucene.store.InputStream.readByte(Unknown Source)
        at org.apache.lucene.store.InputStream.readBytes(Unknown Source)
        at org.apache.lucene.index.SegmentReader.norms(Unknown Source)
        at org.apache.lucene.index.SegmentsReader.norms(Unknown Source)
        at org.apache.lucene.search.TermQuery.scorer(Unknown Source)
        at org.apache.lucene.search.Query.scorer(Unknown Source)
        at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
        at org.apache.lucene.search.Hits.getMoreDocs(Unknown Source)
        at org.apache.lucene.search.Hits.<init>(Unknown Source)
        at org.apache.lucene.search.Searcher.search(Unknown Source)
        at org.apache.lucene.search.Searcher.search(Unknown Source)
        at Search.main(Search.java:31)

I am also attaching an archive of the segment that is causing problems.

Regards,
Stas.

Re: Lucene index got corrupted

Posted by Phil W <ph...@volantis.com>.
Stas Chetvertkov <schetvertkov <at> oilspace.com> writes:
> 
> Hi All,
> 
> We are using Lucene to index realtime news, and everything was working
> fine until now. I have found that one of our Lucene indexes is corrupted:
> every attempt to search it, optimize it, or merge it into another index
> results in a 'read past EOF' exception.
> 
> My investigation showed that one of the segments in the index seems invalid.
> Its field index file, '_4lvd.fdx', is 24 bytes long, while all of the field
> normalization factor files (_4lvd.f?) are 2 bytes long. Since the number of
> documents is determined as length('_4lvd.fdx')/8, which gives 3 here, Lucene
> tries to read a 3rd byte from the normalization factor files and fails.
> 
> Does anyone have any idea how this index corruption could have occurred and
> how I can fix it? Any advice would be extremely helpful.

Did you ever get anywhere with this? We seem to be seeing similar corruption
that is triggered by certain character data. We haven't tracked it down or
verified which characters cause it, though a tab character might be one of
the offenders.


