Posted to java-user@lucene.apache.org by "Zhang, Lisheng" <Li...@BroadVision.com> on 2011/10/28 22:57:59 UTC

data corruption in lucene index 2.3.2

 
We are using Lucene 2.3.2 (yes, we should upgrade) and recently got an exception when opening
the index:
 
###
java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:66)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:207)
        at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:68)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
        at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
###
 
Looking at the data, I found segment files of zero bytes:
 
###
-rwxrwxrwx 1 root root        0 2011-10-29 01:35 segments_8tbb
-rwxrwxrwx 1 root root        0 2011-10-29 01:35 segments.gen
###
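 
A check like the listing above can be scripted. The following is only a minimal sketch using plain java.io, not a Lucene API; the class and method names (EmptySegmentsFileCheck, findEmptySegmentsFiles) are made up for illustration, and it assumes the index lives in an ordinary filesystem directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists zero-length "segments*" files.
public class EmptySegmentsFileCheck {

    /**
     * Returns the names of zero-length regular files in indexDir whose names
     * start with "segments" (for example segments_8tbb or segments.gen).
     */
    public static List<String> findEmptySegmentsFiles(File indexDir) {
        List<String> empty = new ArrayList<>();
        File[] files = indexDir.listFiles();
        if (files == null) {
            return empty; // not a directory, or it could not be listed
        }
        Arrays.sort(files); // name-ordered output
        for (File f : files) {
            if (f.isFile() && f.getName().startsWith("segments") && f.length() == 0) {
                empty.add(f.getName());
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        // Assumption: the index directory is passed as the first argument.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : findEmptySegmentsFiles(dir)) {
            System.out.println("zero-length: " + name);
        }
    }
}
```

Running it against the index directory before opening an IndexSearcher would flag the two files shown above; it says nothing about whether the other index files are intact.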
 
How could that happen? (I found no sign of multiple indexers writing to this data folder.)
 
Thanks very much for the help, Lisheng
 

RE: data corruption in lucene index 2.3.2

Posted by "Zhang, Lisheng" <Li...@BroadVision.com>.
Hi Mike,

Yes, we had an OS crash, but according to our log file it happened minutes after we
committed/closed the IndexWriter. We will consider a Lucene upgrade.

Thanks very much for the help, Lisheng

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Saturday, October 29, 2011 10:55 AM
To: java-user@lucene.apache.org
Subject: Re: data corruption in lucene index 2.3.2


That's fine -- the reader is read-only and won't corrupt the index if
the machine/OS crashes while it's open.

Oh, actually: Lucene 2.3.x did not properly fsync files when you
closed the IndexWriter (this was fixed in 2.4.0).  This means even if
you close the writer and a crash occurs the index could become
corrupt.

Did you have an OS/machine crash on this index?

Mike McCandless

http://blog.mikemccandless.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



Re: data corruption in lucene index 2.3.2

Posted by Michael McCandless <lu...@mikemccandless.com>.
That's fine -- the reader is read-only and won't corrupt the index if
the machine/OS crashes while it's open.

Oh, actually: Lucene 2.3.x did not properly fsync files when you
closed the IndexWriter (this was fixed in 2.4.0).  This means even if
you close the writer and a crash occurs the index could become
corrupt.
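
For readers unfamiliar with the term, "fsync" here means asking the OS to push written bytes from its cache to the storage device before carrying on. The sketch below only illustrates that idea with plain java.io; it is not Lucene's code, and the class and method names (DurableWrite, writeAndSync) are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical illustration of fsync-on-close; not taken from Lucene.
public class DurableWrite {

    /**
     * Writes data to file and asks the OS to flush it to the storage device
     * before returning. Without the sync() call the bytes may still sit in the
     * OS cache after close() and can be lost if the machine crashes.
     */
    public static void writeAndSync(File file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
            out.flush();
            out.getFD().sync(); // fsync: block until the OS reports the data is on disk
        }
    }
}
```

A file that is closed without such a sync can exist in the directory yet have no durable contents after a crash, which is one plausible way to end up with zero-byte files like the ones reported.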

Did you have an OS/machine crash on this index?

Mike McCandless

http://blog.mikemccandless.com

On Sat, Oct 29, 2011 at 12:15 PM, Zhang, Lisheng
<Li...@broadvision.com> wrote:
> Hi Mike,
>
> Thanks very much for the help. If the indexer was closed but an IndexSearcher
> (and therefore an IndexReader) was open when the OS crashed, could that
> cause segment data corruption (I guess it should not)?
>
> Best regards, Lisheng



RE: data corruption in lucene index 2.3.2

Posted by "Zhang, Lisheng" <Li...@BroadVision.com>.
Hi Mike, 

Thanks very much for the help. If the indexer was closed but an IndexSearcher
(and therefore an IndexReader) was open when the OS crashed, could that
cause segment data corruption (I guess it should not)?

Best regards, Lisheng

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Saturday, October 29, 2011 3:39 AM
To: java-user@lucene.apache.org
Subject: Re: data corruption in lucene index 2.3.2


Was there any catastrophic event against this index?

E.g., power loss on the machine, or an OS crash, while the app had an
IndexWriter open?

Mike McCandless

http://blog.mikemccandless.com




Re: data corruption in lucene index 2.3.2

Posted by Michael McCandless <lu...@mikemccandless.com>.
Was there any catastrophic event against this index?

E.g., power loss on the machine, or an OS crash, while the app had an
IndexWriter open?

Mike McCandless

http://blog.mikemccandless.com

