Posted to java-user@lucene.apache.org by Gili Nachum <gi...@gmail.com> on 2013/11/05 23:38:26 UTC

Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

Hello,
I got an index corruption in production, and was wondering whether it might
be a known bug (we are still on Lucene 3.1), or whether my code is doing
something wrong. It's a local-disk index, with no known machine power loss.
This isn't supposed to happen, right?

The index that got corrupted is updated every 30 seconds by adding a small
delta index (via addIndexes()) that was replicated from another machine.
The series of writer actions to update the index is:
1. writer.deleteDocuments(q);
2. writer.flush(false, true);
3. writer.addIndexes(reader);
4. writer.commit(map);

Is the index exposed to corruption only during commit, or is addIndexes()
risky by itself (the docs say it's not)?
LUCENE-2610 <https://issues.apache.org/jira/browse/LUCENE-2610> looks like
it's in the neighborhood, though it's not a bug report.
I'll add an ls -l output in a follow up email.

Technically the first indication of a problem appears when calling flush,
but it could be that a previous writer action left the index broken,
causing flush to fail.
My stack trace is:
Caused by: java.io.FileNotFoundException:
/disks/data1/opt/WAS/LotusConnections/Data/catalog/index/Places/index/_33gg.cfs
(No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
    at
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
    at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
    at
org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
    at
org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:66)
    at
org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:113)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
    at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:684)
    at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:659)
    at
org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:283)
    at
org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3358)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)

Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

Posted by Gili Nachum <gi...@gmail.com>.
Following up with the structure of the index dir, and the CheckIndex output.

*Structure of index:*
06/11/2013  10:55 AM    <DIR>          .
06/11/2013  10:55 AM    <DIR>          ..
02/11/2013  01:06 AM                20 segments.gen
01/11/2013  03:00 AM             2,589 segments_29tx
02/11/2013  01:06 AM             2,369 segments_2bsy
01/11/2013  03:00 AM         1,615,209 _1qro.fdt
01/11/2013  03:00 AM            21,612 _1qro.fdx
01/11/2013  03:00 AM               421 _1qro.fnm
01/11/2013  03:00 AM           466,357 _1qro.frq
01/11/2013  03:00 AM            10,808 _1qro.nrm
01/11/2013  03:00 AM           654,674 _1qro.prx
01/11/2013  03:00 AM            11,320 _1qro.tii
01/11/2013  03:00 AM           866,215 _1qro.tis
01/11/2013  03:00 AM               346 _1qro_p0.del
01/11/2013  03:00 AM         2,215,825 _24xf.cfs
01/11/2013  03:00 AM               214 _24xf_9k.del
01/11/2013  03:00 AM           993,243 _2czq.cfs
01/11/2013  03:00 AM                94 _2czq_3.del
01/11/2013  03:00 AM         2,688,823 _2d00.cfs
01/11/2013  03:00 AM                15 _2d00_1.del
01/11/2013  03:00 AM         1,310,966 _2iwt.cfs
01/11/2013  03:00 AM               112 _2iwt_1.del
01/11/2013  03:00 AM            79,481 _2iwz.cfs
01/11/2013  03:00 AM         2,501,932 _6k.cfs
01/11/2013  03:00 AM               259 _6k_62.del
01/11/2013  03:00 AM         2,596,920 _6l.cfs
01/11/2013  03:00 AM               259 _6l_hx.del
01/11/2013  03:00 AM         2,049,757 _9n.fdt
01/11/2013  03:00 AM            33,132 _9n.fdx
01/11/2013  03:00 AM               382 _9n.fnm
01/11/2013  03:00 AM           467,759 _9n.frq
01/11/2013  03:00 AM            16,568 _9n.nrm
01/11/2013  03:00 AM           620,251 _9n.prx
01/11/2013  03:00 AM            11,680 _9n.tii
01/11/2013  03:00 AM           878,430 _9n.tis
01/11/2013  03:00 AM               526 _9n_2t.del
01/11/2013  03:00 AM        11,064,537 _9o.fdt
01/11/2013  03:00 AM           159,996 _9o.fdx
01/11/2013  03:00 AM               382 _9o.fnm
01/11/2013  03:00 AM         2,988,122 _9o.frq
01/11/2013  03:00 AM            80,000 _9o.nrm
01/11/2013  03:00 AM         4,118,225 _9o.prx
01/11/2013  03:00 AM            50,218 _9o.tii
01/11/2013  03:00 AM         4,066,962 _9o.tis
01/11/2013  03:00 AM             2,508 _9o_lg.del
01/11/2013  03:00 AM         1,731,476 _ug3.fdt
01/11/2013  03:00 AM            23,596 _ug3.fdx
01/11/2013  03:00 AM               421 _ug3.fnm
01/11/2013  03:00 AM           496,290 _ug3.frq
01/11/2013  03:00 AM            11,800 _ug3.nrm
01/11/2013  03:00 AM           687,953 _ug3.prx
01/11/2013  03:00 AM            12,280 _ug3.tii
01/11/2013  03:00 AM           920,751 _ug3.tis
01/11/2013  03:00 AM               377 _ug3_10g.del
              52 File(s)     46,534,462 bytes

*CheckIndex output:*
status.clean=false
status.numBadSegments=3
status.numSegments=10
status.segmentFormat=FORMAT_3_1 [Lucene 3.1]
status.segmentsFileName=segments_2bsy
status.totLoseDocCount=5459
status.cantOpenSegments=false
status.missingSegments=false
status.missingSegmentVersion=false
status.partial=false
status.segmentInfos=[org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@ccdc6162,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@3cd10a77,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@c5d1f95b,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@e29a4dbc,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@fe7664a1,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@e18ca53e,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@b2901de3,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@3e558803,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@c34ccd0b,
org.apache.lucene.index.CheckIndex$Status$SegmentInfoStatus@f4719e19]
status.segmentsChecked=[]
status.toolOutOfDate=false
status.userData={VERSION=9460}


On Wed, Nov 6, 2013 at 12:38 AM, Gili Nachum <gi...@gmail.com> wrote:

> Hello,
> I got an index corruption in production, and was wondering whether it
> might be a known bug (we are still on Lucene 3.1), or whether my code is
> doing something wrong. It's a local-disk index, with no known machine
> power loss. This isn't supposed to happen, right?
>
> The index that got corrupted is updated every 30 seconds by adding a small
> delta index (via addIndexes()) that was replicated from another machine.
> The series of writer actions to update the index is:
> 1. writer.deleteDocuments(q);
> 2. writer.flush(false, true);
> 3. writer.addIndexes(reader);
> 4. writer.commit(map);
>
> Is the index exposed to corruption only during commit, or is addIndexes()
> risky by itself (the docs say it's not)?
> LUCENE-2610 <https://issues.apache.org/jira/browse/LUCENE-2610> looks like
> it's in the neighborhood, though it's not a bug report.
> I'll add an ls -l output in a follow up email.
>
> Technically the first indication of a problem appears when calling flush,
> but it could be that a previous writer action left the index broken,
> causing flush to fail.
> My stack trace is:
> Caused by: java.io.FileNotFoundException:
> /disks/data1/opt/WAS/LotusConnections/Data/catalog/index/Places/index/_33gg.cfs
> (No such file or directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>     at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
>     at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
>     at
> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
>     at
> org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
>     at
> org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:66)
>     at
> org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:113)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
>     at
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:684)
>     at
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:659)
>     at
> org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:283)
>     at
> org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191)
>     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3358)
>     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)
>

Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, so CheckIndex found that files for 3 segments are missing, e.g. it
wanted to open _24xf_9l.del (yet it's _24xf_9k.del that's actually
there).

I wonder why CheckIndex doesn't report the exception you saw in flush,
with that way-future segment (_33gg.cfs): that's weird.
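
As background (not stated elsewhere in this thread): Lucene encodes segment
names and deletion-file generations as base-36 counters, so how "way-future"
_33gg really is can be checked with a few lines. The names below are taken
from the directory listing and CheckIndex output posted above.

```python
# Lucene segment file names are an underscore plus a base-36 counter;
# deletion-file generations (the "9k" in _24xf_9k.del) use the same radix.
def gen(name: str) -> int:
    """Decode a base-36 segment name or generation, e.g. '_33gg' -> 144448."""
    return int(name.lstrip("_"), 36)

# The segment that flush tried to open is tens of thousands of names beyond
# the newest segment the segments file actually references:
print(gen("_33gg") - gen("_2l3x"))  # 23779 names "in the future"

# Whereas the broken .del references are only one generation apart:
print(gen("9l") - gen("9k"))        # 1
```

That gap is why a single skipped merge or lost commit can't explain _33gg on
its own: the name counter was far ahead of everything else on disk.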

But ... I suspect you may be hitting
https://issues.apache.org/jira/browse/LUCENE-3418 -- that issue
causes IW.commit() to not actually "work", so that if you commit
successfully and then there's a power loss / OS crash, you could lose
files.  But you said there was no known power loss / crash?
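
For readers hitting this later: LUCENE-3418 concerned commits not being made
durable at the filesystem level. The sketch below is a generic illustration
of what a durable commit has to do - fsync the new file, rename it into
place, then fsync the parent directory - written in plain Python; it is not
Lucene's actual code.

```python
import os
import tempfile

def atomic_durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path` so it survives a power loss / OS crash.

    Generic commit pattern: write to a temp file, fsync it, atomically
    rename it into place, then fsync the parent directory so the rename
    (the directory entry) is durable too.
    """
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        os.write(fd, data)
        os.fsync(fd)          # flush file contents to stable storage
    finally:
        os.close(fd)
    os.replace(tmp, path)     # atomic within one POSIX filesystem
    dfd = os.open(d, os.O_RDONLY)
    try:
        os.fsync(dfd)         # make the directory entry durable as well
    finally:
        os.close(dfd)
```

If any of those fsyncs is skipped (as in the bug), a "successful" commit can
still evaporate on crash, leaving a segments file that references files
which never made it to disk - consistent with the missing-file symptoms here.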

It's also odd that you have two very different segments_N files in the index:

01/11/2013  03:00 AM             2,589 segments_29tx
02/11/2013  01:06 AM             2,369 segments_2bsy

And CheckIndex opened the newer one; maybe try temporarily moving the
newer one (_2bsy) out of the way and then see whether the index is intact
(this is a long shot ... it's really odd that such a much older
segments file is still there).

Is there any replication involved here, besides addIndexes?  Ie,
anything that directly copies files into the index?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Nov 7, 2013 at 4:32 AM, Gili Nachum <gi...@gmail.com> wrote:
> Thanks Mike and Uwe.
> I already reindexed in production; my goal is to get to the root cause and
> make sure it doesn't happen again.
> Will remove the flush(). No idea why it's there.
> Attaching CheckIndex.main() output (why did I bother writing my own
> output? :#)
>
> *Output:*
> Opening index @ C:\\customers\\SC\\corrupt catalog index E3 -
> Nov\\WDPP29715_ap03\\opt\\WAS\\LotusConnections\\Data\\catalog\\index\\Places\\index
>
> Segments file=segments_2bsy numSegments=10 version=FORMAT_3_1 [Lucene 3.1]
> userData={VERSION=9460}
>   1 of 10: name=_9n docCount=4141
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=3.89
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_9n_2t.del]
>     test: open reader.........OK [209 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [82966 terms; 295120 terms/docs pairs;
> 300750 tokens]
>     test: stored fields.......OK [72872 total field count; avg 18.533
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   2 of 10: name=_9o docCount=19999
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=21.487
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_9o_lg.del]
>     test: open reader.........OK [1396 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [318090 terms; 1773898 terms/docs pairs;
> 1888318 tokens]
>     test: stored fields.......OK [390466 total field count; avg 20.989
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   3 of 10: name=_6k docCount=2000
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=2.386
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_6k_62.del]
>     test: open reader.........OK [389 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [46699 terms; 193013 terms/docs pairs;
> 178450 tokens]
>     test: stored fields.......OK [35965 total field count; avg 22.325
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   4 of 10: name=_6l docCount=2000
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=2.477
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, os.version=2.6.18-348.12.1.el5}
>     has deletions [delFileName=_6l_hx.del]
>     test: open reader.........OK [864 deleted docs]
>     test: fields..............OK [27 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [55730 terms; 196164 terms/docs pairs;
> 117213 tokens]
>     test: stored fields.......OK [23202 total field count; avg 20.424
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   5 of 10: name=_ug3 docCount=2949
>     compound=false
>     hasProx=true
>     numFiles=9
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.io.FileNotFoundException: _ug3_10h.del
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
>     at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)
>
>   6 of 10: name=_1qro docCount=2701
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=3.478
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.16.1.el5}
>     has deletions [delFileName=_1qro_p0.del]
>     test: open reader.........OK [1473 deleted docs]
>     test: fields..............OK [30 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [76625 terms; 278909 terms/docs pairs;
> 143954 tokens]
>     test: stored fields.......OK [25932 total field count; avg 21.117
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   7 of 10: name=_24xf docCount=1645
>     compound=true
>     hasProx=true
>     numFiles=2
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.io.FileNotFoundException: _24xf_9l.del
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
>     at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)
>
>   8 of 10: name=_2czq docCount=681
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=0.947
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, optimize=false,
> os.version=2.6.18-348.16.1.el5}
>     has deletions [delFileName=_2czq_3.del]
>     test: open reader.........OK [465 deleted docs]
>     test: fields..............OK [30 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [25569 terms; 72519 terms/docs pairs;
> 21076 tokens]
>     test: stored fields.......OK [4328 total field count; avg 20.037 fields
> per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   9 of 10: name=_2d00 docCount=1997
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=2.564
>     diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
> source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
> 1085809 - 2011-03-26 17:59:57, os.version=2.6.18-371.el5}
>     has deletions [delFileName=_2d00_1.del]
>     test: open reader.........OK [1 deleted docs]
>     test: fields..............OK [31 fields]
>     test: field norms.........OK [4 fields]
>     test: terms, freq, prox...OK [57448 terms; 204959 terms/docs pairs;
> 241754 tokens]
>     test: stored fields.......OK [44545 total field count; avg 22.317
> fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>   10 of 10: name=_2l3x docCount=865
>     compound=true
>     hasProx=true
>     numFiles=1
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.io.FileNotFoundException: _2l3x.cfs
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
>     at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)
>
> WARNING: 3 broken segments (containing 5459 documents) detected
> WARNING: would write new segments file, and 5459 documents would be lost,
> if -fix were specified
>
>
> On Wed, Nov 6, 2013 at 11:07 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>> Hi,
>> > Hello,
>> > I got an index corruption in production, and was wondering whether it
>> > might be a known bug (we are still on Lucene 3.1), or whether my code
>> > is doing something wrong. It's a local-disk index, with no known
>> > machine power loss. This isn't supposed to happen, right?
>> >
>> > The index that got corrupted is updated every 30 seconds by adding a
>> > small delta index (via addIndexes()) that was replicated from another
>> > machine.
>> > The series of writer actions to update the index is:
>> > 1. writer.deleteDocuments(q);
>> > 2. writer.flush(false, true);
>> > 3. writer.addIndexes(reader);
>> > 4. writer.commit(map);
>> >
>> > Is the index exposed to corruption only during commit, or is
>> > addIndexes() risky by itself (the docs say it's not)?
>> > LUCENE-2610 <https://issues.apache.org/jira/browse/LUCENE-2610> looks
>> > like it's in the neighborhood, though it's not a bug report.
>>
>> Hi, LUCENE-2610 is completely unrelated, as it only affects
>> addIndexes(Directory...), not addIndexes(IndexReader...). The variant you
>> are using relies on the natural Lucene merging that happens all the time
>> while indexing documents (internally, Lucene uses the same code as
>> addIndexes(IndexReader) to merge segments). addIndexes(Directory) is very
>> different and more likely to have bugs in older Lucene versions (it
>> copies index files around without touching their contents, only renaming
>> them to new segment names - which is somewhat "unnatural"; it also did
>> not correctly lock the index directory in older versions).
>>
>> Why do you call flush() at all? I would leave it out; there is no reason
>> to do this from user code.
>>
>> To "repair" the index, use the CheckIndex command-line tool with the
>> "-fix" option. This will drop the segment whose file is missing
>> (_33gg.cfs). That data is lost, of course, but since the file is not
>> there it is lost already; -fix merely removes the missing segment's
>> metadata from your index. Before doing this, though, check what
>> CheckIndex prints *without* the fix option, run from the command line
>> with assertions enabled (the -ea JVM option). The info you posted is not
>> that console output - it looks like the toString() of the CheckIndex
>> Java API, which is not as helpful. Please post the full output of the
>> tool executed from the command line:
>>         java -cp lucene-core-3.1.0.jar org.apache.lucene.index.CheckIndex
>> <options....>
>>
>> Uwe
>>
>> > I'll add an ls -l output in a follow up email.
>> >
>> > Technically the first indication of a problem appears when calling
>> > flush, but it could be that a previous writer action left the index
>> > broken, causing flush to fail.
>> > My stack trace is:
>> > Caused by: java.io.FileNotFoundException:
>> > /disks/data1/opt/WAS/LotusConnections/Data/catalog/index/Places/index/_33gg.cfs
>> > (No such file or directory)
>> >     at java.io.RandomAccessFile.open(Native Method)
>> >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>> >     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
>> >     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
>> >     at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
>> >     at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
>> >     at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:66)
>> >     at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:113)
>> >     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
>> >     at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:684)
>> >     at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:659)
>> >     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:283)
>> >     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191)
>> >     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3358)
>> >     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>



Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

Posted by Gili Nachum <gi...@gmail.com>.
Thanks Mike and Uwe.
I already reindexed in production; my goal is to get to the root cause and
make sure it doesn't happen again.
Will remove the flush(). No idea why it's there.
Attaching CheckIndex.main() output (why did I bother writing my own
output? :#)

*Output:*
Opening index @ C:\\customers\\SC\\corrupt catalog index E3 -
Nov\\WDPP29715_ap03\\opt\\WAS\\LotusConnections\\Data\\catalog\\index\\Places\\index

Segments file=segments_2bsy numSegments=10 version=FORMAT_3_1 [Lucene 3.1]
userData={VERSION=9460}
  1 of 10: name=_9n docCount=4141
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=3.89
    diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
1085809 - 2011-03-26 17:59:57, optimize=false,
os.version=2.6.18-348.12.1.el5}
    has deletions [delFileName=_9n_2t.del]
    test: open reader.........OK [209 deleted docs]
    test: fields..............OK [27 fields]
    test: field norms.........OK [4 fields]
    test: terms, freq, prox...OK [82966 terms; 295120 terms/docs pairs;
300750 tokens]
    test: stored fields.......OK [72872 total field count; avg 18.533
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  2 of 10: name=_9o docCount=19999
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=21.487
    diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
1085809 - 2011-03-26 17:59:57, optimize=false,
os.version=2.6.18-348.12.1.el5}
    has deletions [delFileName=_9o_lg.del]
    test: open reader.........OK [1396 deleted docs]
    test: fields..............OK [27 fields]
    test: field norms.........OK [4 fields]
    test: terms, freq, prox...OK [318090 terms; 1773898 terms/docs pairs;
1888318 tokens]
    test: stored fields.......OK [390466 total field count; avg 20.989
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  3 of 10: name=_6k docCount=2000
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=2.386
    diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
1085809 - 2011-03-26 17:59:57, os.version=2.6.18-348.12.1.el5}
    has deletions [delFileName=_6k_62.del]
    test: open reader.........OK [389 deleted docs]
    test: fields..............OK [27 fields]
    test: field norms.........OK [4 fields]
    test: terms, freq, prox...OK [46699 terms; 193013 terms/docs pairs;
178450 tokens]
    test: stored fields.......OK [35965 total field count; avg 22.325
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  4 of 10: name=_6l docCount=2000
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=2.477
    diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
1085809 - 2011-03-26 17:59:57, os.version=2.6.18-348.12.1.el5}
    has deletions [delFileName=_6l_hx.del]
    test: open reader.........OK [864 deleted docs]
    test: fields..............OK [27 fields]
    test: field norms.........OK [4 fields]
    test: terms, freq, prox...OK [55730 terms; 196164 terms/docs pairs;
117213 tokens]
    test: stored fields.......OK [23202 total field count; avg 20.424
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  5 of 10: name=_ug3 docCount=2949
    compound=false
    hasProx=true
    numFiles=9
FAILED
    WARNING: fixIndex() would remove reference to this segment; full
exception:
java.io.FileNotFoundException: _ug3_10h.del
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)

  6 of 10: name=_1qro docCount=2701
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=3.478
    diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
1085809 - 2011-03-26 17:59:57, optimize=false,
os.version=2.6.18-348.16.1.el5}
    has deletions [delFileName=_1qro_p0.del]
    test: open reader.........OK [1473 deleted docs]
    test: fields..............OK [30 fields]
    test: field norms.........OK [4 fields]
    test: terms, freq, prox...OK [76625 terms; 278909 terms/docs pairs;
143954 tokens]
    test: stored fields.......OK [25932 total field count; avg 21.117
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  7 of 10: name=_24xf docCount=1645
    compound=true
    hasProx=true
    numFiles=2
FAILED
    WARNING: fixIndex() would remove reference to this segment; full
exception:
java.io.FileNotFoundException: _24xf_9l.del
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)

  8 of 10: name=_2czq docCount=681
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=0.947
    diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
mergeFactor=10, source=merge, java.version=1.6.0, lucene.version=3.1.0
1085809 - 2011-03-26 17:59:57, optimize=false,
os.version=2.6.18-348.16.1.el5}
    has deletions [delFileName=_2czq_3.del]
    test: open reader.........OK [465 deleted docs]
    test: fields..............OK [30 fields]
    test: field norms.........OK [4 fields]
    test: terms, freq, prox...OK [25569 terms; 72519 terms/docs pairs;
21076 tokens]
    test: stored fields.......OK [4328 total field count; avg 20.037 fields
per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  9 of 10: name=_2d00 docCount=1997
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=2.564
    diagnostics = {os.arch=amd64, java.vendor=IBM Corporation, os=Linux,
source=addIndexes(IndexReader...), java.version=1.6.0, lucene.version=3.1.0
1085809 - 2011-03-26 17:59:57, os.version=2.6.18-371.el5}
    has deletions [delFileName=_2d00_1.del]
    test: open reader.........OK [1 deleted docs]
    test: fields..............OK [31 fields]
    test: field norms.........OK [4 fields]
    test: terms, freq, prox...OK [57448 terms; 204959 terms/docs pairs;
241754 tokens]
    test: stored fields.......OK [44545 total field count; avg 22.317
fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]

  10 of 10: name=_2l3x docCount=865
    compound=true
    hasProx=true
    numFiles=1
FAILED
    WARNING: fixIndex() would remove reference to this segment; full
exception:
java.io.FileNotFoundException: _2l3x.cfs
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:292)
    at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:299)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:446)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:898)

WARNING: 3 broken segments (containing 5459 documents) detected
WARNING: would write new segments file, and 5459 documents would be lost,
if -fix were specified


On Wed, Nov 6, 2013 at 11:07 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
> > Hello,
> > I got an index corruption in production, and was wondering whether it
> > might be a known bug (we are still on Lucene 3.1), or whether my code
> > is doing something wrong. It's a local-disk index, with no known
> > machine power loss. This isn't supposed to happen, right?
> >
> > The index that got corrupted is updated every 30 seconds by adding a
> > small delta index (via addIndexes()) that was replicated from another
> > machine.
> > The series of writer actions to update the index is:
> > 1. writer.deleteDocuments(q);
> > 2. writer.flush(false, true);
> > 3. writer.addIndexes(reader);
> > 4. writer.commit(map);
> >
> > Is the index exposed to corruptions only during commit, or is
> addIndexes()
> > risky by itself (doc says it's not).
> > LUCENE-2610 <https://issues.apache.org/jira/browse/LUCENE-2610> kind of
> > looks in the neighborhood, though it's not a bug report.
>
> Hi, LUCENE-2610 is completely unrelated, as it only affects
> addIndexes(Directory...), not addIndexes(IndexReader...). The variant you are
> using relies on the natural Lucene merging that runs all the time while
> indexing documents (internally, Lucene uses the same code as
> addIndexes(IndexReader...) to merge segments). addIndexes(Directory...) is
> very different and more prone to bugs in older Lucene versions (it copies
> index files around without touching their contents, only renaming them to new
> segment names - which is somewhat "unnatural"; it also does not correctly
> lock the index directory in older versions).
>
> Why do you call flush() at all? I would leave that out, there is no reason
> to do this from userland code.
>
> To "repair" the index, use the CheckIndex command-line tool with the -fix
> option. This will drop the segment whose file is missing (_33gg.cfs). Of
> course that data is lost, but as the file is not there it is lost already;
> this just removes the metadata of the missing segment from your index. But
> before doing this, check what CheckIndex prints without the -fix option -
> the info you posted is not the console output the tool prints when run from
> the command line with assertions enabled (the -ea JVM option); it looks like
> the toString() of the CheckIndex Java API, which is not so helpful. Please
> post the full output of the tool executed from the command line:
>         java -cp lucene-core-3.1.0.jar org.apache.lucene.index.CheckIndex
> <options....>
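[Editorial note: the same check can also be driven from the CheckIndex Java API. The sketch below assumes the Lucene 3.1 API; the class name and argument handling are illustrative, and the fix path should only be run once the missing data is accepted as lost, ideally against a copy of the index.]

```java
import java.io.File;

import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CheckAndFix {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File(args[0]));
        CheckIndex checker = new CheckIndex(dir);
        checker.setInfoStream(System.out);            // full console-style report
        CheckIndex.Status status = checker.checkIndex();
        if (status.clean) {
            System.out.println("Index is clean");
        } else if (args.length > 1 && "-fix".equals(args[1])) {
            // Drops the broken segments; the documents they held are lost.
            checker.fixIndex(status);
            System.out.println("Wrote new segments file; broken segments removed");
        }
        dir.close();
    }
}
```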
>
> Uwe
>
> > I'll add an ls -l output in a follow up email.
> >
> > Technically the first indication of problems is when calling flush, but it
> > could be that a previous writer action had already left the index broken,
> > causing flush to fail.
> > My stack trace is:
> > Caused by: java.io.FileNotFoundException:
> > /disks/data1/opt/WAS/LotusConnections/Data/catalog/index/Places/index/_33gg.cfs
> > (No such file or directory)
> >     at java.io.RandomAccessFile.open(Native Method)
> >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> >     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
> >     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
> >     at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
> >     at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
> >     at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:66)
> >     at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:113)
> >     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
> >     at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:684)
> >     at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:659)
> >     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:283)
> >     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191)
> >     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3358)
> >     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Corrupt Index with IndexWriter.addIndexes(IndexReader readers[])

Posted by Michael McCandless <lu...@mikemccandless.com>.
No, it's not supposed to happen :)

That exception is happening because IW is trying to apply the deletes
during flush, and needs to open a reader for all existing segments to
do so, and then finds that _33gg.cfs does not exist.

Which is odd, because from your ls -l output, you have no segments
even close to _33gg; the newest segment I see in your listing is
_21wz.  So it's very weird that IW thought such a segment would exist.

Do you ever call addIndexes(Directory[])?

Is this a remote filesystem?  Mounted via NFS?

Can you post the command-line output from CheckIndex, not the
programmatic output?  Maybe it includes more details.

The index is not exposed to corruptions during commit nor addIndexes;
if bad things (computer loses power, JVM or OS crashes) happen during
these methods then on reboot/restart the index will show whatever its
state was as of the last successful commit.
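[Editorial sketch, not code from this thread: the last-successful-commit semantics described above can be seen with a small experiment against the Lucene 3.x API. Readers always open the latest commit point, so uncommitted changes held by the writer are invisible to them.]

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CommitPointDemo {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31)));

        Document d = new Document();
        d.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(d);
        writer.commit();                       // commit point A: 1 doc

        Document d2 = new Document();
        d2.add(new Field("id", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(d2);                // buffered in the writer, NOT committed

        // A reader opened now sees only commit point A.
        IndexReader before = IndexReader.open(dir, true);
        System.out.println("before second commit: " + before.numDocs()); // 1
        before.close();

        writer.commit();                       // commit point B: 2 docs
        IndexReader after = IndexReader.open(dir, true);
        System.out.println("after second commit: " + after.numDocs());   // 2
        after.close();
        writer.close();
    }
}
```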

Mike McCandless

http://blog.mikemccandless.com

On Tue, Nov 5, 2013 at 5:38 PM, Gili Nachum <gi...@gmail.com> wrote:
> Hello,
> I got an index corruption in production, and was wondering if it might be a
> known bug (still with Lucene 3.1), or is my code doing something wrong.
> It's a local disk index. No known machine power loss. Not supposed to even
> happen, right?
>
> This index that got corrupted is updated every 30 sec; adding to it a small
> delta index (using addIndexes()) that was replicated from another machine.
> The series of writer actions to update the index is:
> 1. writer.deleteDocuments(q);
> 2. writer.flush(false, true);
> 3. writer.addIndexes(reader);
> 4. writer.commit(map);
>
> Is the index exposed to corruptions only during commit, or is addIndexes()
> risky by itself (the doc says it's not)?
> LUCENE-2610 <https://issues.apache.org/jira/browse/LUCENE-2610> kind of
> looks in the neighborhood, though it's not a bug report.
> I'll add an ls -l output in a follow up email.
>
> Technically the first indication of problems is when calling flush, but it
> could be that the previous writer action left it broken for flush to fail.
> My stack trace is:
> Caused by: java.io.FileNotFoundException:
> /disks/data1/opt/WAS/LotusConnections/Data/catalog/index/Places/index/_33gg.cfs
> (No such file or directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>     at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
>     at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
>     at
> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
>     at
> org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
>     at
> org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:66)
>     at
> org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:113)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
>     at
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:684)
>     at
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:659)
>     at
> org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:283)
>     at
> org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191)
>     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3358)
>     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3296)
