Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/09/24 12:24:32 UTC

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

    [ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914401#action_12914401 ] 

Michael McCandless commented on LUCENE-2666:
--------------------------------------------

This looks like index corruption: somehow the deleted docs bit vector is too small for that segment. We need to get to the root cause of how the corruption happened.
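The numbers in the quoted report fit this reading. Below is a minimal sketch of the arithmetic. It assumes the deleted docs vector packs one bit per document, eight per byte (`doc >> 3` selects the byte), and allocates `(size >> 3) + 1` bytes. As far as we can tell, this mirrors how Lucene 3.x's BitVector is laid out. `DeletedDocsMath` and its methods are hypothetical names used only for illustration.

```java
// Sketch of the arithmetic behind "ArrayIndexOutOfBoundsException: 114".
// Assumption: one bit per document, eight per byte, and (size >> 3) + 1 bytes
// allocated, as in Lucene 3.x's BitVector. DeletedDocsMath is hypothetical.
class DeletedDocsMath {

    // Index of the byte that holds the deleted flag for a document.
    static int byteIndex(int doc) {
        return doc >> 3;
    }

    // Bytes allocated for a vector sized for docCount documents.
    static int bytesFor(int docCount) {
        return (docCount >> 3) + 1;
    }

    // Number of document bits that an array of byteLength bytes can hold.
    static int docsAddressable(int byteLength) {
        return byteLength * 8;
    }

    public static void main(String[] args) {
        int docCount = 914;   // segment _26t, per the CheckIndex output
        int failingDoc = 912; // document whose deleted-docs lookup threw

        // 912 >> 3 == 114, matching the index in the exception message.
        System.out.println("byte index for doc 912: " + byteIndex(failingDoc));
        // A vector sized for 914 docs would have 115 bytes (indices 0..114).
        System.out.println("bytes needed for 914 docs: " + bytesFor(docCount));
        // Index 114 being out of bounds means the array has at most 114 bytes,
        // i.e. room for at most 912 documents (doc ids 0..911).
        System.out.println("docs addressable by 114 bytes: "
                + docsAddressable(byteIndex(failingDoc)));
    }
}
```

Running it prints a byte index of 114 for doc 912, which matches the exception. It also shows that 914 docs need 115 bytes and that a 114-byte array can address only 912 docs. So the vector loaded for _26t appears to have been sized for fewer documents than the segment holds.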

For example, could you enable IndexWriter's infoStream, reproduce the corruption, and post the resulting log?

Also, try enabling assertions (java -ea); they may catch the corruption sooner.
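Here is a rough illustration of catching such a mismatch early. It is a hypothetical fail-fast check, not Lucene code: `DeletedDocs`, `checkCovers`, and the 911-doc size used in `main` are all made up for the example. `checkCovers` compares the vector's size against the segment's docCount once, when the segment is opened. The `assert` in `get` only runs when the JVM is started with `-ea`. When it does run, it reports an out-of-range doc with a readable message instead of a bare ArrayIndexOutOfBoundsException.

```java
// Hypothetical fail-fast check (not Lucene API): validate that a deleted-docs
// bit vector covers every document in its segment before it is used, so a
// mismatch is reported at open time rather than mid TermDocs iteration.
class DeletedDocs {
    private final int size;    // number of documents the vector claims to cover
    private final byte[] bits; // one bit per document, eight per byte

    DeletedDocs(int size, byte[] bits) {
        this.size = size;
        this.bits = bits;
    }

    // Throws if the vector cannot hold a flag for every doc in the segment.
    void checkCovers(int docCount) {
        if (size != docCount || bits.length < (docCount >> 3) + 1) {
            throw new IllegalStateException("deleted docs vector size " + size
                    + " (" + bits.length + " bytes) does not match segment docCount "
                    + docCount);
        }
    }

    // The assert runs only with -ea and flags an out-of-range doc by name.
    boolean get(int doc) {
        assert doc >= 0 && doc < size : "doc " + doc + " out of bounds 0.." + (size - 1);
        return (bits[doc >> 3] & (1 << (doc & 7))) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical: a vector sized for 911 docs attached to a 914-doc segment.
        DeletedDocs tooSmall = new DeletedDocs(911, new byte[(911 >> 3) + 1]);
        try {
            tooSmall.checkCovers(914);
            System.out.println("check passed");
        } catch (IllegalStateException e) {
            System.out.println("caught at open time: " + e.getMessage());
        }
    }
}
```

The design choice is to validate once per segment open rather than on every lookup, so the hot iteration path stays a plain array access.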

Can you describe how you use Lucene? Do you do any direct file I/O in the index directory (e.g., for backup/restore)?

Are you certain only one writer is open on the index?  (Do you disable Lucene's locking?)

Which OS, filesystem, and Java implementation are you using?

> ArrayIndexOutOfBoundsException when iterating over TermDocs
> -----------------------------------------------------------
>
>                 Key: LUCENE-2666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2666
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.2
>            Reporter: Shay Banon
>
> A user got this very strange exception, and I managed to get the index it happens on. Basically, iterating over the TermDocs causes an AIOOB exception. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception:
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:104)
> 	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
> 	at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
> 	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
> 	at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
> 	at TestMe.main(TestMe.java:56)
> It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del
> As you can see, it smells like a corner case: it fails for document number 912, and the AIOOB is thrown while checking the deleted docs. The code to reproduce it is simple:
>         FSDirectory dir = FSDirectory.open(new File("index"));
>         IndexReader reader = IndexReader.open(dir, true);
>         IndexReader[] subReaders = reader.getSequentialSubReaders();
>         for (IndexReader subReader : subReaders) {
>             Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
>             field.setAccessible(true);
>             SegmentInfo si = (SegmentInfo) field.get(subReader);
>             System.out.println("--> " + si);
>             if (si.getDocStoreSegment().contains("_26t")) {
>                 // this is the problematic one...
>                 System.out.println("problematic one...");
>                 FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER);
>             }
>         }
> Here is the result of a check index on that segment:
>   8 of 10: name=_26t docCount=914
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=1.641
>     diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
>     has deletions [delFileName=_26t_1.del]
>     test: open reader.........OK [1 deleted docs]
>     test: fields..............OK [32 fields]
>     test: field norms.........OK [32 fields]
>     test: terms, freq, prox...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:104)
> 	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
> 	at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
> 	at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
> 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
> 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
> 	at TestMe.main(TestMe.java:47)
>     test: stored fields.......ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:104)
> 	at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
> 	at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
> 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
> 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
> 	at TestMe.main(TestMe.java:47)
>     test: term vectors........ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:104)
> 	at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
> 	at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
> 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
> 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
> 	at TestMe.main(TestMe.java:47)
> The index is created with nothing fancy (all defaults), though it does use the near-real-time feature (IndexWriter#getReader), which complicates deleted docs handling. It seems the deleted docs were written with a size that doesn't match the segment's number of docs. Sadly, I don't have anything that recreates it from scratch, but I do have the index if someone wants to have a look at it (mail me directly and I will provide a download link).
> I will continue to investigate why this might happen; I'm just wondering whether anyone has stumbled on this exception before. This is with Lucene 3.0.2.


