Posted to java-user@lucene.apache.org by Adam Constabaris <ad...@unc.edu> on 2006/04/21 15:48:37 UTC

similar ArrayIndexOutOfBoundsException on searching and optimizing

This is a puzzler: I'm not sure whether I'm doing something wrong, or
whether I have a poisoned document, a corrupted index (failing to close
my IndexModifier properly?), or what.  The setup is this: I have two
processes (the backend and frontend of a CMS) that run in two different 
VMs -- both use Lucene 1.9.1 with the PorterStemmerAnalyzer wrapper over 
the StandardAnalyzer (from lucene-memory AnalyzerUtils).

The backend is responsible for index creation, updates, etc., while the 
frontend process uses the created index.  What's puzzling is that some 
queries will die with an ArrayIndexOutOfBoundsException being thrown out 
of the BitVector class:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 240
         at org.apache.lucene.util.BitVector.get(BitVector.java:63)
         at org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:133)
         at org.apache.lucene.search.TermScorer.next(TermScorer.java:105)
         at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:151)
         at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:125)
         at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:290)
         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
         at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
         at org.apache.lucene.search.Hits.<init>(Hits.java:44)
         at org.apache.lucene.search.Searcher.search(Searcher.java:44)
         at org.apache.lucene.search.Searcher.search(Searcher.java:36)

The only pattern I've been able to discern in queries that cause this 
problem is that (a) they search the "contents" field (tokenized, 
unstored, TermVector.YES), and (b) it *seems* that it mostly happens 
with longer terms in the query.  Although the frontend defaults to a
multifield query, the same happens when I use "contents:<<term>>", and
it does not happen if I search for <<term>> in any of the other default
fields used by the MultiFieldQueryParser.

Here's where it gets interesting: I've noticed that calling optimize()
on the index as it's created by the server process is also throwing a
hissy fit, with an *eerily similar* array index in the exception:

java.lang.ArrayIndexOutOfBoundsException: 239
         at org.apache.lucene.util.BitVector.get(BitVector.java:63)
         at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:288)
         at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
         at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)

Does anybody have any ideas about what I might be doing wrong, or if 
I've possibly uncovered a bug?  I'm too new to the scene to know where I 
ought to start with this.
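
One possible reading of the two numbers, offered as an assumption rather
than a diagnosis: both traces end in BitVector.get, reached from the
deleted-documents checks (SegmentTermDocs.read and
SegmentReader.isDeleted).  If BitVector in 1.9.1 packs one bit per
document into a byte array and looks a document up with docId >> 3 (my
recollection of the source; worth checking against BitVector.java:63),
then 240 and 239 are byte offsets, not document numbers.  They would
correspond to documents 1920-1927 and 1912-1919, which would explain why
the values are so close: both lookups ran just past the end of the same
deletions bit vector.  The class below is a simplified stand-in written
for illustration, not Lucene's code, and the segment size of 1900 is
hypothetical.

```java
// Simplified stand-in for a deleted-documents bit vector; NOT Lucene's source.
final class DeletedBitsSketch {
    private final byte[] bits;

    // One bit per document, packed eight to a byte (assumed layout).
    DeletedBitsSketch(int numDocs) {
        bits = new byte[(numDocs >> 3) + 1];
    }

    // Index of the byte that holds the bit for docId.
    static int byteIndex(int docId) {
        return docId >> 3;
    }

    // Lowest and highest document numbers stored in a given byte.
    static int firstDocForByte(int byteIndex) {
        return byteIndex << 3;
    }

    static int lastDocForByte(int byteIndex) {
        return (byteIndex << 3) + 7;
    }

    // Marks docId as deleted.
    void set(int docId) {
        bits[byteIndex(docId)] |= 1 << (docId & 7);
    }

    // Throws ArrayIndexOutOfBoundsException when docId lies beyond the
    // number of documents this vector was sized for.
    boolean get(int docId) {
        return (bits[byteIndex(docId)] & (1 << (docId & 7))) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical case: deletions were recorded for a 1900-document
        // segment, but a reader asks about document 1920.
        DeletedBitsSketch deleted = new DeletedBitsSketch(1900);
        try {
            deleted.get(1920);
        } catch (ArrayIndexOutOfBoundsException e) {
            int b = byteIndex(1920);
            System.out.println("byte " + b + " covers docs "
                    + firstDocForByte(b) + ".." + lastDocForByte(b));
        }
    }
}
```

If that reading is right, the thing to look for is a deletions file that
is out of step with the segment it belongs to, for example one written
by one process while another still holds the index open.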

Re: similar ArrayIndexOutOfBoundsException on searching and optimizing

Posted by Patrick Kimber <ma...@gmail.com>.

Hi Adam

Thanks for your help.

Patrick

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: similar ArrayIndexOutOfBoundsException on searching and optimizing

Posted by Adam Constabaris <ad...@unc.edu>.

Patrick Kimber wrote:
> Hi Adam
> 
> We are getting the same error.  Did you manage to work out what was
> causing the problem?
> 
> Thanks
> Patrick

I can't say anything definitive about this, but I think it was due to a
corrupted index.  On the hunch that the index creation/update threads
were reliably putting bad data into the index, I got more careful about
the way the IndexModifier was being used.  The updating process runs a
series of tasks periodically, and once I started calling
modifier.flush() at the end of each processing run, the
ArrayIndexOutOfBoundsExceptions went away.  I never resolved why the
array indices in the error messages were so similar, however.  I hope
that helps in some small way (or a large way, but I'm a realist =).

AC
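
The flush-per-run pattern described above might look roughly like the
following.  This is a minimal sketch, not the code from the CMS: to keep
it compilable without the Lucene jar, java.io.Flushable stands in for
the IndexModifier (whose flush() is the call mentioned above), and
IndexTask and runBatch are hypothetical names invented for the example.

```java
import java.io.Flushable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class FlushEachRun {

    // Hypothetical unit of indexing work (add, update, or delete documents).
    interface IndexTask {
        void run() throws IOException;
    }

    // Runs every task in one processing run, then flushes exactly once.
    // The flush sits in a finally block so that changes already handed to
    // the modifier are pushed out even if a later task fails.
    static void runBatch(List<IndexTask> tasks, Flushable modifier) throws IOException {
        try {
            for (IndexTask task : tasks) {
                task.run();
            }
        } finally {
            modifier.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        final int[] flushes = {0};
        Flushable fakeModifier = () -> flushes[0]++; // stand-in for IndexModifier

        List<IndexTask> tasks = new ArrayList<>();
        tasks.add(() -> System.out.println("index new pages"));
        tasks.add(() -> System.out.println("remove stale pages"));

        runBatch(tasks, fakeModifier);
        System.out.println("flush calls: " + flushes[0]);
    }
}
```

Whether to flush after a failed run is a judgment call; the sketch
flushes regardless so that a searcher in the other VM never has to wait
on a later run to see a consistent set of files.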

Re: similar ArrayIndexOutOfBoundsException on searching and optimizing

Posted by Patrick Kimber <ma...@gmail.com>.

Hi Adam

We are getting the same error.  Did you manage to work out what was
causing the problem?

Thanks
Patrick
