Posted to solr-user@lucene.apache.org by Aaron McKee <uc...@gmail.com> on 2009/10/19 21:04:55 UTC

ArrayIndexOutOfBoundsException during indexing

I was wondering if anyone might have any insight on the following 
problem. I'm using the latest Solr code from SVN and indexing around 17m 
XML records via DIH. With perfect reproducibility, the following exception 
is thrown on the same aggregate file (#236, and each XML file has ~50k 
records), although not necessarily the same exact record. Oddly, it 
doesn't appear to be due to anything in the file - if I change the 
ordering or just index the file alone, it works fine.

java.lang.ArrayIndexOutOfBoundsException: -65536
        at 
org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479)
        at 
org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502)
        at 
org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130)
        at 
org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467)
        at 
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
        at 
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
        at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
        at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at 
org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75)
        at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
        at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
        at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
        at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
        at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
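
A negative index like -65536 usually points at 32-bit arithmetic wrapping around: the in-memory terms hash addresses one large logical byte pool with int offsets, so once enough 32 KB blocks accumulate, a blockIndex * blockSize computation can exceed Integer.MAX_VALUE and go negative. A purely illustrative sketch of that failure mode (the block size mirrors Lucene's 32 KB byte blocks, but the rest is made up and this is not Lucene's actual code):

```java
// Purely illustrative: how 32-bit offset arithmetic wraps negative.
// BLOCK_SIZE mirrors Lucene's 32 KB byte-pool blocks; the names are invented.
public class OffsetOverflow {
    static final int BLOCK_SHIFT = 15;
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;   // 32768 bytes per block

    public static void main(String[] args) {
        int blockIndex = 131070;                      // ~131k blocks, ~4 GB of buffered data
        int byteOffset = blockIndex * BLOCK_SIZE;     // wraps past Integer.MAX_VALUE
        System.out.println(byteOffset);               // prints -65536
    }
}
```

That would also fit the symptom that the failure depends on cumulative position in the run (file #236) rather than on any particular record.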

The related Lucene code is a bit thick and I'm having a hard time 
figuring out what could be going on here. I've added a bit of debug 
output to some of the intermediary classes and it looks like the 
exception is generally being thrown while processing one of my dynamic 
fields (type=tdouble, indexed=t, stored=f). The GeoUpdateProcessor code 
referenced above is my own, but essentially is the same as the LocalSolr 
update processor; it just contains a few lines of code that calculates a 
double value from two document fields and then stores that value in one 
of these dynamic fields. It hasn't caused any previous problems, only 
interacts with the underlying framework via cmd.getSolrInputDocument(), 
doc.getFieldValue(string), doc.addField(string, double), and 
next.processAdd(cmd), and I've generated a number of indexes with it in 
the past, so I don't -think- that's a likely culprit. I've tried a run 
without the update processor and the problem seemed to go away (it made 
it past the above file, at least), but then this changes so many other 
factors that I don't know how much that really tells me (reduces field 
count by ~13 fields, eliminates all dynamic fields, etc.).
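
For reference, the logic I described boils down to a few lines. Here's a self-contained stand-in sketch, with a plain Map in place of the SolrInputDocument and placeholder field names and formula (the real processor extends UpdateRequestProcessor, works through getSolrInputDocument()/addField(), and delegates to next.processAdd(cmd)):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the update-processor step described above: read two document
// fields, derive a double, store it in a dynamic field. A Map stands in for
// SolrInputDocument; field names and the calculation are placeholders, not
// the actual GeoUpdateProcessor code.
public class GeoFieldSketch {
    static void addDerivedField(Map<String, Object> doc) {
        Object lat = doc.get("latitude");
        Object lng = doc.get("longitude");
        if (lat != null && lng != null) {
            // Placeholder calculation; the real code derives a single double
            // from the two source fields before it gets indexed as tdouble.
            double value = Double.parseDouble(lat.toString())
                         + Double.parseDouble(lng.toString());
            doc.put("geo_value_d", value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("latitude", "37.5");
        doc.put("longitude", "-122.25");
        addDerivedField(doc);
        System.out.println(doc.get("geo_value_d"));  // prints -84.75
    }
}
```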

The only other thing worth mentioning is that I've replaced the Solr 
trunk Lucene jars with my own compiled versions, based off 2.9.0. The 
only thing different versus the 'stable' release is that it includes a 
few additional libraries (no core or contrib classes were modified). I 
haven't heard of any check-ins between 2.9.0 and 2.9.1-dev that should 
affect this...

Has anyone else run into a problem like this before?

Thanks,
Aaron


Re: ArrayIndexOutOfBoundsException during indexing

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Thanks for the report Aaron, this definitely looks like a Lucene bug,
and I've opened
https://issues.apache.org/jira/browse/LUCENE-1995
Can you follow up there? (I asked about your index settings.)

-Yonik
http://www.lucidimagination.com


