Posted to dev@lucene.apache.org by "Laura Dietz (JIRA)" <ji...@apache.org> on 2018/01/05 06:16:01 UTC

[jira] [Created] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

Laura Dietz created LUCENE-8118:
-----------------------------------

             Summary: ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
                 Key: LUCENE-8118
                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 7.2
         Environment: Debian/Stretch
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
            Reporter: Laura Dietz


Indexing a large collection of about 20 million paragraph-sized documents results in an ArrayIndexOutOfBoundsException in org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).


The bug is possibly related to the issues described [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html] and in [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936], but I am not using Solr; I am using Lucene Core directly.

The issue can be reproduced using the code in [GitHub trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]:

- compile with `mvn compile assembly:single`
- run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`

The file paragraphCorpus.cbor is contained in this [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz].



Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
        at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
        at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
        at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
        at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
        at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
        at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)
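
One possible reading of the index value, offered as a hypothesis and not confirmed by anything in this report: -65536 equals Integer.MIN_VALUE >> 15, which is what a signed 32-bit offset would yield if it wrapped past Integer.MAX_VALUE and was then shifted right by 15 bits to select a 32 KB block. The sketch below shows only that arithmetic. The class name, the method, and the 15-bit shift are assumptions made for illustration and are not taken from the Lucene source.

```java
// Hypothetical illustration only; this is not Lucene source code.
final class OverflowIndexSketch {
    // Assumed block shift: 2^15 = 32768 bytes per block.
    static final int BLOCK_SHIFT = 15;

    // Maps a global byte offset to the index of the block that would hold it.
    static int blockIndex(int offset) {
        return offset >> BLOCK_SHIFT; // arithmetic shift keeps the sign
    }

    public static void main(String[] args) {
        int lastValid = Integer.MAX_VALUE;
        int wrapped = lastValid + 1; // signed overflow wraps to Integer.MIN_VALUE

        System.out.println(blockIndex(lastValid)); // 65535
        System.out.println(blockIndex(wrapped));   // -65536
    }
}
```

If this reading is right, the exception would point to an in-memory buffer growing past 2 GB before being flushed, but that remains to be verified against the actual code path.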



