Posted to issues@lucene.apache.org by "Luís Filipe Nassif (Jira)" <ji...@apache.org> on 2022/08/18 18:18:00 UTC
[jira] [Comment Edited] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
[ https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581478#comment-17581478 ]
Luís Filipe Nassif edited comment on LUCENE-8118 at 8/18/22 6:17 PM:
---------------------------------------------------------------------
Hi, a colleague of mine pointed this out to me. Should I close https://issues.apache.org/jira/browse/LUCENE-10681 as a duplicate?
We hit this AIOOBE in the 640th iteration of addDocumentS(Iterable) with ~10MB sized docs. Is there a reasonable numDocs x docSize limit for addDocumentS()?
PS: possibly there were other documents being indexed in parallel by other threads
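On the numDocs x docSize question: this thread states no limit, so the following is only a minimal sketch of one way to keep each addDocuments() call small, assuming the documents do not have to be indexed as a single block. BoundedBatches, forEachBatch and maxDocsPerBatch are hypothetical names, not part of Lucene. In real use the sink would wrap a call to IndexWriter.addDocuments(batch) and handle its IOException. Note that addDocuments() indexes its argument as one atomic block, so splitting the input gives up that guarantee across batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class BoundedBatches {
    private BoundedBatches() {}

    /**
     * Splits docs into consecutive batches of at most maxDocsPerBatch items
     * and hands each batch to sink. Returns the number of batches emitted.
     */
    static <T> int forEachBatch(Iterable<T> docs, int maxDocsPerBatch, Consumer<List<T>> sink) {
        if (maxDocsPerBatch <= 0) {
            throw new IllegalArgumentException("maxDocsPerBatch must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>();
        for (T doc : docs) {
            batch.add(doc);
            if (batch.size() == maxDocsPerBatch) {
                // Assumption: in real use this is where addDocuments(batch) would be called.
                sink.accept(batch);
                batches++;
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            // Emit the final, possibly smaller, batch.
            sink.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

A suitable maxDocsPerBatch depends on document size and the writer's RAM settings; no specific value is suggested here.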
was (Author: lfcnassif):
Hi, a colleague of mine pointed this to me. Should I close https://issues.apache.org/jira/browse/LUCENE-10681 as duplicate?
We hit this AIOOBE in the 640th iteration of addDocumentS(Iterable) with ~10MB sized docs. Is there a reasonable numDocs x docDize limit for addDocumentS()?
PS: possibly there were other documents being indexed in parallel by other threads
> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -----------------------------------------------------------------------------
>
> Key: LUCENE-8118
> URL: https://issues.apache.org/jira/browse/LUCENE-8118
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 7.2
> Environment: Debian/Stretch
> java version "1.8.0_144" Java(TM) SE Runtime Environment (build 1.8.0_144-b01) Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
> Reporter: Laura Dietz
> Priority: Major
> Attachments: LUCENE-8118_test.patch
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Indexing a large collection of about 20 million paragraph-sized documents results in an ArrayIndexOutOfBoundsException in org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).
> The bug is possibly related to the issues described [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html] and in [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936], but I am not using Solr; I am using Lucene Core directly.
> The issue can be reproduced using code from [GitHub trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]
> - compile with `mvn compile assembly:single`
> - run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
> Where paragraphCorpus.cbor is contained in this [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz]
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
> at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
> at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
> at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
> at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
> at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
> at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
> at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
> at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
> at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
> at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
> at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)