Posted to issues@lucene.apache.org by "Nam-Quang Tran (Jira)" <ji...@apache.org> on 2021/06/06 16:15:00 UTC

[jira] [Comment Edited] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

    [ https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358154#comment-17358154 ] 

Nam-Quang Tran edited comment on LUCENE-8118 at 6/6/21, 4:14 PM:
-----------------------------------------------------------------

Here's another stack trace, slightly different from the one in the original post. My crash happens with *addDocument*, not with *addDocuments*. Also, the suggested workaround of committing every 50k documents does not work for me; it still crashes the same way. Committing every 5k documents does not work either. Maximum heap size is 16 GB. Lucene version is 8.5.2.
{quote}org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
 at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:681)
 at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:695)
 at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1591)
 at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
 at com.docfetcherpro.model.TreeFolderWrapper$.addDoc(TreeModelWrapper.scala:189)
 at com.docfetcherpro.model.TreeNodeWrapper.addDoc(TreeModelWrapper.scala:533)
 at com.docfetcherpro.model.TreeNodeWrapper.update(TreeModelWrapper.scala:315)
 at com.docfetcherpro.model.TreeUpdate$.updateNodePair(TreeUpdate.scala:333)
 at com.docfetcherpro.model.TreeUpdate$.$anonfun$update$6(TreeUpdate.scala:137)
 at com.docfetcherpro.model.TreeUpdate$.$anonfun$update$6$adapted(TreeUpdate.scala:133)
 at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
 at scala.collection.immutable.List.foreach(List.scala:431)
 at scala.collection.generic.TraversableForwarder.foreach(TraversableForwarder.scala:38)
 at scala.collection.generic.TraversableForwarder.foreach$(TraversableForwarder.scala:38)
 at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:47)
 at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
 at com.docfetcherpro.model.TreeUpdate$.update(TreeUpdate.scala:133)
 at com.docfetcherpro.model.IndexActor.index1(IndexActor.scala:127)
 at com.docfetcherpro.model.IndexActor.$anonfun$index$1(IndexActor.scala:18)
 at com.docfetcherpro.util.MethodActor$$anon$3.run(MethodActor.scala:86)
 at com.docfetcherpro.util.MethodActor.com$docfetcherpro$util$MethodActor$$threadLoop(MethodActor.scala:185)
 at com.docfetcherpro.util.MethodActor$$anon$2.run(MethodActor.scala:67)
 Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -65536 out of bounds for length 71428
 at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
 at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:221)
 at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:80)
 at org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:121)
 at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:178)
 at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:862)
 at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
 at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
 at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
 at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
 at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
 at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
 at com.docfetcherpro.model.TreeFolderWrapper$.addDoc(TreeModelWrapper.scala:189)
 at com.docfetcherpro.model.TreeNodeWrapper.addDoc(TreeModelWrapper.scala:533)
 at com.docfetcherpro.model.TreeNodeWrapper.update(TreeModelWrapper.scala:310)
 ... 15 more
{quote}
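
For reference, the periodic-commit workaround mentioned above can be sketched roughly as follows. This is only an illustration, not the code that produced the trace: DocSink, CountingSink, PeriodicCommit and indexAll are hypothetical names, with DocSink standing in for the IndexWriter.addDocument and IndexWriter.commit calls so that the loop can be run without Lucene on the classpath. As noted, committing every 50k or 5k documents did not prevent the crash here.

```java
/** Hypothetical stand-in for the two IndexWriter calls the workaround relies on. */
interface DocSink<D> {
    void addDocument(D doc); // the real IndexWriter.addDocument throws IOException
    void commit();           // the real IndexWriter.commit throws IOException
}

/** Hypothetical in-memory sink that only counts calls, used for the demo below. */
final class CountingSink<D> implements DocSink<D> {
    int added = 0;
    int committed = 0;

    @Override public void addDocument(D doc) { added++; }
    @Override public void commit() { committed++; }
}

final class PeriodicCommit {
    /**
     * Adds every document to the sink, committing after each full batch of
     * commitEvery documents and once more for a trailing partial batch.
     * Returns the number of commits issued.
     */
    static <D> int indexAll(Iterable<D> docs, DocSink<D> sink, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int pending = 0;
        int commits = 0;
        for (D doc : docs) {
            sink.addDocument(doc);
            pending++;
            if (pending == commitEvery) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the last, incomplete batch
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        CountingSink<String> sink = new CountingSink<>();
        int commits = indexAll(java.util.Arrays.asList("a", "b", "c", "d", "e", "f", "g"), sink, 3);
        System.out.println(sink.added + " documents, " + commits + " commits");
    }
}
```

With a real IndexWriter, an adapter implementing DocSink would have to handle the IOException that both calls can throw.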


> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8118
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 7.2
>         Environment: Debian/Stretch
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>            Reporter: Laura Dietz
>            Priority: Major
>         Attachments: LUCENE-8118_test.patch
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Indexing a large collection of about 20 million paragraph-sized documents results in an ArrayIndexOutOfBoundsException in org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).
> The bug is possibly related to the issues described [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html] and in [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936], but I am not using Solr; I am using Lucene Core directly.
> The issue can be reproduced using code from [GitHub trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]:
> - compile with `mvn compile assembly:single`
> - run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
> where paragraphCorpus.cbor is contained in this [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz].
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
>         at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
>         at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
>         at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
>         at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
>         at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
>         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
>         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
>         at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
>         at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
>         at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
>         at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)
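
One possible reading of the index -65536 that appears in both traces, offered as an assumption rather than a confirmed diagnosis: if the failing array lookup is computed by shifting a signed 32-bit byte offset right by 15 (that is, 32 KB blocks), then an offset that has wrapped past 2^31 - 1 lands on exactly this value. The block size, the shift, and the class and method names below are illustrative assumptions and are not taken from the report. A minimal sketch of the arithmetic:

```java
/** Illustrative arithmetic only; BLOCK_SHIFT and blockIndex are assumed names. */
final class BlockIndexOverflowDemo {
    // Assumption: blocks hold 1 << 15 = 32768 bytes each.
    static final int BLOCK_SHIFT = 15;

    /** Index of the block that would hold the given absolute byte offset. */
    static int blockIndex(int byteOffset) {
        // Arithmetic shift preserves the sign, so a negative offset
        // yields a negative block index.
        return byteOffset >> BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        int lastValid = Integer.MAX_VALUE; // 2^31 - 1, the largest offset an int can hold
        int wrapped = lastValid + 1;       // overflows to Integer.MIN_VALUE

        System.out.println(blockIndex(lastValid)); // 65535
        System.out.println(blockIndex(wrapped));   // -65536
    }
}
```

Under that assumption, the array length of 71428 reported in the 8.5.2 trace would also be consistent with more than 65536 blocks, or more than 2^31 bytes, having been allocated before the failing write.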


