Posted to dev@lucene.apache.org by "Marc Morissette (JIRA)" <ji...@apache.org> on 2018/06/20 02:40:00 UTC

[jira] [Updated] (LUCENE-8365) ArrayIndexOutOfBoundsException in UnifiedHighlighter

     [ https://issues.apache.org/jira/browse/LUCENE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Morissette updated LUCENE-8365:
------------------------------------
    Description: 
We see ArrayIndexOutOfBoundsExceptions coming out of the UnifiedHighlighter in our production logs from time to time:

{code}
java.lang.ArrayIndexOutOfBoundsException
	at java.base/java.lang.System.arraycopy(Native Method)
	at org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386)
	at org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341)
	at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121)
	at org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149)
	at org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171)
	at org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120)
	at org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261)
...
{code}

It turns out that there is an "off by one" error in the UnifiedHighlighter's code that, as far as I can tell, is only triggered when two nested SpanNearQueries contain the same term.

The resulting behaviour depends on the content of the highlighted document: either some highlighted terms go missing, or an ArrayIndexOutOfBoundsException is thrown.

  was:
We see an ArrayOutOfBoundsExceptions coming out of the UnifiedHighlighter in our production logs from time to time:

{code}
java.lang.ArrayIndexOutOfBoundsException
	at java.base/java.lang.System.arraycopy(Native Method)
	at org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386)
	at org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341)
	at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121)
	at org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149)
	at org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171)
	at org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120)
	at org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261)
...
{code}

It turns out that there is an "off by one" error in UnifiedHighlighter code that, as far as I can tell, is currently only invoked when two nested SpanNearQueries contain the same term.

The behaviour depends on the highlighted document. In most cases, some terms will fail to be highlighted. In others, an Exception is thrown.


> ArrayIndexOutOfBoundsException in UnifiedHighlighter
> ----------------------------------------------------
>
>                 Key: LUCENE-8365
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8365
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 7.3.1
>            Reporter: Marc Morissette
>            Priority: Major
>
> We see ArrayIndexOutOfBoundsExceptions coming out of the UnifiedHighlighter in our production logs from time to time:
> {code}
> java.lang.ArrayIndexOutOfBoundsException
> 	at java.base/java.lang.System.arraycopy(Native Method)
> 	at org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386)
> 	at org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341)
> 	at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121)
> 	at org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149)
> 	at org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171)
> 	at org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120)
> 	at org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261)
> ...
> {code}
> It turns out that there is an "off by one" error in the UnifiedHighlighter's code that, as far as I can tell, is only triggered when two nested SpanNearQueries contain the same term.
> The resulting behaviour depends on the content of the highlighted document: either some highlighted terms go missing, or an ArrayIndexOutOfBoundsException is thrown.
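
To make the two symptoms concrete, here is a minimal, self-contained sketch of how a single off-by-one in a "shift right, then insert" routine can either silently drop an entry or make System.arraycopy throw, depending on how much spare room the backing array has. This is not the PhraseHelper code: the class SortedBuffer, its buggy flag and the sample values are all hypothetical, chosen only to illustrate the failure mode. Note that appends in sorted order never reach the shift, which is one plausible reason such a bug could stay hidden until values arrive out of order.

```java
import java.util.Arrays;

public class OffByOneInsertSketch {

    /** Hypothetical fixed-capacity buffer that keeps ints sorted; not Lucene code. */
    public static final class SortedBuffer {
        private final int[] values;
        private final boolean buggy;
        private int size;

        public SortedBuffer(int capacity, boolean buggy) {
            this.values = new int[capacity];
            this.buggy = buggy;
        }

        public void add(int value) {
            if (size == values.length) {
                throw new IllegalStateException("buffer is full");
            }
            // Scan backwards for the last element that is <= value.
            int idx = size - 1;
            while (idx >= 0 && values[idx] > value) {
                idx--;
            }
            int insertIdx = idx + 1;
            int shiftLen = size - insertIdx; // number of elements that must move right
            if (shiftLen > 0) {
                // Correct: the shift starts at insertIdx.
                // Buggy (assumed for illustration): it starts one slot too far right.
                int src = buggy ? insertIdx + 1 : insertIdx;
                System.arraycopy(values, src, values, src + 1, shiftLen);
            }
            values[insertIdx] = value;
            size++;
        }

        public int[] toArray() {
            return Arrays.copyOf(values, size);
        }
    }

    public static void main(String[] args) {
        // Correct shift: the out-of-order value lands between its neighbours.
        SortedBuffer correct = new SortedBuffer(8, false);
        correct.add(10); correct.add(30); correct.add(20);
        System.out.println(Arrays.toString(correct.toArray())); // [10, 20, 30]

        // Buggy shift with spare capacity: 30 is overwritten and silently lost.
        SortedBuffer roomy = new SortedBuffer(8, true);
        roomy.add(10); roomy.add(30); roomy.add(20);
        System.out.println(Arrays.toString(roomy.toArray())); // [10, 20, 0]

        // Buggy shift with a nearly full array: the copy runs past the end.
        SortedBuffer tight = new SortedBuffer(3, true);
        tight.add(10); tight.add(30);
        try {
            tight.add(20);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("arraycopy failed: " + e.getClass().getSimpleName());
        }
    }
}
```

In this sketch the same defect yields a lost entry when the array has slack and an exception when it does not, which matches the pattern reported above where the outcome depends on the highlighted document.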


