Posted to issues@solr.apache.org by "Rainer Jung (Jira)" <ji...@apache.org> on 2021/07/12 10:27:00 UTC

[jira] [Commented] (SOLR-13360) StringIndexOutOfBoundsException: String index out of range: -3

    [ https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379107#comment-17379107 ] 

Rainer Jung commented on SOLR-13360:
------------------------------------

The existing code tries to replace tokens in the original query with spell-checked alternatives. It assumes that the replacements occur in increasing position order in the original query.

The part of the query string to replace is identified by the startOffset and endOffset stored in the replacement token. These refer to the original character indexes in the query string.

If you replace multiple substrings of a string from left to right, and those substrings are given by indexes into the original string, you have to be careful whenever a replacement has a different length than the substring it replaces. If the string gets longer, the start and end indexes of all replacements further to the right must be increased by the same amount; if it gets shorter, they must be decreased. The code does exactly this calculation and accumulates the total increase or decrease in the variable offset.

So it is absolutely necessary that the assumption (replacements are applied left to right) holds.
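A minimal Java sketch of the left-to-right replacement with an accumulated offset described above. The Correction class, the method name and the sample query are hypothetical and only illustrate the idea; this is not the actual SpellCheckCollator code.

```java
import java.util.List;

// Hypothetical illustration; not the actual SpellCheckCollator source.
public class OffsetReplaceDemo {

    /** One correction: replace original[startOffset, endOffset) with replacement. */
    static final class Correction {
        final int startOffset;   // start index in the ORIGINAL query string
        final int endOffset;     // exclusive end index in the ORIGINAL query string
        final String replacement;

        Correction(int startOffset, int endOffset, String replacement) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.replacement = replacement;
        }
    }

    /**
     * Applies corrections left to right. Only valid if the corrections are
     * sorted by startOffset and do not overlap: offset accumulates how far
     * the not-yet-processed rest of the string has shifted so far.
     */
    static String applyLeftToRight(String original, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(original);
        int offset = 0;
        for (Correction c : corrections) {
            sb.replace(c.startOffset + offset, c.endOffset + offset, c.replacement);
            // A longer replacement shifts later text right (+), a shorter one left (-).
            offset += c.replacement.length() - (c.endOffset - c.startOffset);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In "helo wrld", "helo" spans [0, 4) and "wrld" spans [5, 9).
        String fixed = applyLeftToRight("helo wrld", List.of(
                new Correction(0, 4, "hello"),
                new Correction(5, 9, "world")));
        System.out.println(fixed); // prints "hello world"
    }
}
```

After the first replacement the string is one character longer, so the second correction's original range [5, 9) is shifted to [6, 10) before it is applied.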

 

The exception occurs when multiple replacements (corrections) point to the same character position in the original query. How can that happen? By configuring the spellchecker with a queryAnalyzerFieldType whose query analyzer uses a synonym list that replaces one token with multiple tokens. That might not be the common case, but it is possible.

For example, a single token in the original query can be expanded into multiple tokens by the synonym list. All of these refer to the same original query token, so they share the same startOffset and endOffset. If the spell checker finds corrections for more than one of these tokens, the collation code tries to replace the same original query token multiple times, each time using the same startOffset and endOffset plus the accumulated offset. As soon as a replacement changes the length, that is no longer correct. Example:

 

original query: "myname"

synonym list: myname, some1 some2 myname, some3 some4 myname

Token list going into spell checker collation: myname, some1, some2, some3, some4

Assumed spelling corrections: some1 => some, some2 => some, some3 => some, some4 => some

Replacements happening on the original query text "myname":
 * myname => some (startOffset 0, endOffset 6, offset 0; the new offset is -2 because the replacement string is 2 chars shorter, so everything after the replacement moves 2 positions left)
 * some => exception (startOffset 0-2 = -2, endOffset 6-2 = 4, ...)
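The example above can be reproduced outside Solr with a plain StringBuilder. This is a hypothetical sketch that models each correction as a simple (startOffset, endOffset, replacement) triple; the class and method names are made up. All four corrections share offsets 0 and 6, so the second call to replace() receives start = 0 + (-2) = -2 and throws StringIndexOutOfBoundsException. The exact exception message depends on the JDK version.

```java
import java.util.List;

// Hypothetical reproduction of the failure mode; not Solr code.
public class SynonymCollisionDemo {

    static final class Correction {
        final int startOffset;   // start index in the ORIGINAL query string
        final int endOffset;     // exclusive end index in the ORIGINAL query string
        final String replacement;

        Correction(int startOffset, int endOffset, String replacement) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.replacement = replacement;
        }
    }

    // Naive left-to-right logic: assumes sorted, non-overlapping corrections.
    static String applyLeftToRight(String original, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(original);
        int offset = 0;
        for (Correction c : corrections) {
            sb.replace(c.startOffset + offset, c.endOffset + offset, c.replacement);
            offset += c.replacement.length() - (c.endOffset - c.startOffset);
        }
        return sb.toString();
    }

    // some1..some4 are synonyms of the single original token "myname",
    // so every correction carries startOffset 0 and endOffset 6.
    static List<Correction> exampleCorrections() {
        return List.of(
                new Correction(0, 6, "some"),  // some1 => some
                new Correction(0, 6, "some"),  // some2 => some
                new Correction(0, 6, "some"),  // some3 => some
                new Correction(0, 6, "some")); // some4 => some
    }

    public static void main(String[] args) {
        try {
            System.out.println(applyLeftToRight("myname", exampleCorrections()));
        } catch (StringIndexOutOfBoundsException e) {
            // The second replace is called with start = 0 + (-2) = -2.
            System.out.println("collation failed: " + e.getMessage());
        }
    }
}
```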

It is a bit harder to reproduce than that, because corrections with positionIncrement == 0 are ignored completely. I couldn't find out exactly when this happens, but simple synonym lists often result in a value of 0.

IMHO the code is wrong in assuming that every replacement refers to a different token in the original query string. This is no longer true once synonym expansion by a query analyzer comes into play.

As a workaround, I think it is correct to harden the code, although this does not fix the root cause: if replacement tokens overlap or are not ordered left to right, the code should skip the overlapping or out-of-order replacements.
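A minimal sketch of that kind of hardening, assuming corrections are modeled as simple (startOffset, endOffset, replacement) triples. This is not the attached patch, and the class and method names are hypothetical. The idea is to remember where the last applied correction ended in original-string coordinates and to skip any correction that starts before that point or has invalid offsets.

```java
import java.util.List;

// Hypothetical hardening sketch; not the patch attached to SOLR-13360.
public class HardenedCollationDemo {

    static final class Correction {
        final int startOffset;   // start index in the ORIGINAL query string
        final int endOffset;     // exclusive end index in the ORIGINAL query string
        final String replacement;

        Correction(int startOffset, int endOffset, String replacement) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.replacement = replacement;
        }
    }

    /**
     * Left-to-right replacement that skips any correction starting before the
     * end of the previously applied one (overlapping or out of order), as well
     * as corrections whose offsets are invalid for the original string.
     */
    static String applySkippingOverlaps(String original, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(original);
        int offset = 0;
        int lastEnd = 0; // end offset (ORIGINAL coordinates) of the last applied correction
        for (Correction c : corrections) {
            boolean invalid = c.startOffset < 0 || c.endOffset < c.startOffset
                    || c.endOffset > original.length();
            if (invalid || c.startOffset < lastEnd) {
                continue; // skip instead of corrupting the query or throwing
            }
            sb.replace(c.startOffset + offset, c.endOffset + offset, c.replacement);
            offset += c.replacement.length() - (c.endOffset - c.startOffset);
            lastEnd = c.endOffset;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Correction> sameToken = List.of(
                new Correction(0, 6, "some"),
                new Correction(0, 6, "some"));
        System.out.println(applySkippingOverlaps("myname", sameToken)); // prints "some"
    }
}
```

The trade-off is that skipped corrections are silently dropped from the collation. This avoids the exception but, as said above, does not address the root cause.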

I will attach a suggested workaround patch.

I think that using a queryAnalyzerFieldType in a spellchecker configuration with collation is simply not supported if the field type's query analyzer can produce a token list in which several tokens refer to the same position in the original query string (as synonyms can). That should be documented.

> StringIndexOutOfBoundsException: String index out of range: -3
> --------------------------------------------------------------
>
>                 Key: SOLR-13360
>                 URL: https://issues.apache.org/jira/browse/SOLR-13360
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 7.2.1
>         Environment: Solr 7.2.1 - SAP Hybris 6.7.0.8
>            Reporter: Ahmed Ghoneim
>            Priority: Critical
>         Attachments: managed-schema, managed-schema, resources.json, solr-config.zip
>
>
> *{color:#ff0000}I cannot execute the following query:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=true{noformat}
> 4/1/2019, 1:16:07 PM ERROR true RequestHandlerBase java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> 	at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
> 	at java.lang.StringBuilder.replace(StringBuilder.java:262)
> 	at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
> 	at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
> 	at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
> 	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
> 	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
> 	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> 	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> 	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> 	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> 	at org.eclipse.jetty.server.Server.handle(Server.java:534)
> 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> 	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> 	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
> 	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
> 	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> 	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
> 	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> 	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> 	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> 	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> 4/1/2019, 1:16:07 PM ERROR true HttpSolrCall null:java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> *{color:#14892c}However the following query works:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=false{noformat}
> Note: there's a synonym
> {noformat}
> duotop -> Duo Top
> {noformat}


