Posted to issues@lucene.apache.org by "Christine Poerschke (Jira)" <ji...@apache.org> on 2021/01/06 18:56:00 UTC

[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

    [ https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259985#comment-17259985 ] 

Christine Poerschke commented on SOLR-15071:
--------------------------------------------

Thanks [~florin.babes] for creating this ticket as a continuation of the Solr user mailing list thread!

bq. ... We've narrowed our model to only two features and the problem always occurs (for some queries, not all) when we have a feature with a mm=1 and a feature with a mm>=3. The problem also occurs when we only do feature extraction and the problem seems to always occur on the feature with the bigger mm. The errors seem to be related to the size of the head DisiPriorityQueue created here: https://github.com/apache/lucene-solr/blob/branch_8_6/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L107 as the error changes as we change the mm for the second feature: ...

That really helps narrow things down, great. Thinking out loud about possible further investigative steps:

The problem also happening during feature extraction should make it easier to build a reproducible test case, e.g. with the techproducts example used in the Solr Reference Guide https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#quick-start-with-ltr i.e. given the documents in the techproducts example, is it possible to craft the dummy feature-store and choose mm values and queries that cause the same error?
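
To make that concrete: the dummy feature-store from the issue description (quoted further below) already targets a {{name}} field, which techproducts also has. So -- purely as a rough, untested sketch, with the store name and mm values just placeholders -- something like the following could be uploaded to the techproducts collection's {{/schema/feature-store}} endpoint as per the Learning To Rank guide linked above:

{code}
[
  {
    "store": "dummystore",
    "name": "similarity_name_mm_1",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!dismax qf=name mm=1}${term}" }
  },
  {
    "store": "dummystore",
    "name": "similarity_name_mm_3",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!dismax qf=name mm=3}${term}" }
  }
]
{code}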

{quote}
...
We have the following query raw parameters:
q=lg cx 4k oled 120 hz -> just of many examples
term_dq=lg cx 4k oled 120 hz
rq={!ltr model=model reRankDocs=1000 store=feature_store
efi.term=${term_dq}}
...
{quote}

You mention that the problem occurs for some queries but not all and that it seems to always occur for the feature with the bigger mm. Could the queries or the {{mm}} be adjusted to 'ensure' that the problem occurs? Specifically (a rough request sketch follows the list):
* if we have (say) {{lg cx 4k oled 120 hz}} then that's 6 terms and so would mm=7 then 'guarantee' hitting the error?
* might stopword removal be a factor, e.g. if {{lg cx 4k oled 120 hz}} is pruned to (say) {{lg cx 4k oled hz}} then with only 5 terms left could mm>=6 ever work?
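
Purely as an illustration -- untested, and the six terms below are just arbitrary words that appear in techproducts documents -- the 'mm greater than the number of terms' case from the first bullet could be probed by raising one feature's mm in the sketch above to 7 and then running feature extraction alone with a six-term efi value (URL-encoding of the spaces omitted for readability):

{code}
http://localhost:8983/solr/techproducts/query?q=*:*&fl=id,score,[features store=dummystore efi.term='belkin ipod usb power cable cord']
{code}

If that reliably triggers the ArrayIndexOutOfBoundsException against techproducts then we would have a self-contained reproduction.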

Not having used {{mm}} myself I had to look up https://lucene.apache.org/solr/guide/8_6/the-dismax-query-parser.html#mm-minimum-should-match-parameter to learn a little about it, and I also noticed https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.3/solr/solrj/src/java/org/apache/solr/common/params/DisMaxParams.java#L45-L48 which sounds promising i.e. if {{mm.autoRelax=true}} was used would that avoid the errors?
{code}
  /**
   * If set to true, will try to reduce MM if tokens are removed from some clauses but not all
   */
  public static String MM_AUTORELAX = "mm.autoRelax";
{code}
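
Just to sketch what trying that might look like -- hedged, since I have not checked whether the plain {{dismax}} parser used inside the features honours {{mm.autoRelax}}; the edismax parser documents it -- the second feature's query could perhaps become:

{code}
"params": { "q": "{!edismax qf=query_field_2 mm=5 mm.autoRelax=true}${term}" }
{code}

i.e. switching the feature's local parser to edismax and adding the parameter, and then checking whether the previously failing queries still fail.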

If the error can be reproduced with the techproducts example included in the 8.6.3 release itself then a next step could be to build Solr locally -- https://github.com/apache/lucene-solr/tree/master#building-solr -- and, using techproducts with the locally built Solr, to examine the code behaviour more closely and figure out exactly what's going on and how it might be fixed. Does that kind of make sense?
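
From memory -- worth double-checking against the branch's own README, since the 8.x line still builds with Ant whereas master has moved to Gradle -- the local build and techproducts start would be roughly:

{code}
# check out the release being investigated (tag name assumed from the links above)
git clone https://github.com/apache/lucene-solr.git
cd lucene-solr
git checkout releases/lucene-solr/8.6.3

# build and start Solr with the techproducts example, LTR enabled as per the
# Learning To Rank quick-start linked above
# (a fresh checkout may first need: ant ivy-bootstrap)
cd solr
ant server
bin/solr start -e techproducts -Dsolr.ltr.enabled=true
{code}

A breakpoint or some temporary logging in MinShouldMatchSumScorer and DisiPriorityQueue.add could then show how the queue ends up smaller than the number of scorers being added to it.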

> Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15071
>                 URL: https://issues.apache.org/jira/browse/SOLR-15071
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LTR
>    Affects Versions: 8.6, 8.7
>            Reporter: Florin Babes
>            Priority: Major
>              Labels: ltr
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive the following error message:
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
> "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
> at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.Server.handle(Server.java:500)
> at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
> at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.lucene.search.DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> at org.apache.lucene.search.MinShouldMatchSumScorer.advanceTail(MinShouldMatchSumScorer.java:246)
> at org.apache.lucene.search.MinShouldMatchSumScorer.updateFreq(MinShouldMatchSumScorer.java:312)
> at org.apache.lucene.search.MinShouldMatchSumScorer.score(MinShouldMatchSumScorer.java:320)
> at org.apache.solr.ltr.feature.SolrFeature$SolrFeatureWeight$SolrFeatureScorer.score(SolrFeature.java:242)
> at org.apache.solr.ltr.LTRScoringQuery$ModelWeight$ModelScorer$SparseModelScorer.score(LTRScoringQuery.java:595)
> at org.apache.solr.ltr.LTRScoringQuery$ModelWeight$ModelScorer.score(LTRScoringQuery.java:540)
> at org.apache.solr.ltr.LTRRescorer.scoreFeatures(LTRRescorer.java:183)
> at org.apache.solr.ltr.LTRRescorer.rescore(LTRRescorer.java:122)
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:119)
>  
> Based on [~cpoerschke] suggestions we detailed the following:
> Reproducibility: We can reproduce the same queries on multiple runs, with the same error.
> Data as a factor: Our setup is single-sharded, so we can't investigate further on this.
> Feature vs. Model: We've also tried a dummy LinearModel with only two features and the problem still occurs.
> Identification of the troublesome feature(s): We've narrowed our model to only two features and the problem always occurs (for some queries, not all) when we have a feature with a mm=1 and a feature with a mm>=3. The problem also occurs when we only do feature extraction and the problem seems to always occur on the feature with the bigger mm. The errors seem to be related to the size of the head DisiPriorityQueue created here: https://github.com/apache/lucene-solr/blob/branch_8_6/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L107 as the error changes as we change the mm for the second feature:
>  
> 1 feature with mm=1 and one with mm=3 -> Index 4 out of bounds for length 4
> 1 feature with mm=1 and one with mm=5 -> Index 2 out of bounds for length 2
>  
> You can find below the dummy feature-store.
>  
> [
>  {
>  "store": "dummystore",
>  "name": "similarity_name_mm_1",
>  "class": "org.apache.solr.ltr.feature.SolrFeature",
>  "params": {
>  "q": "\{!dismax qf=name mm=1}${term}"
>  }
>  },
>  {
>  "store": "dummystore",
>  "name": "similarity_names_mm_3",
>  "class": "org.apache.solr.ltr.feature.SolrFeature",
>  "params": {
>  "q": "\{!dismax qf=name mm=3}${term}"
>  }
>  }
> ]
>  
> The problem starts occurring in Solr 8.6.0: we tried multiple versions < 8.6 and >= 8.6 and the problem first appears on 8.6.0. We tend to believe it's because of the following changes: https://issues.apache.org/jira/browse/SOLR-14364 as they're the only major changes related to LTR which were introduced in Solr 8.6.0.
>  
> Thanks.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
