Posted to issues@lucene.apache.org by "Ankit Jain (Jira)" <ji...@apache.org> on 2022/02/24 17:01:00 UTC

[jira] [Comment Edited] (LUCENE-10428) getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop

    [ https://issues.apache.org/jira/browse/LUCENE-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497553#comment-17497553 ] 

Ankit Jain edited comment on LUCENE-10428 at 2/24/22, 5:00 PM:
---------------------------------------------------------------

{quote}By any chance, were you able to see what is the number of clauses of this query?
{quote}
[~jpountz] - I did check the invocation of sumRelativeErrorBound and it probably showed 4.

Interestingly, even when I run the same query, it does not necessarily hit this convergence issue, so I could not find an easy way to reproduce it at the query level.



> getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10428
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10428
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/query/scoring, core/search
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: Flame_graph.png
>
>
> Customers complained about high CPU usage on an Elasticsearch cluster in production. We noticed that a few search requests had been stuck for a long time:
> {code:java}
> % curl -s localhost:9200/_cat/tasks?v                               
> indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205  AmMLzDQ4RrOJievRDeGFZw:569204  direct    1645195007282 14:36:47  6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075  emjWc5bUTG6lgnCGLulq-Q:502074  direct    1645195037259 14:37:17  6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270  emjWc5bUTG6lgnCGLulq-Q:583269  direct    1645201316981 16:21:56  4.5h
> {code}
> Flame graphs indicated that CPU time was mostly going into the *getMinCompetitiveScore method in MaxScoreSumPropagator*. After some live JVM debugging, we found that the org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound method had around 4 million invocations per second.
> We determined the values of some parameters from live debugging:
> {code:java}
> minScoreSum = 3.5541441
> minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162
> returnObj scoreSumUpperBound = 3.5541444
> Math.ulp(minScoreSum) = 2.3841858E-7
> {code}
> Example code snippet:
> {code:java}
> double sumOfOtherMaxScores = 3.554144322872162;
> double minScoreSum = 3.5541441;
> float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
> while (scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
>     minScore -= Math.ulp(minScoreSum);
>     System.out.printf("%.20f, %.100f\n", minScore, Math.ulp(minScoreSum));
> }
> {code}
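
The arithmetic behind the reported values can be checked in isolation. The sketch below is only an illustration: the class UlpStallSketch and its stalls helper are hypothetical names, not Lucene APIs, and it does not call scoreSumUpperBound. It uses the numbers from the live-debugging output above, treating minScoreSum as a float (an assumption, based on the reported ulp of 2.3841858E-7). It shows two things: the float cast of the reported sum is already greater than minScoreSum, and a loop of the form "value -= step" stops making progress once the step is smaller than half the ulp of the value. Whether the second effect is what happened in the stuck threads is not established here, since the magnitude of minScore at that point was not reported.

```java
/** Illustration only: hypothetical names, not part of Lucene. */
final class UlpStallSketch {

    /**
     * Returns true if subtracting step from value yields the same float,
     * meaning a loop of the form "value -= step" can no longer make progress.
     */
    static boolean stalls(float value, float step) {
        float next = value - step; // rounded to the nearest float
        return next == value;
    }

    public static void main(String[] args) {
        // Values taken from the live-debugging output in the issue description.
        float minScoreSum = 3.5541441f;   // assumed to be a float
        double sum = 3.554144322872162;   // minScore + sumOfOtherMaxScores

        float step = Math.ulp(minScoreSum);
        System.out.println("step = " + step);
        System.out.println("(float) sum = " + (float) sum);
        System.out.println("cast exceeds minScoreSum: " + ((float) sum > minScoreSum));

        // Near -2 the step equals one float ulp, so the value still moves.
        System.out.println("stalls at -2:  " + stalls(-2f, step));
        // Near -16 the step is well under half an ulp, so the value is stuck.
        System.out.println("stalls at -16: " + stalls(-16f, step));
    }
}
```

A check like stalls could also serve as a loop guard while debugging, so that a thread exits with a diagnostic instead of spinning when the decrement no longer changes the value.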


