Posted to issues@lucene.apache.org by "Ankit Jain (Jira)" <ji...@apache.org> on 2022/02/24 17:01:00 UTC
[jira] [Comment Edited] (LUCENE-10428) getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
[ https://issues.apache.org/jira/browse/LUCENE-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497553#comment-17497553 ]
Ankit Jain edited comment on LUCENE-10428 at 2/24/22, 5:00 PM:
---------------------------------------------------------------
{quote}By any chance, were you able to see what is the number of clauses of this query?
{quote}
[~jpountz] - I did check the invocation of sumRelativeErrorBound and it probably showed 4.
Interestingly, running the same query again does not necessarily trigger the convergence issue, so I could not find an easy way to reproduce it at the query level.
was (Author: akjain):
{quote}By any chance, were you able to see what is the number of clauses of this query?{quote}
[~jpountz] - I did check the invocation of sumRelativeErrorBound and it probably showed 4.
> getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
> -----------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-10428
> URL: https://issues.apache.org/jira/browse/LUCENE-10428
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring, core/search
> Reporter: Ankit Jain
> Priority: Major
> Attachments: Flame_graph.png
>
>
> Customers complained about high CPU usage on an Elasticsearch cluster in production. We noticed that a few search requests had been stuck for a long time:
> {code:java}
> % curl -s localhost:9200/_cat/tasks?v
> indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205 AmMLzDQ4RrOJievRDeGFZw:569204 direct 1645195007282 14:36:47 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075 emjWc5bUTG6lgnCGLulq-Q:502074 direct 1645195037259 14:37:17 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270 emjWc5bUTG6lgnCGLulq-Q:583269 direct 1645201316981 16:21:56 4.5h
> {code}
> Flame graphs indicated that CPU time was mostly going into the *getMinCompetitiveScore method in MaxScoreSumPropagator*. Live JVM debugging then showed that the org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound method was being invoked around 4 million times per second.
> Values of some parameters, captured during live debugging:
> {code:java}
> minScoreSum = 3.5541441
> minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162
> returnObj scoreSumUpperBound = 3.5541444
> Math.ulp(minScoreSum) = 2.3841858E-7
> {code}
> Example code snippet:
> {code:java}
> double sumOfOtherMaxScores = 3.554144322872162;
> double minScoreSum = 3.5541441;
> float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
> while (scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
>     minScore -= Math.ulp(minScoreSum);
>     System.out.printf("%.20f, %.100f\n", minScore, Math.ulp(minScoreSum));
> }
> {code}
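One detail of the snippet above is worth checking in isolation: it declares minScoreSum as a double, so Math.ulp(minScoreSum) resolves to the double overload (about 4.44E-16), whereas the value captured during debugging (2.3841858E-7) is the ulp of a float. The following is a minimal, self-contained sketch that runs the same loop with an iteration cap under both typings. The class name, the iterations helper, the cap, and the body of scoreSumUpperBound are assumptions made for illustration; the real Lucene method is private and may differ. The clause count of 4 is taken from the comment above. The sketch only characterizes the snippet and does not identify why the production loop failed to converge.

```java
// Sketch only: a simplified stand-in for the loop discussed in this issue,
// not the Lucene source.
class MinCompetitiveScoreSketch {

  // Hypothetical stand-in for scoreSumUpperBound: inflate the sum by a tiny
  // relative error bound, then round to float. The real method may differ.
  static float scoreSumUpperBound(double sum, int numClauses) {
    double b = (numClauses - 1) * Math.scalb(1.0, -52); // assumed error bound
    return (float) ((1.0 + 2 * b) * sum);
  }

  // Runs the loop with an explicit step size and an iteration cap.
  // Returns the number of steps taken, or -1 if the cap was reached.
  static int iterations(double minScoreSum, double sumOfOtherMaxScores,
                        double step, int numClauses, int maxIters) {
    float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
    int iters = 0;
    while (scoreSumUpperBound(minScore + sumOfOtherMaxScores, numClauses) > minScoreSum) {
      if (iters == maxIters) {
        return -1; // gave up: no convergence within the cap
      }
      minScore -= step; // compound assignment rounds the result back to float
      iters++;
    }
    return iters;
  }

  public static void main(String[] args) {
    double sumOfOtherMaxScores = 3.554144322872162;
    int numClauses = 4; // assumption, based on the comment in this thread
    int cap = 1_000;    // arbitrary cap so the demo always terminates

    double asDouble = 3.5541441; // typing used in the snippet
    float asFloat = 3.5541441f;  // typing that matches the reported ulp

    System.out.println(Math.ulp(asDouble)); // double overload
    System.out.println(Math.ulp(asFloat));  // float overload

    // Snippet typing: double minScoreSum, stepping by the double ulp.
    System.out.println(
        iterations(asDouble, sumOfOtherMaxScores, Math.ulp(asDouble), numClauses, cap));
    // Same double minScoreSum, but stepping by the float ulp.
    System.out.println(
        iterations(asDouble, sumOfOtherMaxScores, Math.ulp(asFloat), numClauses, cap));
    // Float minScoreSum, stepping by the float ulp.
    System.out.println(
        iterations(asFloat, sumOfOtherMaxScores, Math.ulp(asFloat), numClauses, cap));
  }
}
```

With these inputs the first call reaches the cap and returns -1, because a step of roughly 4.44E-16 is smaller than half an ulp of the float minScore and is rounded away on every iteration. The second call stops after one step, and the third never enters the loop. So the non-termination of the snippet as written depends on its double typing.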