Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2017/08/21 17:16:00 UTC
[jira] [Updated] (LUCENE-7939) Speed up MinShouldMatchSumScorer in conjunctions

     [ https://issues.apache.org/jira/browse/LUCENE-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-7939:
---------------------------------
    Attachment: LUCENE-7939.patch

Here is a proposal. I ran a benchmark with the following tasks (taken from LUCENE-4571):

{noformat}
HighMinShouldMatch4: ref http from name title +minShouldMatch=4
HighMinShouldMatch3: ref http from name title +minShouldMatch=3
HighMinShouldMatch2: ref http from name title +minShouldMatch=2
HighMinShouldMatch0: ref http from name title
Low1MinShouldMatch4: ref http from name dublin +minShouldMatch=4
Low1MinShouldMatch3: ref http from name dublin +minShouldMatch=3
Low1MinShouldMatch2: ref http from name dublin +minShouldMatch=2
Low1MinShouldMatch0: ref http from name dublin
Low2MinShouldMatch4: ref http from wings dublin +minShouldMatch=4
Low2MinShouldMatch3: ref http from wings dublin +minShouldMatch=3
Low2MinShouldMatch2: ref http from wings dublin +minShouldMatch=2
Low2MinShouldMatch0: ref http from wings dublin
Low3MinShouldMatch4: ref http struck wings dublin +minShouldMatch=4
Low3MinShouldMatch3: ref http struck wings dublin +minShouldMatch=3
Low3MinShouldMatch2: ref http struck wings dublin +minShouldMatch=2
Low3MinShouldMatch0: ref http struck wings dublin
Low4MinShouldMatch4: ref restored struck wings dublin +minShouldMatch=4
Low4MinShouldMatch3: ref restored struck wings dublin +minShouldMatch=3
Low4MinShouldMatch2: ref restored struck wings dublin +minShouldMatch=2
Low4MinShouldMatch0: ref restored struck wings dublin
ConjHighMinShouldMatch4: +only ref http from name title +minShouldMatch=4
ConjHighMinShouldMatch3: +only ref http from name title +minShouldMatch=3
ConjHighMinShouldMatch2: +only ref http from name title +minShouldMatch=2
ConjHighMinShouldMatch0: +only ref http from name title
ConjLow1MinShouldMatch4: +only ref http from name dublin +minShouldMatch=4
ConjLow1MinShouldMatch3: +only ref http from name dublin +minShouldMatch=3
ConjLow1MinShouldMatch2: +only ref http from name dublin +minShouldMatch=2
ConjLow1MinShouldMatch0: +only ref http from name dublin
ConjLow2MinShouldMatch4: +only ref http from wings dublin +minShouldMatch=4
ConjLow2MinShouldMatch3: +only ref http from wings dublin +minShouldMatch=3
ConjLow2MinShouldMatch2: +only ref http from wings dublin +minShouldMatch=2
ConjLow2MinShouldMatch0: +only ref http from wings dublin
ConjLow3MinShouldMatch4: +only ref http struck wings dublin +minShouldMatch=4
ConjLow3MinShouldMatch3: +only ref http struck wings dublin +minShouldMatch=3
ConjLow3MinShouldMatch2: +only ref http struck wings dublin +minShouldMatch=2
ConjLow3MinShouldMatch0: +only ref http struck wings dublin
ConjLow4MinShouldMatch4: +only ref restored struck wings dublin +minShouldMatch=4
ConjLow4MinShouldMatch3: +only ref restored struck wings dublin +minShouldMatch=3
ConjLow4MinShouldMatch2: +only ref restored struck wings dublin +minShouldMatch=2
ConjLow4MinShouldMatch0: +only ref restored struck wings dublin
{noformat}

As you can see, there are two versions of each task: the original one, and one whose name starts with {{Conj}}, in which the query is intersected with the required clause {{+only}}.
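To make the task semantics concrete, here is a brute-force sketch of which documents a task matches. It assumes, as the task names suggest, that {{+only}} is a required clause and that minShouldMatch applies to the remaining optional clauses. Postings are modeled as hypothetical sorted arrays of doc IDs rather than a real index, and {{ConjTaskModel}} / {{matchingDocs}} are illustrative names, not luceneutil or Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Brute-force model of a benchmark task's match set.
// Hypothetical names and data; postings are sorted arrays of doc IDs.
final class ConjTaskModel {

    static boolean contains(int[] postings, int doc) {
        return Arrays.binarySearch(postings, doc) >= 0;
    }

    // required: postings of the required clause ("only"), or null for the non-Conj tasks.
    // optional: postings of the optional clauses ("ref", "http", ...).
    // minShouldMatch: the task's minShouldMatch value, 0 when absent.
    static List<Integer> matchingDocs(int maxDoc, int[] required, int[][] optional, int minShouldMatch) {
        // A purely optional query still needs at least one matching clause;
        // with a required clause and minShouldMatch=0 the optional clauses only affect scoring.
        int threshold = required == null ? Math.max(1, minShouldMatch) : minShouldMatch;
        List<Integer> result = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (required != null && !contains(required, doc)) {
                continue; // filtered out by the conjunction with +only
            }
            int count = 0;
            for (int[] postings : optional) {
                if (contains(postings, doc)) {
                    count++;
                }
            }
            if (count >= threshold) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] only = {2, 5, 9};
        int[][] optional = {{1, 2, 5}, {2, 5, 7}, {5, 7, 9}};
        System.out.println(matchingDocs(10, only, optional, 2)); // prints [2, 5]
    }
}
```

Under these assumptions, a Conj task with minShouldMatch=0 matches exactly the postings of {{only}}, which would presumably explain why the Conj*MinShouldMatch0 tasks barely move in the numbers below.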

luceneutil gave me the following numbers on wikimedium10m:

{noformat}
                   Task QPS baseline      StdDev   QPS patch      StdDev                Pct diff
 ConjLow4MinShouldMatch3      230.51      (2.8%)      223.38      (2.0%)   -3.1% (  -7% -    1%)
     Low2MinShouldMatch3        5.04      (2.6%)        4.92      (2.0%)   -2.3% (  -6% -    2%)
     Low1MinShouldMatch3        3.80      (3.3%)        3.76      (4.8%)   -1.2% (  -8% -    7%)
     HighMinShouldMatch3        3.27      (3.6%)        3.23      (4.9%)   -1.2% (  -9% -    7%)
     Low1MinShouldMatch0        3.72      (4.7%)        3.68      (6.0%)   -1.1% ( -11% -   10%)
     HighMinShouldMatch0        3.24      (4.5%)        3.21      (5.7%)   -1.0% ( -10% -    9%)
     Low2MinShouldMatch2        4.44      (3.8%)        4.39      (4.9%)   -1.0% (  -9% -    7%)
     Low1MinShouldMatch2        3.63      (3.9%)        3.60      (4.9%)   -1.0% (  -9% -    8%)
     Low2MinShouldMatch0        4.55      (5.1%)        4.51      (5.9%)   -0.9% ( -11% -   10%)
     HighMinShouldMatch2        3.17      (4.1%)        3.14      (5.0%)   -0.9% (  -9% -    8%)
     Low1MinShouldMatch4        5.11      (2.2%)        5.06      (2.1%)   -0.8% (  -5% -    3%)
     Low3MinShouldMatch2        6.23      (3.4%)        6.18      (4.3%)   -0.8% (  -8% -    7%)
     HighMinShouldMatch4        3.42      (3.7%)        3.40      (5.0%)   -0.6% (  -8% -    8%)
     Low4MinShouldMatch0       10.67      (6.1%)       10.61      (6.9%)   -0.6% ( -12% -   13%)
     Low4MinShouldMatch2       42.49      (1.5%)       42.25      (1.4%)   -0.6% (  -3% -    2%)
     Low3MinShouldMatch0        6.32      (5.2%)        6.29      (6.4%)   -0.5% ( -11% -   11%)
     Low3MinShouldMatch3       38.64      (1.9%)       38.46      (1.4%)   -0.5% (  -3% -    2%)
     Low4MinShouldMatch3      241.97      (3.9%)      240.87      (3.7%)   -0.5% (  -7% -    7%)
     Low2MinShouldMatch4       45.66      (1.3%)       45.58      (1.5%)   -0.2% (  -2% -    2%)
 ConjHighMinShouldMatch0        5.29      (4.0%)        5.29      (4.7%)   -0.1% (  -8% -    9%)
 ConjLow1MinShouldMatch0        6.15      (3.8%)        6.16      (4.3%)    0.2% (  -7% -    8%)
     Low4MinShouldMatch4      415.43      (2.2%)      416.29      (2.8%)    0.2% (  -4% -    5%)
 ConjLow4MinShouldMatch0       16.86      (1.1%)       16.91      (1.9%)    0.2% (  -2% -    3%)
 ConjLow3MinShouldMatch0       10.83      (2.4%)       10.86      (2.7%)    0.3% (  -4% -    5%)
 ConjLow2MinShouldMatch0        7.60      (3.5%)        7.64      (3.8%)    0.4% (  -6% -    8%)
     Low3MinShouldMatch4      389.80      (3.4%)      392.00      (2.4%)    0.6% (  -5% -    6%)
 ConjLow3MinShouldMatch4      372.17      (4.2%)      374.75      (2.2%)    0.7% (  -5% -    7%)
 ConjLow4MinShouldMatch4      406.38      (5.0%)      415.00      (3.2%)    2.1% (  -5% -   10%)
 ConjHighMinShouldMatch2        4.78      (2.0%)        4.93      (2.0%)    3.3% (   0% -    7%)
 ConjLow1MinShouldMatch2        5.49      (2.0%)        5.74      (1.7%)    4.4% (   0% -    8%)
 ConjLow2MinShouldMatch2        7.08      (1.8%)        7.62      (2.0%)    7.7% (   3% -   11%)
 ConjLow3MinShouldMatch2       10.90      (1.9%)       12.60      (2.3%)   15.6% (  11% -   20%)
 ConjHighMinShouldMatch3        4.77      (2.7%)        5.63      (2.8%)   18.0% (  12% -   24%)
 ConjLow2MinShouldMatch4       43.82      (1.7%)       54.61      (2.3%)   24.6% (  20% -   29%)
 ConjLow1MinShouldMatch3        5.60      (3.3%)        7.13      (3.0%)   27.3% (  20% -   34%)
 ConjLow4MinShouldMatch2       40.49      (1.2%)       54.81      (2.6%)   35.4% (  31% -   39%)
 ConjLow3MinShouldMatch3       36.65      (1.6%)       50.97      (2.5%)   39.1% (  34% -   43%)
 ConjLow2MinShouldMatch3        7.79      (3.2%)       10.86      (3.2%)   39.5% (  32% -   47%)
 ConjHighMinShouldMatch4        4.98      (3.0%)        7.31      (3.4%)   46.7% (  39% -   54%)
 ConjLow1MinShouldMatch4        6.58      (3.3%)       11.20      (3.3%)   70.3% (  61% -   79%)
{noformat}

> Speed up MinShouldMatchSumScorer in conjunctions
> ------------------------------------------------
>
>                 Key: LUCENE-7939
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7939
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (8.0), 7.1
>
>         Attachments: LUCENE-7939.patch
>
>
> MinShouldMatchSumScorer has good iteration capabilities, but if it is not used as the lead for the iteration, then its advance() call might spend a lot of effort finding the next match, whereas we should instead let the lead iterator of the conjunction skip over non-matching documents. In this issue I'd like to explore giving MinShouldMatchSumScorer a two-phase iterator and making advance() return a candidate for the next match that is less accurate (it may not actually match) but much cheaper to compute.
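The two-phase idea described above can be illustrated with a simplified, self-contained model. This is not the patch and not Lucene's {{DocIdSetIterator}}/{{TwoPhaseIterator}} API; {{ArrayIter}}, {{approximationAdvance}} and the other names are hypothetical. The cheap approximation here is assumed to be "the smallest doc ID >= target that any optional clause contains" (a disjunction, which can never skip a real match), and the exact clause count in {{matches()}} only runs once the lead iterator (standing in for {{+only}}) and the approximation agree on a document.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of two-phase iteration in a conjunction.
// All names are hypothetical; this is NOT Lucene's DocIdSetIterator/TwoPhaseIterator API.
final class TwoPhaseMsmSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Iterator over a sorted array of doc IDs (a stand-in for a postings list).
    static final class ArrayIter {
        private final int[] docs;
        private int index = -1;
        ArrayIter(int[] docs) { this.docs = docs; }
        int docID() {
            if (index < 0) return -1;
            return index < docs.length ? docs[index] : NO_MORE_DOCS;
        }
        // Moves to the first doc >= target (linear scan for simplicity).
        int advance(int target) {
            index++;
            while (index < docs.length && docs[index] < target) index++;
            return docID();
        }
    }

    // Min-should-match over several clauses, exposed in two phases.
    static final class MinShouldMatchTwoPhase {
        private final ArrayIter[] clauses;
        private final int minShouldMatch; // assumed >= 1
        private int doc = -1;
        MinShouldMatchTwoPhase(ArrayIter[] clauses, int minShouldMatch) {
            this.clauses = clauses;
            this.minShouldMatch = minShouldMatch;
        }
        // Phase 1 (cheap): smallest doc >= target that ANY clause contains.
        int approximationAdvance(int target) {
            int min = NO_MORE_DOCS;
            for (ArrayIter clause : clauses) {
                if (clause.docID() < target) clause.advance(target);
                min = Math.min(min, clause.docID());
            }
            doc = min;
            return doc;
        }
        // Phase 2 (exact): count the clauses positioned on the current candidate.
        boolean matches() {
            int count = 0;
            for (ArrayIter clause : clauses) {
                if (clause.docID() == doc) count++;
            }
            return count >= minShouldMatch;
        }
    }

    // Conjunction led by the sparse required clause.
    static List<Integer> conjunction(int[] lead, int[][] optional, int minShouldMatch) {
        ArrayIter leadIter = new ArrayIter(lead);
        ArrayIter[] clauses = new ArrayIter[optional.length];
        for (int i = 0; i < optional.length; i++) clauses[i] = new ArrayIter(optional[i]);
        MinShouldMatchTwoPhase msm = new MinShouldMatchTwoPhase(clauses, minShouldMatch);
        List<Integer> result = new ArrayList<>();
        int doc = leadIter.advance(0);
        while (doc != NO_MORE_DOCS) {
            int candidate = msm.approximationAdvance(doc);
            if (candidate > doc) {
                doc = leadIter.advance(candidate); // let the lead skip ahead
                continue;
            }
            if (msm.matches()) result.add(doc); // exact check only on agreed docs
            doc = leadIter.advance(doc + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] only = {2, 5, 9};
        int[][] optional = {{1, 2, 5}, {2, 5, 7}, {5, 7, 9}};
        System.out.println(conjunction(only, optional, 2)); // prints [2, 5]
    }
}
```

In this model the min-should-match side never loops in search of a document with enough matching clauses; when its approximation jumps ahead, the lead skips to the candidate instead, so the exact count only runs on documents the lead already contains.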



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
