Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2017/08/21 17:16:00 UTC
[jira] [Updated] (LUCENE-7939) Speed up MinShouldMatchSumScorer in conjunctions
[ https://issues.apache.org/jira/browse/LUCENE-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7939:
---------------------------------
Attachment: LUCENE-7939.patch
Here is a proposal. I ran a benchmark with the following tasks (taken from LUCENE-4571):
{noformat}
HighMinShouldMatch4: ref http from name title +minShouldMatch=4
HighMinShouldMatch3: ref http from name title +minShouldMatch=3
HighMinShouldMatch2: ref http from name title +minShouldMatch=2
HighMinShouldMatch0: ref http from name title
Low1MinShouldMatch4: ref http from name dublin +minShouldMatch=4
Low1MinShouldMatch3: ref http from name dublin +minShouldMatch=3
Low1MinShouldMatch2: ref http from name dublin +minShouldMatch=2
Low1MinShouldMatch0: ref http from name dublin
Low2MinShouldMatch4: ref http from wings dublin +minShouldMatch=4
Low2MinShouldMatch3: ref http from wings dublin +minShouldMatch=3
Low2MinShouldMatch2: ref http from wings dublin +minShouldMatch=2
Low2MinShouldMatch0: ref http from wings dublin
Low3MinShouldMatch4: ref http struck wings dublin +minShouldMatch=4
Low3MinShouldMatch3: ref http struck wings dublin +minShouldMatch=3
Low3MinShouldMatch2: ref http struck wings dublin +minShouldMatch=2
Low3MinShouldMatch0: ref http struck wings dublin
Low4MinShouldMatch4: ref restored struck wings dublin +minShouldMatch=4
Low4MinShouldMatch3: ref restored struck wings dublin +minShouldMatch=3
Low4MinShouldMatch2: ref restored struck wings dublin +minShouldMatch=2
Low4MinShouldMatch0: ref restored struck wings dublin
ConjHighMinShouldMatch4: +only ref http from name title +minShouldMatch=4
ConjHighMinShouldMatch3: +only ref http from name title +minShouldMatch=3
ConjHighMinShouldMatch2: +only ref http from name title +minShouldMatch=2
ConjHighMinShouldMatch0: +only ref http from name title
ConjLow1MinShouldMatch4: +only ref http from name dublin +minShouldMatch=4
ConjLow1MinShouldMatch3: +only ref http from name dublin +minShouldMatch=3
ConjLow1MinShouldMatch2: +only ref http from name dublin +minShouldMatch=2
ConjLow1MinShouldMatch0: +only ref http from name dublin
ConjLow2MinShouldMatch4: +only ref http from wings dublin +minShouldMatch=4
ConjLow2MinShouldMatch3: +only ref http from wings dublin +minShouldMatch=3
ConjLow2MinShouldMatch2: +only ref http from wings dublin +minShouldMatch=2
ConjLow2MinShouldMatch0: +only ref http from wings dublin
ConjLow3MinShouldMatch4: +only ref http struck wings dublin +minShouldMatch=4
ConjLow3MinShouldMatch3: +only ref http struck wings dublin +minShouldMatch=3
ConjLow3MinShouldMatch2: +only ref http struck wings dublin +minShouldMatch=2
ConjLow3MinShouldMatch0: +only ref http struck wings dublin
ConjLow4MinShouldMatch4: +only ref restored struck wings dublin +minShouldMatch=4
ConjLow4MinShouldMatch3: +only ref restored struck wings dublin +minShouldMatch=3
ConjLow4MinShouldMatch2: +only ref restored struck wings dublin +minShouldMatch=2
ConjLow4MinShouldMatch0: +only ref restored struck wings dublin
{noformat}
As you can see, each task comes in two versions: the original one, and one prefixed with {{Conj}} in which the query is intersected with {{+only}}.
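To make the task definitions concrete, here is a minimal sketch of the matching rule they imply, modeled on plain term sets rather than on Lucene classes. The class and method names are hypothetical and the document contents are made up for illustration. The only Lucene behavior assumed is that a query with no required clause must match at least one optional clause even when minShouldMatch is 0.

```java
import java.util.List;
import java.util.Set;

/** Toy model of the benchmark queries above; this is not Lucene code. */
final class MinShouldMatchSemantics {

    /**
     * Returns true if the document contains every required term and at
     * least minShouldMatch of the optional terms.
     */
    static boolean matches(Set<String> docTerms,
                           List<String> required,
                           List<String> optional,
                           int minShouldMatch) {
        if (!docTerms.containsAll(required)) {
            return false;
        }
        // Without a required clause, at least one optional term must match.
        int effectiveMin = required.isEmpty()
                ? Math.max(1, minShouldMatch)
                : minShouldMatch;
        long present = optional.stream().filter(docTerms::contains).count();
        return present >= effectiveMin;
    }

    public static void main(String[] args) {
        List<String> optional = List.of("ref", "http", "from", "name", "dublin");
        // Hypothetical document: the required term plus three optional terms.
        Set<String> doc = Set.of("only", "ref", "http", "dublin");

        // Shaped like ConjLow1MinShouldMatch3: prints true.
        System.out.println(matches(doc, List.of("only"), optional, 3));
        // Shaped like ConjLow1MinShouldMatch4: prints false.
        System.out.println(matches(doc, List.of("only"), optional, 4));
    }
}
```

In the {{Conj}} variants the {{+only}} clause can lead the iteration, so the min-should-match part only has to confirm or reject the documents that clause proposes.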
luceneutil gave me the following numbers on wikimedium10m:
{noformat}
Task QPS baseline StdDev QPS patch StdDev Pct diff
ConjLow4MinShouldMatch3 230.51 (2.8%) 223.38 (2.0%) -3.1% ( -7% - 1%)
Low2MinShouldMatch3 5.04 (2.6%) 4.92 (2.0%) -2.3% ( -6% - 2%)
Low1MinShouldMatch3 3.80 (3.3%) 3.76 (4.8%) -1.2% ( -8% - 7%)
HighMinShouldMatch3 3.27 (3.6%) 3.23 (4.9%) -1.2% ( -9% - 7%)
Low1MinShouldMatch0 3.72 (4.7%) 3.68 (6.0%) -1.1% ( -11% - 10%)
HighMinShouldMatch0 3.24 (4.5%) 3.21 (5.7%) -1.0% ( -10% - 9%)
Low2MinShouldMatch2 4.44 (3.8%) 4.39 (4.9%) -1.0% ( -9% - 7%)
Low1MinShouldMatch2 3.63 (3.9%) 3.60 (4.9%) -1.0% ( -9% - 8%)
Low2MinShouldMatch0 4.55 (5.1%) 4.51 (5.9%) -0.9% ( -11% - 10%)
HighMinShouldMatch2 3.17 (4.1%) 3.14 (5.0%) -0.9% ( -9% - 8%)
Low1MinShouldMatch4 5.11 (2.2%) 5.06 (2.1%) -0.8% ( -5% - 3%)
Low3MinShouldMatch2 6.23 (3.4%) 6.18 (4.3%) -0.8% ( -8% - 7%)
HighMinShouldMatch4 3.42 (3.7%) 3.40 (5.0%) -0.6% ( -8% - 8%)
Low4MinShouldMatch0 10.67 (6.1%) 10.61 (6.9%) -0.6% ( -12% - 13%)
Low4MinShouldMatch2 42.49 (1.5%) 42.25 (1.4%) -0.6% ( -3% - 2%)
Low3MinShouldMatch0 6.32 (5.2%) 6.29 (6.4%) -0.5% ( -11% - 11%)
Low3MinShouldMatch3 38.64 (1.9%) 38.46 (1.4%) -0.5% ( -3% - 2%)
Low4MinShouldMatch3 241.97 (3.9%) 240.87 (3.7%) -0.5% ( -7% - 7%)
Low2MinShouldMatch4 45.66 (1.3%) 45.58 (1.5%) -0.2% ( -2% - 2%)
ConjHighMinShouldMatch0 5.29 (4.0%) 5.29 (4.7%) -0.1% ( -8% - 9%)
ConjLow1MinShouldMatch0 6.15 (3.8%) 6.16 (4.3%) 0.2% ( -7% - 8%)
Low4MinShouldMatch4 415.43 (2.2%) 416.29 (2.8%) 0.2% ( -4% - 5%)
ConjLow4MinShouldMatch0 16.86 (1.1%) 16.91 (1.9%) 0.2% ( -2% - 3%)
ConjLow3MinShouldMatch0 10.83 (2.4%) 10.86 (2.7%) 0.3% ( -4% - 5%)
ConjLow2MinShouldMatch0 7.60 (3.5%) 7.64 (3.8%) 0.4% ( -6% - 8%)
Low3MinShouldMatch4 389.80 (3.4%) 392.00 (2.4%) 0.6% ( -5% - 6%)
ConjLow3MinShouldMatch4 372.17 (4.2%) 374.75 (2.2%) 0.7% ( -5% - 7%)
ConjLow4MinShouldMatch4 406.38 (5.0%) 415.00 (3.2%) 2.1% ( -5% - 10%)
ConjHighMinShouldMatch2 4.78 (2.0%) 4.93 (2.0%) 3.3% ( 0% - 7%)
ConjLow1MinShouldMatch2 5.49 (2.0%) 5.74 (1.7%) 4.4% ( 0% - 8%)
ConjLow2MinShouldMatch2 7.08 (1.8%) 7.62 (2.0%) 7.7% ( 3% - 11%)
ConjLow3MinShouldMatch2 10.90 (1.9%) 12.60 (2.3%) 15.6% ( 11% - 20%)
ConjHighMinShouldMatch3 4.77 (2.7%) 5.63 (2.8%) 18.0% ( 12% - 24%)
ConjLow2MinShouldMatch4 43.82 (1.7%) 54.61 (2.3%) 24.6% ( 20% - 29%)
ConjLow1MinShouldMatch3 5.60 (3.3%) 7.13 (3.0%) 27.3% ( 20% - 34%)
ConjLow4MinShouldMatch2 40.49 (1.2%) 54.81 (2.6%) 35.4% ( 31% - 39%)
ConjLow3MinShouldMatch3 36.65 (1.6%) 50.97 (2.5%) 39.1% ( 34% - 43%)
ConjLow2MinShouldMatch3 7.79 (3.2%) 10.86 (3.2%) 39.5% ( 32% - 47%)
ConjHighMinShouldMatch4 4.98 (3.0%) 7.31 (3.4%) 46.7% ( 39% - 54%)
ConjLow1MinShouldMatch4 6.58 (3.3%) 11.20 (3.3%) 70.3% ( 61% - 79%)
{noformat}
> Speed up MinShouldMatchSumScorer in conjunctions
> ------------------------------------------------
>
> Key: LUCENE-7939
> URL: https://issues.apache.org/jira/browse/LUCENE-7939
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7939.patch
>
>
> MinShouldMatchSumScorer has good iteration capabilities, but when it is not the lead of the iteration, its advance() call may do a lot of work to find the next match, when we should instead let the lead iterator of the conjunction skip over non-matching documents. In this issue I'd like to explore giving MinShouldMatchSumScorer a two-phase iterator and making advance() return a candidate for the next match that is less accurate but much cheaper to compute.
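The two-phase idea described in the issue can be illustrated with a small self-contained sketch over sorted int arrays. This is not the attached patch and does not use Lucene classes: the class, the method names and the posting lists are hypothetical. The cheap phase shown here is one possible choice, assumed for illustration only: a document that matches at least m of k clauses must appear in at least one of any k - m + 1 of them, so only that many lists are advanced to produce a candidate, and the full count is done only for candidates the lead iterator also landed on. Doc IDs are assumed to be non-negative and smaller than Integer.MAX_VALUE, and minShouldMatch is assumed to be between 1 and the number of optional lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy two-phase conjunction over sorted doc ID arrays; not Lucene code. */
final class TwoPhaseMinShouldMatchSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** First entry of the sorted postings that is >= target, or NO_MORE_DOCS. */
    static int advance(int[] postings, int target) {
        int idx = Arrays.binarySearch(postings, target);
        if (idx < 0) {
            idx = -idx - 1; // insertion point
        }
        return idx < postings.length ? postings[idx] : NO_MORE_DOCS;
    }

    /**
     * Cheap phase: the smallest doc >= target in any of the first
     * (k - minShouldMatch + 1) lists. Every true match is such a doc,
     * but not every such doc is a match.
     */
    static int approximate(int[][] optional, int minShouldMatch, int target) {
        int needed = optional.length - minShouldMatch + 1;
        int candidate = NO_MORE_DOCS;
        for (int i = 0; i < needed; i++) {
            candidate = Math.min(candidate, advance(optional[i], target));
        }
        return candidate;
    }

    /** Costly phase: count how many lists actually contain doc. */
    static boolean matches(int[][] optional, int minShouldMatch, int doc) {
        int count = 0;
        for (int[] postings : optional) {
            if (Arrays.binarySearch(postings, doc) >= 0 && ++count >= minShouldMatch) {
                return true;
            }
        }
        return false;
    }

    /** Conjunction of a lead list with "at least minShouldMatch of optional". */
    static List<Integer> conjunction(int[] lead, int[][] optional, int minShouldMatch) {
        List<Integer> hits = new ArrayList<>();
        int doc = advance(lead, 0);
        while (doc != NO_MORE_DOCS) {
            int candidate = approximate(optional, minShouldMatch, doc);
            if (candidate == doc) {
                // Both sides agree on the candidate: verify it for real.
                if (matches(optional, minShouldMatch, doc)) {
                    hits.add(doc);
                }
                doc = advance(lead, doc + 1);
            } else {
                // Let the lead skip straight to the next candidate.
                doc = advance(lead, candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] lead = {3, 10, 20, 40};
        int[][] optional = {
            {1, 3, 10, 20, 30},
            {3, 5, 20, 40},
            {2, 3, 20, 25, 40},
        };
        System.out.println(conjunction(lead, optional, 2)); // [3, 20, 40]
        System.out.println(conjunction(lead, optional, 3)); // [3, 20]
    }
}
```

With minShouldMatch=3 the cheap phase advances a single list, which is consistent with the benchmark trend that the largest speedups appear for the {{Conj}} tasks with higher minShouldMatch values.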