Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/03/24 21:31:15 UTC
[jira] [Updated] (LUCENE-4872) BooleanWeight should decide how to execute minNrShouldMatch
[ https://issues.apache.org/jira/browse/LUCENE-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4872:
---------------------------------------
Attachment: crazyMinShouldMatch.tasks
I generated some random, biggish minShouldMatch queries (attached), and then ran the luceneutil perf test with base = trunk (always use BS2, i.e. BooleanScorer2, if minShouldMatch > 1) and comp = always use BS1 (BooleanScorer):
{noformat}
Task                QPS base   StdDev  QPS comp   StdDev  Pct diff
20Terms15High20MSM    273.33   (2.5%)      0.87   (0.0%)    -99.7% ( -99% - -99%)
25Terms16High25MSM    328.44   (3.1%)      1.41   (0.0%)    -99.6% ( -99% - -99%)
15Terms7High15MSM     378.55   (2.5%)      1.66   (0.0%)    -99.6% ( -99% - -99%)
15Terms12High15MSM    437.86   (3.1%)      1.97   (0.0%)    -99.6% ( -99% - -99%)
25Terms8High25MSM     304.31   (3.0%)      1.46   (0.0%)    -99.5% ( -99% - -99%)
10Terms7High10MSM     560.31   (2.2%)      3.35   (0.0%)    -99.4% ( -99% - -99%)
15Terms11High15MSM    529.42   (3.0%)      3.98   (0.0%)    -99.2% ( -99% - -99%)
20Terms11High19MSM    214.71   (4.7%)      2.22   (0.0%)    -99.0% ( -99% - -98%)
20Terms15High19MSM    138.97   (5.7%)      1.46   (0.1%)    -98.9% ( -99% - -98%)
20Terms10High19MSM    377.77   (3.3%)      4.59   (0.1%)    -98.8% ( -98% - -98%)
25Terms10High24MSM    101.62   (5.7%)      1.31   (0.1%)    -98.7% ( -98% - -98%)
20Terms8High19MSM     108.52   (5.9%)      1.55   (0.1%)    -98.6% ( -98% - -98%)
5Terms2High5MSM       383.99   (1.4%)      5.48   (0.1%)    -98.6% ( -98% - -98%)
5Terms3High5MSM       355.19   (1.7%)      5.14   (0.1%)    -98.6% ( -98% - -98%)
25Terms14High24MSM    234.80   (3.8%)      3.43   (0.1%)    -98.5% ( -98% - -98%)
10Terms4High10MSM     802.64   (2.4%)     11.76   (0.1%)    -98.5% ( -98% - -98%)
15Terms11High14MSM    252.84   (5.0%)      4.07   (0.1%)    -98.4% ( -98% - -98%)
15Terms8High14MSM     286.52   (4.8%)      4.94   (0.1%)    -98.3% ( -98% - -98%)
25Terms16High22MSM     83.95   (5.6%)      1.46   (0.1%)    -98.3% ( -98% - -98%)
15Terms10High13MSM    200.36   (4.9%)      4.72   (0.1%)    -97.6% ( -97% - -97%)
25Terms2High25MSM     400.74   (2.9%)      9.84   (0.1%)    -97.5% ( -97% - -97%)
10Terms1High10MSM     443.03   (1.9%)     11.48   (0.1%)    -97.4% ( -97% - -97%)
25Terms7High23MSM     119.41   (4.9%)      3.38   (0.1%)    -97.2% ( -97% - -96%)
5Terms1High5MSM       791.03   (1.5%)     22.66   (0.1%)    -97.1% ( -97% - -96%)
25Terms6High22MSM      65.43   (6.5%)      2.16   (0.2%)    -96.7% ( -97% - -96%)
15Terms8High11MSM      49.38   (7.8%)      1.67   (0.2%)    -96.6% ( -97% - -96%)
15Terms7High12MSM      87.96   (5.9%)      3.55   (0.2%)    -96.0% ( -96% - -95%)
25Terms10High18MSM     55.03   (6.6%)      2.30   (0.2%)    -95.8% ( -96% - -95%)
10Terms4High8MSM      133.95   (5.9%)      5.67   (0.2%)    -95.8% ( -96% - -95%)
10Terms5High7MSM       62.52   (7.0%)      2.71   (0.2%)    -95.7% ( -96% - -95%)
10Terms3High8MSM       76.74   (6.8%)      3.51   (0.2%)    -95.4% ( -95% - -94%)
15Terms7High11MSM      73.30   (6.3%)      3.43   (0.2%)    -95.3% ( -95% - -94%)
5Terms4High5MSM       310.05   (2.4%)     14.54   (0.2%)    -95.3% ( -95% - -94%)
15Terms3High13MSM     256.05   (4.5%)     12.36   (0.2%)    -95.2% ( -95% - -94%)
25Terms9High19MSM     103.25   (5.3%)      5.02   (0.2%)    -95.1% ( -95% - -94%)
25Terms6High18MSM      36.26   (8.2%)      2.17   (0.3%)    -94.0% ( -94% - -93%)
25Terms14High15MSM     27.13   (5.4%)      1.70   (0.3%)    -93.7% ( -94% - -93%)
15Terms4High9MSM       43.14   (8.0%)      2.70   (0.3%)    -93.7% ( -94% - -92%)
5Terms2High4MSM        82.68   (7.1%)      5.65   (0.3%)    -93.2% ( -93% - -92%)
15Terms6High9MSM       55.67   (7.0%)      4.01   (0.3%)    -92.8% ( -93% - -91%)
15Terms3High12MSM      62.05   (7.0%)      4.51   (0.3%)    -92.7% ( -93% - -91%)
20Terms0High20MSM     597.72   (1.7%)     43.89   (0.2%)    -92.7% ( -92% - -92%)
10Terms3High6MSM       56.10   (7.5%)      4.13   (0.3%)    -92.6% ( -93% - -91%)
20Terms6High13MSM      55.80   (6.8%)      4.14   (0.3%)    -92.6% ( -93% - -91%)
10Terms4High6MSM       72.64   (6.5%)      5.95   (0.4%)    -91.8% ( -92% - -90%)
20Terms6High12MSM      51.01   (6.9%)      4.28   (0.4%)    -91.6% ( -92% - -90%)
20Terms8High10MSM      17.80   (7.4%)      1.53   (0.4%)    -91.4% ( -92% - -90%)
20Terms5High14MSM      87.56   (5.5%)      7.74   (0.4%)    -91.2% ( -92% - -90%)
15Terms2High13MSM     119.82   (5.5%)     11.57   (0.4%)    -90.3% ( -91% - -89%)
15Terms0High15MSM     548.28   (2.9%)     56.01   (0.3%)    -89.8% ( -90% - -89%)
20Terms3High15MSM     112.66   (5.2%)     11.93   (0.5%)    -89.4% ( -90% - -88%)
20Terms17High17MSM     23.13   (8.8%)      2.55   (0.5%)    -89.0% ( -90% - -87%)
20Terms3High13MSM      35.37   (8.4%)      4.09   (0.5%)    -88.4% ( -89% - -86%)
15Terms1High13MSM     204.25   (4.7%)     26.27   (0.5%)    -87.1% ( -88% - -85%)
25Terms9High10MSM      33.16   (5.3%)      4.59   (0.6%)    -86.2% ( -87% - -84%)
15Terms2High11MSM      89.91   (5.8%)     12.49   (0.6%)    -86.1% ( -87% - -84%)
5Terms3High4MSM        30.54   (3.8%)      4.35   (0.7%)    -85.8% ( -86% - -84%)
5Terms0High5MSM       843.58   (1.8%)    121.72   (0.6%)    -85.6% ( -86% - -84%)
5Terms1High4MSM       292.58   (5.9%)     45.87   (0.7%)    -84.3% ( -85% - -82%)
10Terms4High5MSM       37.67   (4.9%)      5.94   (0.8%)    -84.2% ( -85% - -82%)
15Terms12High12MSM     11.43   (9.4%)      1.90   (0.8%)    -83.4% ( -85% - -80%)
15Terms3High6MSM       43.32   (7.4%)      7.50   (0.8%)    -82.7% ( -84% - -80%)
15Terms10High10MSM     12.00   (8.4%)      2.14   (0.8%)    -82.2% ( -84% - -79%)
10Terms2High5MSM       41.81   (8.2%)      7.54   (0.9%)    -82.0% ( -84% - -79%)
15Terms2High7MSM       31.31   (8.4%)      5.83   (0.9%)    -81.4% ( -83% - -78%)
20Terms3High9MSM       61.79   (6.0%)     12.05   (0.8%)    -80.5% ( -82% - -78%)
10Terms1High6MSM       55.03   (7.8%)     11.26   (1.0%)    -79.5% ( -81% - -76%)
5Terms1High3MSM        73.74   (6.4%)     15.42   (1.0%)    -79.1% ( -81% - -76%)
15Terms2High6MSM       44.95   (7.2%)     10.28   (1.1%)    -77.1% ( -79% - -74%)
5Terms2High3MSM        24.52   (3.8%)      5.78   (1.1%)    -76.4% ( -78% - -74%)
20Terms1High13MSM      58.74   (6.4%)     16.13   (1.2%)    -72.5% ( -75% - -69%)
25Terms4High7MSM       37.41   (6.6%)     10.40   (1.2%)    -72.2% ( -75% - -68%)
15Terms1High9MSM       60.48   (6.8%)     17.85   (1.3%)    -70.5% ( -73% - -66%)
15Terms1High5MSM       22.75   (8.5%)      7.22   (1.5%)    -68.3% ( -72% - -63%)
20Terms2High8MSM       18.82   (8.2%)      6.33   (1.5%)    -66.4% ( -70% - -61%)
20Terms1High11MSM      50.55   (7.0%)     19.27   (1.6%)    -61.9% ( -65% - -57%)
10Terms6High6MSM        4.99   (7.3%)      2.02   (2.0%)    -59.4% ( -64% - -53%)
5Terms1High2MSM        73.14   (3.2%)     31.07   (2.1%)    -57.5% ( -60% - -53%)
20Terms4High5MSM        8.08   (5.7%)      3.47   (2.0%)    -57.0% ( -61% - -52%)
10Terms2High3MSM       14.55   (5.1%)      6.40   (2.1%)    -56.0% ( -60% - -51%)
15Terms0High11MSM     177.51   (5.1%)     78.53   (1.4%)    -55.8% ( -59% - -51%)
15Terms9High8MSM        9.83   (9.1%)      4.72   (2.3%)    -51.9% ( -58% - -44%)
20Terms13High11MSM      7.18   (9.0%)      3.51   (2.4%)    -51.1% ( -57% - -43%)
10Terms0High8MSM      142.40   (5.4%)     78.02   (1.9%)    -45.2% ( -49% - -39%)
10Terms4High4MSM       19.21   (8.9%)     11.12   (2.8%)    -42.1% ( -49% - -33%)
15Terms1High4MSM       19.08   (8.4%)     11.44   (2.8%)    -40.0% ( -47% - -31%)
5Terms0High4MSM       207.03   (5.9%)    124.76   (2.5%)    -39.7% ( -45% - -33%)
15Terms7High6MSM        5.02   (8.7%)      3.14   (3.0%)    -37.5% ( -45% - -28%)
20Terms15High12MSM      1.44   (7.1%)      0.90   (3.6%)    -37.2% ( -44% - -28%)
10Terms9High7MSM        8.39   (9.1%)      5.40   (3.1%)    -35.7% ( -43% - -25%)
15Terms13High10MSM      2.70   (8.5%)      1.75   (3.0%)    -35.2% ( -43% - -25%)
20Terms0High14MSM      42.75   (8.0%)     28.29   (2.4%)    -33.8% ( -40% - -25%)
25Terms21High16MSM      1.68   (7.6%)      1.12   (3.4%)    -33.4% ( -41% - -24%)
20Terms1High4MSM       45.87   (6.2%)     31.85   (2.6%)    -30.6% ( -37% - -23%)
5Terms4High4MSM         4.03   (8.6%)      2.93   (3.5%)    -27.2% ( -36% - -16%)
5Terms0High3MSM       211.23   (5.6%)    172.13   (3.4%)    -18.5% ( -26% - -10%)
5Terms3High3MSM         9.27   (9.8%)      7.81   (4.2%)    -15.7% ( -26% - -1%)
15Terms14High10MSM      1.11   (7.0%)      0.97   (5.0%)    -12.9% ( -23% - 0%)
20Terms17High11MSM      2.96   (8.4%)      2.71   (4.3%)     -8.5% ( -19% - 4%)
20Terms3High3MSM       14.55   (7.9%)     13.45   (4.1%)     -7.6% ( -18% - 4%)
10Terms0High6MSM       78.35   (6.4%)     73.87   (3.3%)     -5.7% ( -14% - 4%)
15Terms0High8MSM       56.41   (6.8%)     53.67   (3.1%)     -4.8% ( -13% - 5%)
25Terms14High10MSM      0.95   (6.6%)      0.93   (5.4%)     -2.3% ( -13% - 10%)
25Terms11High5MSM     623.50   (1.6%)    620.72   (1.4%)     -0.4% ( -3% - 2%)
25Terms14High9MSM       2.99   (8.2%)      2.99   (4.7%)      0.1% ( -11% - 14%)
20Terms17High10MSM      2.60   (8.3%)      2.67   (4.9%)      3.1% ( -9% - 17%)
15Terms0High6MSM       70.68   (6.0%)     75.03   (3.6%)      6.2% ( -3% - 16%)
20Terms13High9MSM       1.02   (6.9%)      1.08   (5.6%)      6.2% ( -5% - 20%)
25Terms22High14MSM      1.02   (6.9%)      1.09   (5.7%)      6.8% ( -5% - 20%)
15Terms5High4MSM        4.38   (8.4%)      4.69   (5.1%)      7.1% ( -5% - 22%)
25Terms0High8MSM       42.57   (6.4%)     48.39   (3.1%)     13.7% ( 3% - 24%)
15Terms0High5MSM       64.99   (6.0%)     76.17   (4.0%)     17.2% ( 6% - 28%)
20Terms12High7MSM       1.59   (7.6%)      1.86   (5.5%)     17.3% ( 3% - 32%)
15Terms14High9MSM       0.78   (6.6%)      0.93   (6.9%)     20.2% ( 6% - 36%)
10Terms8High5MSM        5.14   (9.0%)      6.21   (5.9%)     20.9% ( 5% - 39%)
25Terms20High13MSM      0.58   (5.9%)      0.71   (7.5%)     21.6% ( 7% - 37%)
5Terms4High3MSM         8.61   (9.8%)     10.60   (6.1%)     23.2% ( 6% - 43%)
5Terms2High2MSM        11.50  (10.1%)     14.73   (6.2%)     28.1% ( 10% - 49%)
10Terms0High4MSM       36.94   (8.7%)     47.68   (4.8%)     29.1% ( 14% - 46%)
25Terms22High12MSM      0.82   (6.5%)      1.13   (7.1%)     38.3% ( 23% - 55%)
10Terms4High3MSM        3.38   (8.6%)      4.68   (6.6%)     38.7% ( 21% - 58%)
10Terms7High4MSM        1.25   (7.4%)      1.75   (6.6%)     40.5% ( 24% - 58%)
5Terms0High2MSM        60.10   (8.1%)     88.43   (5.2%)     47.1% ( 31% - 65%)
10Terms0High2MSM       47.57   (7.6%)     72.23   (5.1%)     51.9% ( 36% - 69%)
15Terms5High3MSM        2.52   (8.2%)      3.96   (7.4%)     56.8% ( 38% - 78%)
5Terms3High2MSM         4.25   (9.7%)      6.67   (7.5%)     57.0% ( 36% - 82%)
15Terms11High5MSM       2.63   (8.0%)      4.31   (7.7%)     63.7% ( 44% - 86%)
20Terms9High5MSM        0.87   (6.4%)      1.45   (8.7%)     67.7% ( 49% - 88%)
15Terms9High5MSM        0.82   (6.7%)      1.38   (8.1%)     68.6% ( 50% - 89%)
5Terms4High2MSM         5.49  (10.0%)     10.06   (8.9%)     83.4% ( 58% - 113%)
25Terms23High12MSM      0.35   (5.4%)      0.65  (11.6%)     84.2% ( 63% - 106%)
10Terms8High3MSM        1.29   (7.7%)      2.61   (9.6%)    102.5% ( 79% - 129%)
10Terms5High2MSM        5.07   (9.2%)     10.46   (9.8%)    106.3% ( 79% - 138%)
10Terms9High3MSM        1.12   (7.5%)      2.35   (9.8%)    110.8% ( 87% - 138%)
15Terms8High2MSM        2.56   (8.1%)      5.66  (10.0%)    121.5% ( 95% - 151%)
10Terms9High2MSM        2.07   (7.9%)      4.71  (10.4%)    127.5% ( 101% - 158%)
15Terms9High2MSM        0.98   (6.7%)      2.26  (10.1%)    131.5% ( 107% - 158%)
20Terms17High6MSM       0.31   (5.4%)      0.79  (14.8%)    151.9% ( 125% - 181%)
25Terms12High2MSM       0.68   (5.8%)      1.73  (11.4%)    154.3% ( 129% - 182%)
25Terms23High7MSM       0.40   (5.4%)      1.02  (13.3%)    155.3% ( 129% - 184%)
25Terms22High5MSM       0.81   (6.2%)      2.12  (12.1%)    159.8% ( 133% - 190%)
25Terms24High5MSM       0.32   (5.2%)      0.96  (15.7%)    203.4% ( 173% - 236%)
25Terms22High2MSM       0.31   (4.6%)      0.97  (15.1%)    215.5% ( 187% - 246%)
{noformat}
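Reading Pct diff as the relative QPS change of comp (BS1) over base (BS2), i.e. (comp - base) / base, is an assumption about the luceneutil output rather than something stated here, but it reproduces the rows checked in this minimal Java sketch (the class name PctDiff is made up):

```java
// Sketch: recompute the "Pct diff" column from the two QPS columns,
// ASSUMING it is (comp - base) / base * 100.
final class PctDiff {
    static double pctDiff(double qpsBase, double qpsComp) {
        return (qpsComp - qpsBase) / qpsBase * 100.0;
    }

    public static void main(String[] args) {
        // 20Terms15High20MSM: base (BS2) = 273.33 QPS, comp (BS1) = 0.87 QPS
        System.out.printf("%.1f%%%n", pctDiff(273.33, 0.87)); // prints -99.7%
        // 5Terms0High2MSM: base = 60.10 QPS, comp = 88.43 QPS
        System.out.printf("%.1f%%%n", pctDiff(60.10, 88.43)); // prints 47.1%
    }
}
```

Because the QPS columns are rounded to two decimals, tasks running under 1 QPS (e.g. 25Terms22High2MSM) recompute a few points off the printed value.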
Each query draws from low- and high-freq terms; e.g. 25Terms22High14MSM means there are 25 terms in the query, 22 of which are high freq, and minShouldMatch is 14.
Net/net: the lower minShouldMatch is, especially relative to the number of high-freq terms, the better BS1 does. But BS2 kicks butt otherwise!
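The naming scheme can be decoded mechanically; a small sketch (TaskName and parse() are hypothetical helpers for illustration, not part of luceneutil):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decode task names such as "25Terms22High14MSM".
final class TaskName {
    private static final Pattern NAME = Pattern.compile("(\\d+)Terms(\\d+)High(\\d+)MSM");

    final int numTerms;       // total number of terms in the query
    final int numHighFreq;    // how many of those terms are high freq
    final int minShouldMatch; // minimum number of terms a hit must match

    private TaskName(int numTerms, int numHighFreq, int minShouldMatch) {
        this.numTerms = numTerms;
        this.numHighFreq = numHighFreq;
        this.minShouldMatch = minShouldMatch;
    }

    static TaskName parse(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected task name: " + name);
        }
        return new TaskName(Integer.parseInt(m.group(1)),
                            Integer.parseInt(m.group(2)),
                            Integer.parseInt(m.group(3)));
    }

    int numLowFreq() {
        return numTerms - numHighFreq;
    }

    @Override
    public String toString() {
        return numTerms + " terms (" + numHighFreq + " high freq), minShouldMatch=" + minShouldMatch;
    }

    public static void main(String[] args) {
        // prints: 25 terms (22 high freq), minShouldMatch=14
        System.out.println(parse("25Terms22High14MSM"));
    }
}
```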
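A likely reason BS2 wins once minShouldMatch is high relative to the number of high-freq terms: a doc that matches at least k of n clauses misses at most n - k of them, so it must appear in at least one of the n - k + 1 cheapest clauses. Candidates can therefore be generated from just those clauses, and the remaining k - 1 (typically the high-freq ones) only need advance() to each candidate. Below is a simplified, self-contained Java sketch of that document-at-a-time idea over sorted int[] posting lists; Postings and matching() are made-up names, and this is an illustration, not the actual scorer code from LUCENE-4571.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified sketch of document-at-a-time minShouldMatch (not Lucene code).
final class MinShouldMatchSketch {

    // Minimal forward-only cursor over a sorted array of doc ids.
    static final class Postings {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = -1;

        Postings(int[] docs) { this.docs = docs; }

        long cost() { return docs.length; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            pos++;
            return docID();
        }

        // Skip to the first doc >= target by binary search over the
        // not-yet-consumed entries, instead of stepping one posting at a time.
        int advance(int target) {
            int lo = Math.max(pos, 0);
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return docID();
        }
    }

    // Returns, in increasing order, the docs occurring in at least
    // minShouldMatch of the given sorted posting lists.
    static List<Integer> matching(int[][] lists, int minShouldMatch) {
        int n = lists.length;
        if (minShouldMatch < 1 || minShouldMatch > n) {
            throw new IllegalArgumentException("need 1 <= minShouldMatch <= " + n);
        }
        Postings[] all = new Postings[n];
        for (int i = 0; i < n; i++) all[i] = new Postings(lists[i]);
        Arrays.sort(all, Comparator.comparingLong(Postings::cost)); // cheapest first

        // Pigeonhole: every hit occurs in at least one of the n - k + 1 cheapest lists.
        int numLead = n - minShouldMatch + 1;
        Postings[] lead = Arrays.copyOfRange(all, 0, numLead);
        Postings[] rest = Arrays.copyOfRange(all, numLead, n);
        for (Postings p : lead) p.nextDoc();

        List<Integer> hits = new ArrayList<>();
        while (true) {
            // Next candidate: smallest current doc among the lead lists.
            int doc = Postings.NO_MORE_DOCS;
            for (Postings p : lead) doc = Math.min(doc, p.docID());
            if (doc == Postings.NO_MORE_DOCS) break;

            int count = 0;
            for (Postings p : lead) {
                if (p.docID() == doc) {
                    count++;
                    p.nextDoc();
                }
            }
            // The costly lists are only touched at candidates, skipping ahead.
            for (Postings p : rest) {
                if (p.advance(doc) == doc) count++;
            }
            if (count >= minShouldMatch) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] lists = {
            {1, 5, 9},                               // rare term
            {2, 5, 9, 12},                           // rare term
            {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, // common term
        };
        System.out.println(matching(lists, 2)); // prints [1, 2, 5, 9, 12]
        System.out.println(matching(lists, 3)); // prints [5, 9]
    }
}
```

With minShouldMatch=2 the common list is never iterated posting by posting; it is only advanced to the five candidates coming from the two rare lists.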
> BooleanWeight should decide how to execute minNrShouldMatch
> -----------------------------------------------------------
>
> Key: LUCENE-4872
> URL: https://issues.apache.org/jira/browse/LUCENE-4872
> Project: Lucene - Core
> Issue Type: Sub-task
> Components: core/search
> Reporter: Robert Muir
> Fix For: 5.0, 4.3
>
> Attachments: crazyMinShouldMatch.tasks
>
>
> LUCENE-4571 adds a dedicated document-at-a-time scorer for minNrShouldMatch which can use advance() behind the scenes.
> In cases where you have some really common terms and some rare ones, this can be a huge performance improvement.
> On the other hand, BooleanScorer might still be faster in some cases.
> We should think about what the logic should be here: one simple thing to do is to always use the new scorer when minShouldMatch is set; that's where I'm leaning.
> But maybe we could have a smarter heuristic too, perhaps based on cost().
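One possible shape for a cost()-based heuristic, sketched under assumptions: plain long values stand in for each clause's cost() estimate, BooleanScorer is assumed to pay roughly for every posting (the sum of all costs), and the doc-at-a-time scorer's candidate set is bounded by the n - minShouldMatch + 1 cheapest clauses. The names (MinShouldMatchHeuristic, choose(), maxRatio) and the 0.5 threshold in the example are hypothetical and not tuned against the numbers above; this is not what BooleanWeight actually does.

```java
import java.util.Arrays;

// Hypothetical cost()-based choice between the two scorers (NOT Lucene's logic).
final class MinShouldMatchHeuristic {
    enum Choice { BOOLEAN_SCORER, MIN_SHOULD_MATCH_SCORER }

    // clauseCosts: stand-ins for each SHOULD clause's cost() estimate.
    // maxRatio: hypothetical tuning knob; smaller means "prefer BooleanScorer more".
    static Choice choose(long[] clauseCosts, int minShouldMatch, double maxRatio) {
        int n = clauseCosts.length;
        if (minShouldMatch < 1 || minShouldMatch > n) {
            throw new IllegalArgumentException("need 1 <= minShouldMatch <= " + n);
        }
        long[] sorted = clauseCosts.clone();
        Arrays.sort(sorted); // cheapest clauses first

        int numLead = n - minShouldMatch + 1;
        long total = 0; // assumed work for BooleanScorer: every posting of every clause
        long lead = 0;  // upper bound on candidates for the doc-at-a-time scorer
        for (int i = 0; i < n; i++) {
            total += sorted[i];
            if (i < numLead) lead += sorted[i];
        }
        if (total == 0) {
            return Choice.MIN_SHOULD_MATCH_SCORER; // no postings: either choice is trivial
        }
        double ratio = (double) lead / total;
        return ratio <= maxRatio ? Choice.MIN_SHOULD_MATCH_SCORER : Choice.BOOLEAN_SCORER;
    }

    public static void main(String[] args) {
        // One rare clause plus four common ones (made-up costs).
        long[] costs = {1_000_000L, 1_000L, 1_000_000L, 1_000_000L, 1_000_000L};
        System.out.println(choose(costs, 2, 0.5)); // prints BOOLEAN_SCORER
        System.out.println(choose(costs, 5, 0.5)); // prints MIN_SHOULD_MATCH_SCORER
    }
}
```

In the example, minShouldMatch=2 keeps three of the four common clauses in the candidate set (ratio about 0.75) and picks BooleanScorer, while minShouldMatch=5 drives everything from the single rare clause and picks the min-should-match scorer, which is the same direction as the trend in the benchmark.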