Posted to dev@lucene.apache.org by "Da Huang (JIRA)" <ji...@apache.org> on 2014/08/02 14:11:12 UTC
[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
Attachment: LUCENE-4396.patch
This is a patch based on git mirror commit 67d17eb81b754fa242bb91e1b91070fd8b38ecd9
This patch builds on the previous one.
First, I moved all scorer-selection logic into .bulkScorer(), so there is no longer any need to wrap a scorer in .bulkScorer().
Second, I tried using BooleanScorer (BS) for some cases with MUST clauses.
However, it seems that something was wrong with my earlier tests of BS.
BS beats DAAT in only 2 cases, and even in those 2 cases it performs worse than the other scorers I explored.
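For readers less familiar with the difference between window-based scoring and DAAT, here is a minimal, self-contained sketch of the idea. It is not the code in the patch and it does not use any Lucene API: the class name WindowedMustSketch, the plain int[] postings, the clause count used as a stand-in score, and the 2048-doc window are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy window-at-a-time matcher with MUST and SHOULD clauses; not Lucene code. */
final class WindowedMustSketch {
    // Assumed window size for this sketch.
    static final int WINDOW = 2048;

    /**
     * Returns the doc IDs that appear in every MUST list, in ascending order.
     * Each list must be sorted ascending, free of duplicates, and contain only
     * IDs below maxDoc. scores[doc] is set to the number of clauses (MUST and
     * SHOULD) that matched the doc, as a stand-in for a real score.
     */
    static List<Integer> collect(int[][] must, int[][] should, int maxDoc, int[] scores) {
        List<Integer> hits = new ArrayList<>();
        int[] mustPos = new int[must.length];     // read position per MUST list
        int[] shouldPos = new int[should.length]; // read position per SHOULD list
        int[] mustCount = new int[WINDOW];        // MUST matches per window slot
        int[] anyCount = new int[WINDOW];         // all matches per window slot

        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);
            Arrays.fill(mustCount, 0);
            Arrays.fill(anyCount, 0);

            // Inside one window, each clause is drained completely before the next.
            for (int c = 0; c < must.length; c++) {
                mustPos[c] = drain(must[c], mustPos[c], base, end, mustCount, anyCount);
            }
            for (int c = 0; c < should.length; c++) {
                shouldPos[c] = drain(should[c], shouldPos[c], base, end, null, anyCount);
            }

            // A slot is a hit only if every MUST clause touched it.
            for (int slot = 0; slot < end - base; slot++) {
                if (anyCount[slot] > 0 && mustCount[slot] == must.length) {
                    hits.add(base + slot);
                    scores[base + slot] = anyCount[slot];
                }
            }
        }
        return hits;
    }

    /** Consumes all docs below end from one list and returns the new read position. */
    private static int drain(int[] docs, int pos, int base, int end,
                             int[] mustCount, int[] anyCount) {
        while (pos < docs.length && docs[pos] < end) {
            int slot = docs[pos] - base;
            if (mustCount != null) {
                mustCount[slot]++;
            }
            anyCount[slot]++;
            pos++;
        }
        return pos;
    }
}
```

Note that this sketch visits every window and every posting even when a MUST list is sparse. A DAAT scorer would instead advance the other clauses to the next MUST candidate, which is one plausible reason a window-based approach can lose when the MUST clause is rare.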
The performance of BQ (the merged scorer) and of BS is shown below.
{code}
BQ
Task    QPS baseline    StdDev    QPS my_version    StdDev    Pct diff
HighAndTonsLowNot 5.01 (3.5%) 4.29 (2.7%) -14.3% ( -19% - -8%)
HighAndSomeLowNot 15.33 (5.1%) 13.71 (5.4%) -10.6% ( -20% - 0%)
LowAndSomeLowOr 240.72 (2.5%) 217.73 (2.5%) -9.6% ( -14% - -4%)
LowAndSomeLowNot 269.51 (1.4%) 244.76 (2.3%) -9.2% ( -12% - -5%)
HighAndTonsLowOr 5.19 (5.3%) 4.94 (2.0%) -4.8% ( -11% - 2%)
HighAndSomeHighNot 1.60 (2.0%) 1.57 (2.6%) -1.9% ( -6% - 2%)
HighAndSomeLowOr 6.65 (11.5%) 6.77 (4.1%) 1.8% ( -12% - 19%)
PKLookup 96.93 (2.3%) 99.72 (4.1%) 2.9% ( -3% - 9%)
LowAndSomeHighNot 59.45 (1.5%) 61.63 (2.4%) 3.7% ( 0% - 7%)
LowAndSomeHighOr 40.78 (2.0%) 42.75 (3.0%) 4.8% ( 0% - 10%)
HighAndSomeHighOr 2.11 (2.8%) 2.44 (3.0%) 16.1% ( 10% - 22%)
LowAndTonsLowNot 17.45 (1.3%) 20.88 (2.5%) 19.6% ( 15% - 23%)
LowAndTonsHighOr 2.76 (1.6%) 3.34 (3.1%) 21.0% ( 16% - 26%)
LowAndTonsLowOr 15.36 (1.2%) 19.83 (3.1%) 29.2% ( 24% - 33%)
HighAndTonsHighOr 0.08 (0.7%) 0.21 (5.1%) 159.8% ( 152% - 166%)
LowAndTonsHighNot 1.69 (1.5%) 5.14 (5.9%) 204.0% ( 193% - 214%)
HighAndTonsHighNot 0.09 (0.7%) 0.41 (11.0%) 359.9% ( 345% - 374%)
BooleanScorer
Task    QPS baseline    StdDev    QPS my_version    StdDev    Pct diff
LowAndSomeHighOr 51.38 (1.7%) 1.47 (0.4%) -97.1% ( -97% - -96%)
LowAndTonsHighOr 2.79 (1.5%) 0.10 (0.5%) -96.5% ( -97% - -95%)
LowAndTonsHighNot 1.71 (2.0%) 0.17 (0.7%) -90.3% ( -91% - -89%)
LowAndSomeHighNot 32.69 (2.2%) 3.18 (0.6%) -90.3% ( -91% - -89%)
LowAndSomeLowOr 258.50 (1.7%) 91.84 (1.6%) -64.5% ( -66% - -62%)
HighAndSomeLowOr 12.66 (9.1%) 5.89 (2.3%) -53.5% ( -59% - -46%)
LowAndSomeLowNot 252.33 (2.1%) 124.57 (1.1%) -50.6% ( -52% - -48%)
HighAndTonsLowOr 3.13 (7.5%) 1.57 (2.3%) -49.7% ( -55% - -43%)
LowAndTonsLowOr 14.17 (0.8%) 7.32 (2.6%) -48.4% ( -51% - -45%)
HighAndSomeLowNot 18.01 (5.6%) 10.03 (2.8%) -44.3% ( -49% - -37%)
LowAndTonsLowNot 17.17 (1.1%) 11.33 (1.5%) -34.0% ( -36% - -31%)
HighAndTonsLowNot 6.29 (2.5%) 4.73 (2.4%) -24.9% ( -29% - -20%)
HighAndSomeHighOr 1.66 (3.1%) 1.28 (7.5%) -22.7% ( -32% - -12%)
HighAndSomeHighNot 2.11 (1.4%) 1.83 (3.4%) -13.5% ( -18% - -8%)
PKLookup 96.92 (4.0%) 94.94 (2.5%) -2.0% ( -8% - 4%)
HighAndTonsHighOr 0.07 (0.5%) 0.09 (18.2%) 38.3% ( 19% - 57%)
HighAndTonsHighNot 0.04 (1.9%) 0.16 (24.4%) 263.0% ( 232% - 294%)
{code}
According to the BQ table, BQ is slower than the baseline on the first 4 cases.
However, when I run these cases one at a time, they are within 2% of trunk.
I'm not sure what causes this.
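As an aid to reading the "Pct diff" column and the range next to it, here is a small sketch of how such numbers can be derived from the QPS means and standard deviations. The formulas are an assumption on my part: the range is taken as one standard deviation in each direction, truncated to a whole percent. They do appear to reproduce the HighAndTonsLowNot row of the BQ table. The class and method names are hypothetical.

```java
/** Sketch of deriving "Pct diff" and its range from QPS mean and StdDev. */
final class PctDiffSketch {
    /** Percent change of the candidate's mean QPS relative to the baseline. */
    static double pctDiff(double baseQps, double cmpQps) {
        return 100.0 * (cmpQps - baseQps) / baseQps;
    }

    /**
     * Pessimistic bound: baseline one StdDev faster, candidate one StdDev slower.
     * StdDev arguments are percentages of the mean, as printed in the table.
     */
    static int worstCasePct(double baseQps, double baseStdDevPct,
                            double cmpQps, double cmpStdDevPct) {
        double base = baseQps * (1.0 + baseStdDevPct / 100.0);
        double cmp = cmpQps * (1.0 - cmpStdDevPct / 100.0);
        return (int) (100.0 * (cmp - base) / base); // truncates toward zero
    }

    /** Optimistic bound: baseline one StdDev slower, candidate one StdDev faster. */
    static int bestCasePct(double baseQps, double baseStdDevPct,
                           double cmpQps, double cmpStdDevPct) {
        double base = baseQps * (1.0 - baseStdDevPct / 100.0);
        double cmp = cmpQps * (1.0 + cmpStdDevPct / 100.0);
        return (int) (100.0 * (cmp - base) / base); // truncates toward zero
    }

    public static void main(String[] args) {
        // HighAndTonsLowNot from the BQ table: 5.01 (3.5%) vs 4.29 (2.7%)
        System.out.printf("%.1f%% (%d%% - %d%%)%n",
            pctDiff(5.01, 4.29),
            worstCasePct(5.01, 3.5, 4.29, 2.7),
            bestCasePct(5.01, 3.5, 4.29, 2.7));
    }
}
```

Under this reading, a range that stays entirely below 0% means the slowdown is larger than one standard deviation of run-to-run noise, so it may be worth checking whether running the tasks together changes the result compared with running them one at a time.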
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch, luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared to the other clauses, that BooleanScorer would perform better than BooleanScorer2. BooleanScorer still has some vestiges from when it used to handle MUST so it shouldn't be hard to bring back this capability ... I think the challenging part might be the heuristics on when to use which (likely we would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs in this case, eg if suddenly the MUST clause skips 1000000 docs then you want to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you are inspired!