Posted to dev@lucene.apache.org by "Da Huang (JIRA)" <ji...@apache.org> on 2014/08/02 14:11:12 UTC
[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

     [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: LUCENE-4396.patch

This is a patch based on git mirror commit 67d17eb81b754fa242bb91e1b91070fd8b38ecd9

This patch builds on the previous one.

First, I moved all of the scorer-selection logic into .bulkScorer(), so there is no longer any need to wrap a scorer in .bulkScorer().

Second, I tried using BooleanScorer (BS) for some cases with MUST clauses.
However, it seems that something was wrong with my earlier test of BS.
BS beats DAAT in only 2 cases, and even in those 2 cases it performs worse than the other scorers I explored.
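
To illustrate the first point, here is a minimal sketch of what keeping the scorer choice in a single place could look like. It is not the code from the patch and it does not use the Lucene API: the class ScorerChoiceSketch, the Clause type, the estimatedHits field and the minMustRatio threshold are all hypothetical names made up for illustration. The rule it encodes is only the idea from the issue description (prefer BooleanScorer unless the MUST clauses match far fewer docs than the other clauses).

```java
import java.util.List;

// Hypothetical sketch only: none of these types exist in Lucene.
final class ScorerChoiceSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }
    enum Strategy { BOOLEAN_SCORER, BOOLEAN_SCORER2 }

    // A clause with an assumed, precomputed hit-count estimate.
    static final class Clause {
        final Occur occur;
        final long estimatedHits;
        Clause(Occur occur, long estimatedHits) {
            this.occur = occur;
            this.estimatedHits = estimatedHits;
        }
    }

    // All selection logic lives in this one method, so callers never
    // have to wrap or re-check the chosen scorer afterwards.
    static Strategy choose(List<Clause> clauses, double minMustRatio) {
        boolean hasMust = false;
        long minMustHits = Long.MAX_VALUE;
        long maxOtherHits = 0;
        for (Clause c : clauses) {
            if (c.occur == Occur.MUST) {
                hasMust = true;
                minMustHits = Math.min(minMustHits, c.estimatedHits);
            } else {
                maxOtherHits = Math.max(maxOtherHits, c.estimatedHits);
            }
        }
        if (!hasMust) {
            return Strategy.BOOLEAN_SCORER;   // only SHOULD / MUST_NOT clauses
        }
        if (maxOtherHits == 0) {
            return Strategy.BOOLEAN_SCORER2;  // pure conjunction
        }
        // Assumed heuristic: a very selective MUST clause favours BooleanScorer2.
        return minMustHits >= minMustRatio * maxOtherHits
                ? Strategy.BOOLEAN_SCORER
                : Strategy.BOOLEAN_SCORER2;
    }
}
```

The threshold value is a free parameter here; the benchmark numbers in this comment suggest that any real heuristic would need to be tuned against measurements rather than taken from this sketch.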

The performance of BQ (the merged scorer) and BS is shown below.

{code}
BQ
                    Task  QPS baseline      StdDev  QPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        5.01      (3.5%)        4.29      (2.7%)  -14.3% ( -19% -   -8%)
       HighAndSomeLowNot       15.33      (5.1%)       13.71      (5.4%)  -10.6% ( -20% -    0%)
         LowAndSomeLowOr      240.72      (2.5%)      217.73      (2.5%)   -9.6% ( -14% -   -4%)
        LowAndSomeLowNot      269.51      (1.4%)      244.76      (2.3%)   -9.2% ( -12% -   -5%)
        HighAndTonsLowOr        5.19      (5.3%)        4.94      (2.0%)   -4.8% ( -11% -    2%)
      HighAndSomeHighNot        1.60      (2.0%)        1.57      (2.6%)   -1.9% (  -6% -    2%)
        HighAndSomeLowOr        6.65     (11.5%)        6.77      (4.1%)    1.8% ( -12% -   19%)
                PKLookup       96.93      (2.3%)       99.72      (4.1%)    2.9% (  -3% -    9%)
       LowAndSomeHighNot       59.45      (1.5%)       61.63      (2.4%)    3.7% (   0% -    7%)
        LowAndSomeHighOr       40.78      (2.0%)       42.75      (3.0%)    4.8% (   0% -   10%)
       HighAndSomeHighOr        2.11      (2.8%)        2.44      (3.0%)   16.1% (  10% -   22%)
        LowAndTonsLowNot       17.45      (1.3%)       20.88      (2.5%)   19.6% (  15% -   23%)
        LowAndTonsHighOr        2.76      (1.6%)        3.34      (3.1%)   21.0% (  16% -   26%)
         LowAndTonsLowOr       15.36      (1.2%)       19.83      (3.1%)   29.2% (  24% -   33%)
       HighAndTonsHighOr        0.08      (0.7%)        0.21      (5.1%)  159.8% ( 152% -  166%)
       LowAndTonsHighNot        1.69      (1.5%)        5.14      (5.9%)  204.0% ( 193% -  214%)
      HighAndTonsHighNot        0.09      (0.7%)        0.41     (11.0%)  359.9% ( 345% -  374%)


BooleanScorer
                    Task  QPS baseline      StdDev  QPS my_version      StdDev                Pct diff
        LowAndSomeHighOr       51.38      (1.7%)        1.47      (0.4%)  -97.1% ( -97% -  -96%)
        LowAndTonsHighOr        2.79      (1.5%)        0.10      (0.5%)  -96.5% ( -97% -  -95%)
       LowAndTonsHighNot        1.71      (2.0%)        0.17      (0.7%)  -90.3% ( -91% -  -89%)
       LowAndSomeHighNot       32.69      (2.2%)        3.18      (0.6%)  -90.3% ( -91% -  -89%)
         LowAndSomeLowOr      258.50      (1.7%)       91.84      (1.6%)  -64.5% ( -66% -  -62%)
        HighAndSomeLowOr       12.66      (9.1%)        5.89      (2.3%)  -53.5% ( -59% -  -46%)
        LowAndSomeLowNot      252.33      (2.1%)      124.57      (1.1%)  -50.6% ( -52% -  -48%)
        HighAndTonsLowOr        3.13      (7.5%)        1.57      (2.3%)  -49.7% ( -55% -  -43%)
         LowAndTonsLowOr       14.17      (0.8%)        7.32      (2.6%)  -48.4% ( -51% -  -45%)
       HighAndSomeLowNot       18.01      (5.6%)       10.03      (2.8%)  -44.3% ( -49% -  -37%)
        LowAndTonsLowNot       17.17      (1.1%)       11.33      (1.5%)  -34.0% ( -36% -  -31%)
       HighAndTonsLowNot        6.29      (2.5%)        4.73      (2.4%)  -24.9% ( -29% -  -20%)
       HighAndSomeHighOr        1.66      (3.1%)        1.28      (7.5%)  -22.7% ( -32% -  -12%)
      HighAndSomeHighNot        2.11      (1.4%)        1.83      (3.4%)  -13.5% ( -18% -   -8%)
                PKLookup       96.92      (4.0%)       94.94      (2.5%)   -2.0% (  -8% -    4%)
       HighAndTonsHighOr        0.07      (0.5%)        0.09     (18.2%)   38.3% (  19% -   57%)
      HighAndTonsHighNot        0.04      (1.9%)        0.16     (24.4%)  263.0% ( 232% -  294%)

{code}

According to the BQ table, BQ is slower than the baseline on the first 4 tasks.
However, when I run these tasks one at a time, they are no more than 2% slower than trunk.
I'm not sure what causes this difference.
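
For readers of the tables above, the "Pct diff" column can be understood as the relative change in QPS from baseline to my_version. The sketch below assumes that definition; the class and method names are hypothetical and this is not the benchmark tool's own code. Recomputing from the QPS values printed in the table can differ slightly from the printed Pct diff, because the printed QPS values are rounded.

```java
// Hypothetical helper: relative QPS change, in percent.
final class PctDiff {
    static double pctDiff(double baselineQps, double candidateQps) {
        if (baselineQps <= 0) {
            throw new IllegalArgumentException("baseline QPS must be positive");
        }
        // Positive means the candidate is faster than the baseline.
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // Values taken from the HighAndTonsLowNot row of the BQ table.
        System.out.printf("HighAndTonsLowNot: %.1f%%%n", pctDiff(5.01, 4.29));
    }
}
```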

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch, luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared to the other clauses, that BooleanScorer would perform better than BooleanScorer2.  BooleanScorer still has some vestiges from when it used to handle MUST so it shouldn't be hard to bring back this capability ... I think the challenging part might be the heuristics on when to use which (likely we would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs in this case, eg if suddenly the MUST clause skips 1000000 docs then you want to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)
