Posted to dev@lucene.apache.org by "Jim Ferenczi (JIRA)" <ji...@apache.org> on 2018/07/27 13:02:00 UTC
[jira] [Commented] (LUCENE-8432) Stop calling comparator even if early termination is not possible

    [ https://issues.apache.org/jira/browse/LUCENE-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559721#comment-16559721 ] 

Jim Ferenczi commented on LUCENE-8432:
--------------------------------------

Thanks [~khitrin], the change makes sense to me. The other way to achieve this optimization is to use a MultiCollector that wraps a TotalHitCountCollector and a TopFieldCollector, but I prefer the solution that you propose: it is much simpler if the top field collector can do this automatically. Any objections, [~jpountz]?
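
For readers unfamiliar with the MultiCollector alternative mentioned above, here is a minimal plain-Java sketch of the idea. It does not use the Lucene API: Collector, HitCounter, EarlyTerminatingTopN, Multi and Terminated are hypothetical stand-ins for Lucene's collector interfaces, TotalHitCountCollector, an early-terminating TopFieldCollector, MultiCollector and CollectionTerminatedException. It assumes documents arrive already in sort order, as in an index-sorted segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class MultiCollectorSketch {
    /** Hypothetical stand-in for Lucene's CollectionTerminatedException. */
    static class Terminated extends RuntimeException {}

    /** Hypothetical, simplified collector interface (not the Lucene one). */
    interface Collector {
        void collect(int doc);
    }

    /** Counts every hit and never terminates. */
    static class HitCounter implements Collector {
        int totalHits;
        public void collect(int doc) { totalHits++; }
    }

    /** Keeps the first n docs, then signals termination (docs are assumed pre-sorted). */
    static class EarlyTerminatingTopN implements Collector {
        final int n;
        final List<Integer> topDocs = new ArrayList<>();
        int calls; // number of times collect() was invoked on this collector
        EarlyTerminatingTopN(int n) { this.n = n; }
        public void collect(int doc) {
            calls++;
            if (topDocs.size() == n) {
                throw new Terminated(); // nothing later can be competitive
            }
            topDocs.add(doc);
        }
    }

    /** Forwards to all wrapped collectors and drops one when it terminates. */
    static class Multi implements Collector {
        final List<Collector> active;
        Multi(Collector... collectors) {
            active = new ArrayList<>(Arrays.asList(collectors));
        }
        public void collect(int doc) {
            Iterator<Collector> it = active.iterator();
            while (it.hasNext()) {
                try {
                    it.next().collect(doc);
                } catch (Terminated e) {
                    it.remove(); // the remaining collectors keep running
                }
            }
        }
    }

    /** Returns {totalHits, calls made to the top-N collector, top docs kept}. */
    static int[] run(int numDocs, int n) {
        HitCounter counter = new HitCounter();
        EarlyTerminatingTopN top = new EarlyTerminatingTopN(n);
        Multi multi = new Multi(counter, top);
        for (int doc = 0; doc < numDocs; doc++) {
            multi.collect(doc);
        }
        return new int[] {counter.totalHits, top.calls, top.topDocs.size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(10, 3)));
    }
}
```

In this model the hit counter still sees every document, while the top-N collector stops being called once it terminates. The cost is that the caller has to wire the two collectors together, which is why doing it inside the top field collector is simpler.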

> Stop calling comparator even if early termination is not possible
> -----------------------------------------------------------------
>
>                 Key: LUCENE-8432
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8432
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 7.3
>            Reporter: Nikolay Khitrin
>            Priority: Minor
>         Attachments: LUCENE-8432.patch
>
>
> TopFieldCollector keeps calling comparator.compareBottom when trackMaxScore or trackTotalHits is set, even if the result is already known from document order.
> The comparator call is not cheap because it can involve a doc values (DV) read from disk, and all of these calls can be avoided once the first non-competitive document in a segment is reached.
> Attached are a patch and a luceneutil report on wikimedium10m sorted by DayOfYear:
> {noformat}
>                   Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
>        HighTermMonthSort      226.04      (6.3%)      215.33      (4.3%)   -4.7% ( -14% -    6%)
>                  LowTerm      933.27      (5.5%)      924.62      (4.2%)   -0.9% ( -10% -    9%)
>             OrNotHighLow      945.68      (5.7%)      939.12      (4.5%)   -0.7% ( -10% -   10%)
>              MedSpanNear       28.76      (1.4%)       28.61      (1.5%)   -0.5% (  -3% -    2%)
> BrowseDayOfYearSSDVFacets       16.36      (5.0%)       16.29      (4.5%)   -0.4% (  -9% -    9%)
>               AndHighMed      112.30      (2.9%)      111.96      (1.6%)   -0.3% (  -4% -    4%)
>              LowSpanNear       12.42      (1.5%)       12.38      (1.6%)   -0.3% (  -3% -    2%)
>         HighSloppyPhrase       18.66      (3.9%)       18.62      (4.0%)   -0.2% (  -7% -    7%)
>                MedPhrase      219.40      (2.7%)      219.06      (2.7%)   -0.2% (  -5% -    5%)
>             OrNotHighMed      222.88      (3.2%)      222.63      (3.4%)   -0.1% (  -6% -    6%)
>               AndHighLow      521.59      (3.5%)      521.02      (4.5%)   -0.1% (  -7% -    8%)
>          MedSloppyPhrase       16.71      (4.7%)       16.70      (4.7%)   -0.0% (  -8% -    9%)
>                LowPhrase       15.58      (2.5%)       15.59      (2.9%)    0.0% (  -5% -    5%)
>                  Respell       92.05      (2.4%)       92.19      (3.0%)    0.2% (  -5% -    5%)
>             HighSpanNear       17.03      (2.2%)       17.06      (2.1%)    0.2% (  -4% -    4%)
>               HighPhrase       37.85      (5.8%)       37.92      (5.9%)    0.2% ( -10% -   12%)
>             OrHighNotLow      118.25      (2.9%)      118.47      (3.5%)    0.2% (  -6% -    6%)
>    BrowseMonthTaxoFacets        2.94      (0.4%)        2.94      (0.8%)    0.2% (   0% -    1%)
>     BrowseDateTaxoFacets        2.75      (0.3%)        2.75      (1.6%)    0.3% (  -1% -    2%)
>          LowSloppyPhrase      105.28      (2.3%)      105.60      (2.5%)    0.3% (  -4% -    5%)
>                  Prefix3      122.07      (6.8%)      122.55      (6.5%)    0.4% ( -12% -   14%)
>            OrNotHighHigh       55.07      (3.8%)       55.29      (4.5%)    0.4% (  -7% -    8%)
>    BrowseMonthSSDVFacets       20.88      (7.2%)       20.99      (7.5%)    0.5% ( -13% -   16%)
>            OrHighNotHigh       58.40      (4.2%)       58.72      (4.8%)    0.6% (  -8% -    9%)
>                 Wildcard       79.87      (3.7%)       80.31      (4.0%)    0.6% (  -6% -    8%)
>                OrHighMed       13.25      (4.3%)       13.34      (4.9%)    0.6% (  -8% -   10%)
> BrowseDayOfYearTaxoFacets        2.73      (0.6%)        2.75      (1.6%)    0.7% (  -1% -    2%)
>               OrHighHigh       22.03      (4.1%)       22.19      (4.9%)    0.7% (  -8% -   10%)
>              AndHighHigh       23.46      (2.1%)       23.63      (1.9%)    0.7% (  -3% -    4%)
>                 PKLookup      145.59      (4.2%)      146.66      (4.3%)    0.7% (  -7% -    9%)
>                  MedTerm      171.13      (5.0%)      172.43      (5.1%)    0.8% (  -8% -   11%)
>                OrHighLow      119.22      (2.8%)      120.23      (3.1%)    0.8% (  -4% -    6%)
>             OrHighNotMed       87.06      (3.7%)       87.80      (4.1%)    0.8% (  -6% -    8%)
>                   IntNRQ       26.44     (12.8%)       26.68     (11.5%)    0.9% ( -20% -   28%)
>                 HighTerm      107.64      (6.1%)      108.88      (5.6%)    1.2% (  -9% -   13%)
>                   Fuzzy2       69.69     (10.7%)       71.64      (7.4%)    2.8% ( -13% -   23%)
>                   Fuzzy1       53.95      (6.5%)       55.79      (6.2%)    3.4% (  -8% -   17%)
>    HighTermDayOfYearSort       19.71      (4.7%)       21.51      (7.1%)    9.1% (  -2% -   21%)
> {noformat}
> Unfortunately, luceneutil shows a regression when the search sort does not match the index sort (HighTermMonthSort). I can't reproduce the regression on any real case, but I'm afraid my benchmark isn't quite accurate.
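
The optimization described in the issue can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not the attached patch and not Lucene code: collectSegment, skipAfterFirstMiss and restNonCompetitive are hypothetical names, the segment is modeled as an array of sort values already in ascending (index sort) order, the queue starts empty, and the comparison `value >= bottom` stands in for a comparator.compareBottom call.

```java
import java.util.Arrays;

class ComparatorSkipSketch {
    /**
     * Walks one segment whose values are already in ascending sort order and
     * returns {totalHits, comparatorCalls}. Every document is counted (as when
     * total hits are tracked), so collection cannot simply stop early.
     */
    static long[] collectSegment(long[] sortedValues, int topN, boolean skipAfterFirstMiss) {
        long totalHits = 0;
        long comparatorCalls = 0;
        int queued = 0;                     // how many top-N slots are filled
        long bottom = Long.MIN_VALUE;       // weakest value currently in the top N
        boolean restNonCompetitive = false; // set once a doc fails to compete

        for (long value : sortedValues) {
            totalHits++; // hit counting continues for every document
            if (restNonCompetitive) {
                continue; // outcome known from document order: skip the comparator
            }
            if (queued < topN) {
                queued++;
                bottom = value; // ascending input, so the newest entry is the bottom
                continue;
            }
            comparatorCalls++; // models the (possibly disk-backed) compareBottom call
            if (value >= bottom && skipAfterFirstMiss) {
                // Later docs in this segment sort after this one, so none can compete.
                restNonCompetitive = true;
            }
        }
        return new long[] {totalHits, comparatorCalls};
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(Arrays.toString(collectSegment(values, 3, false)));
        System.out.println(Arrays.toString(collectSegment(values, 3, true)));
    }
}
```

With ten documents and a top 3, the model counts all ten hits either way, but the comparator is consulted seven times without the flag and once with it. A real collector would also have to handle a queue that is already full from earlier segments, which this sketch leaves out.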


