Posted to dev@lucene.apache.org by "Nikolay Khitrin (JIRA)" <ji...@apache.org> on 2018/07/27 11:13:00 UTC
[jira] [Updated] (LUCENE-8432) Stop calling comparator even if early termination is not possible

     [ https://issues.apache.org/jira/browse/LUCENE-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikolay Khitrin updated LUCENE-8432:
------------------------------------
    Description: 
TopFieldCollector keeps calling comparator.compareBottom when trackMaxScore or trackTotalHits is set, even when the result is already known in advance from document order.

A comparator call is not cheap, because it can involve a doc values (DV) read from disk, and all remaining calls for a segment can be avoided once the first non-competitive document in that segment is reached.
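
The idea above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use Lucene's API: SortedSegmentCollector, collectSegment and the skipAfterFirstMiss flag are hypothetical names, and plain int arrays stand in for per-segment doc values that are already in sort order. The compareBottomCalls counter models calls to comparator.compareBottom.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a top-N collector; not Lucene's TopFieldCollector. */
final class SortedSegmentCollector {
    private final int topN;
    private final boolean skipAfterFirstMiss;
    // Max-heap: peek() is the current "bottom" (worst) of the N smallest values.
    private final PriorityQueue<Integer> queue =
        new PriorityQueue<>(Collections.reverseOrder());
    int totalHits;
    int compareBottomCalls;

    SortedSegmentCollector(int topN, boolean skipAfterFirstMiss) {
        this.topN = topN;
        this.skipAfterFirstMiss = skipAfterFirstMiss;
    }

    /** Collects one segment whose values ascend in doc order (index sort matches search sort). */
    void collectSegment(int[] valuesInDocOrder) {
        boolean restIsNonCompetitive = false; // reset for every segment
        for (int doc = 0; doc < valuesInDocOrder.length; doc++) {
            totalHits++; // every hit is still counted, so the loop cannot simply stop
            if (restIsNonCompetitive) {
                continue; // outcome known from doc order: no comparator call
            }
            int value = valuesInDocOrder[doc]; // models the doc values read
            if (queue.size() < topN) {
                queue.add(value);
                continue;
            }
            compareBottomCalls++; // models comparator.compareBottom(doc)
            if (value >= queue.peek()) {
                // Non-competitive; ties lose to documents collected earlier.
                restIsNonCompetitive = skipAfterFirstMiss;
                continue;
            }
            queue.poll();
            queue.add(value);
        }
    }

    /** Returns the collected top-N values in ascending order. */
    List<Integer> topValues() {
        List<Integer> sorted = new ArrayList<>(queue);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        int[][] segments = { {1, 3, 5, 7, 9}, {2, 4, 6, 8, 10} };
        for (boolean skip : new boolean[] {false, true}) {
            SortedSegmentCollector c = new SortedSegmentCollector(2, skip);
            for (int[] segment : segments) {
                c.collectSegment(segment);
            }
            System.out.println("skip=" + skip + " hits=" + c.totalHits
                + " compareBottomCalls=" + c.compareBottomCalls
                + " top=" + c.topValues());
        }
    }
}
```

With these two toy segments and topN = 2, both runs count 10 hits and return the same top values, but the run with skipping makes 3 comparator calls instead of 8. The flag must be reset per segment, since the ordering guarantee only holds within one segment.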

A patch is attached, along with a luceneutil report on wikimedium10m sorted by DayOfYear:
{noformat}
                Task    QPS baseline      StdDev   QPS patch      StdDev                Pct diff
       HighTermMonthSort      226.04      (6.3%)      215.33      (4.3%)   -4.7% ( -14% -    6%)
                 LowTerm      933.27      (5.5%)      924.62      (4.2%)   -0.9% ( -10% -    9%)
            OrNotHighLow      945.68      (5.7%)      939.12      (4.5%)   -0.7% ( -10% -   10%)
             MedSpanNear       28.76      (1.4%)       28.61      (1.5%)   -0.5% (  -3% -    2%)
BrowseDayOfYearSSDVFacets       16.36      (5.0%)       16.29      (4.5%)   -0.4% (  -9% -    9%)
              AndHighMed      112.30      (2.9%)      111.96      (1.6%)   -0.3% (  -4% -    4%)
             LowSpanNear       12.42      (1.5%)       12.38      (1.6%)   -0.3% (  -3% -    2%)
        HighSloppyPhrase       18.66      (3.9%)       18.62      (4.0%)   -0.2% (  -7% -    7%)
               MedPhrase      219.40      (2.7%)      219.06      (2.7%)   -0.2% (  -5% -    5%)
            OrNotHighMed      222.88      (3.2%)      222.63      (3.4%)   -0.1% (  -6% -    6%)
              AndHighLow      521.59      (3.5%)      521.02      (4.5%)   -0.1% (  -7% -    8%)
         MedSloppyPhrase       16.71      (4.7%)       16.70      (4.7%)   -0.0% (  -8% -    9%)
               LowPhrase       15.58      (2.5%)       15.59      (2.9%)    0.0% (  -5% -    5%)
                 Respell       92.05      (2.4%)       92.19      (3.0%)    0.2% (  -5% -    5%)
            HighSpanNear       17.03      (2.2%)       17.06      (2.1%)    0.2% (  -4% -    4%)
              HighPhrase       37.85      (5.8%)       37.92      (5.9%)    0.2% ( -10% -   12%)
            OrHighNotLow      118.25      (2.9%)      118.47      (3.5%)    0.2% (  -6% -    6%)
   BrowseMonthTaxoFacets        2.94      (0.4%)        2.94      (0.8%)    0.2% (   0% -    1%)
    BrowseDateTaxoFacets        2.75      (0.3%)        2.75      (1.6%)    0.3% (  -1% -    2%)
         LowSloppyPhrase      105.28      (2.3%)      105.60      (2.5%)    0.3% (  -4% -    5%)
                 Prefix3      122.07      (6.8%)      122.55      (6.5%)    0.4% ( -12% -   14%)
           OrNotHighHigh       55.07      (3.8%)       55.29      (4.5%)    0.4% (  -7% -    8%)
   BrowseMonthSSDVFacets       20.88      (7.2%)       20.99      (7.5%)    0.5% ( -13% -   16%)
           OrHighNotHigh       58.40      (4.2%)       58.72      (4.8%)    0.6% (  -8% -    9%)
                Wildcard       79.87      (3.7%)       80.31      (4.0%)    0.6% (  -6% -    8%)
               OrHighMed       13.25      (4.3%)       13.34      (4.9%)    0.6% (  -8% -   10%)
BrowseDayOfYearTaxoFacets        2.73      (0.6%)        2.75      (1.6%)    0.7% (  -1% -    2%)
              OrHighHigh       22.03      (4.1%)       22.19      (4.9%)    0.7% (  -8% -   10%)
             AndHighHigh       23.46      (2.1%)       23.63      (1.9%)    0.7% (  -3% -    4%)
                PKLookup      145.59      (4.2%)      146.66      (4.3%)    0.7% (  -7% -    9%)
                 MedTerm      171.13      (5.0%)      172.43      (5.1%)    0.8% (  -8% -   11%)
               OrHighLow      119.22      (2.8%)      120.23      (3.1%)    0.8% (  -4% -    6%)
            OrHighNotMed       87.06      (3.7%)       87.80      (4.1%)    0.8% (  -6% -    8%)
                  IntNRQ       26.44     (12.8%)       26.68     (11.5%)    0.9% ( -20% -   28%)
                HighTerm      107.64      (6.1%)      108.88      (5.6%)    1.2% (  -9% -   13%)
                  Fuzzy2       69.69     (10.7%)       71.64      (7.4%)    2.8% ( -13% -   23%)
                  Fuzzy1       53.95      (6.5%)       55.79      (6.2%)    3.4% (  -8% -   17%)
   HighTermDayOfYearSort       19.71      (4.7%)       21.51      (7.1%)    9.1% (  -2% -   21%){noformat}
Unfortunately, luceneutil shows a regression when the search sort does not match the index sort (HighTermMonthSort). I can't reproduce the regression in any real case, but I'm afraid my benchmarks aren't quite accurate.
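
For reading the table, the central "Pct diff" figure can be recomputed from the two QPS columns, as in the small sketch below. BenchmarkDiff, pctDiff and withinNoise are hypothetical names, and the noise check (comparing the absolute difference with the sum of the two StdDev percentages) is a crude assumption of this sketch, not luceneutil's own formula for the range in parentheses.

```java
import java.util.Locale;

/** Hypothetical helper for reading the report; not part of luceneutil. */
final class BenchmarkDiff {
    /** Relative change of patch QPS versus baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double patchQps) {
        return 100.0 * (patchQps - baselineQps) / baselineQps;
    }

    /** Crude check (an assumption): is |diff| below the two StdDev percentages combined? */
    static boolean withinNoise(double pctDiff, double baselineStdDevPct, double patchStdDevPct) {
        return Math.abs(pctDiff) < baselineStdDevPct + patchStdDevPct;
    }

    public static void main(String[] args) {
        // Values copied from the HighTermMonthSort and HighTermDayOfYearSort rows.
        double monthSort = pctDiff(226.04, 215.33);
        double dayOfYearSort = pctDiff(19.71, 21.51);
        System.out.println(String.format(Locale.ROOT,
            "HighTermMonthSort     %.1f%% withinNoise=%b",
            monthSort, withinNoise(monthSort, 6.3, 4.3)));
        System.out.println(String.format(Locale.ROOT,
            "HighTermDayOfYearSort %.1f%% withinNoise=%b",
            dayOfYearSort, withinNoise(dayOfYearSort, 4.7, 7.1)));
    }
}
```

Under this rough check, both the -4.7% and the 9.1% figures fall inside the combined StdDev band, which is consistent with both reported ranges spanning zero.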


> Stop calling comparator even if early termination is not possible
> -----------------------------------------------------------------
>
>                 Key: LUCENE-8432
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8432
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 7.3
>            Reporter: Nikolay Khitrin
>            Priority: Minor
>         Attachments: LUCENE-8432.patch


