Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/10 03:37:45 UTC

[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-836122884


   Hi @jpountz, I've ported your changes to this BulkScorer implementation as well, and ran both the 5-clause OrMed task (`OrMedMedMedMedMed`) and the full wikimedium5m benchmark, three times each:
   
   ```
   OrMedMedMedMedMed run 1
                       TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
          OrMedMedMedMedMed       40.90      (8.5%)       39.37      (6.8%)   -3.7% ( -17% -   12%) 0.126
                   PKLookup      228.21      (1.9%)      223.87      (2.2%)   -1.9% (  -5% -    2%) 0.004
   ```
   ```
   OrMedMedMedMedMed run 2
                       TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
          OrMedMedMedMedMed       39.72      (5.0%)       38.01      (7.4%)   -4.3% ( -15% -    8%) 0.030
                   PKLookup      226.45      (2.1%)      223.28      (2.3%)   -1.4% (  -5% -    3%) 0.048
   ```
   ```
   OrMedMedMedMedMed run 3
                       TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                   PKLookup      226.41      (3.3%)      222.43      (2.3%)   -1.8% (  -7% -    3%) 0.052
          OrMedMedMedMedMed       38.83      (6.7%)       39.27      (7.1%)    1.1% ( -11% -   15%) 0.600
   ```
   ```
   full wikimedium5m run 1
                       TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                   Wildcard      376.63      (5.8%)      360.47      (6.2%)   -4.3% ( -15% -    8%) 0.024
              OrNotHighHigh      745.74      (4.5%)      730.51      (5.7%)   -2.0% ( -11% -    8%) 0.208
                     Fuzzy2       40.89      (6.0%)       40.20      (8.5%)   -1.7% ( -15% -   13%) 0.465
      HighTermDayOfYearSort      354.09     (16.6%)      348.53     (13.2%)   -1.6% ( -26% -   33%) 0.740
      BrowseMonthSSDVFacets       31.93      (3.0%)       31.50      (6.5%)   -1.3% ( -10% -    8%) 0.402
                    LowTerm     1978.09      (5.1%)     1956.82      (5.3%)   -1.1% ( -10% -    9%) 0.514
                     IntNRQ      194.54      (3.6%)      193.05      (4.2%)   -0.8% (  -8% -    7%) 0.537
          HighTermMonthSort      330.71     (10.6%)      328.18      (9.7%)   -0.8% ( -19% -   21%) 0.812
               OrHighNotLow      806.97      (6.4%)      801.14      (5.6%)   -0.7% ( -11% -   11%) 0.702
   BrowseDayOfYearSSDVFacets       28.57      (1.7%)       28.39      (2.0%)   -0.6% (  -4% -    3%) 0.294
                AndHighHigh       70.54      (3.8%)       70.12      (4.6%)   -0.6% (  -8% -    8%) 0.657
                    Respell       78.30      (2.0%)       77.93      (2.1%)   -0.5% (  -4% -    3%) 0.463
              OrHighNotHigh      772.33      (5.0%)      768.86      (5.8%)   -0.4% ( -10% -   10%) 0.795
                    Prefix3      133.26      (7.3%)      132.68      (8.8%)   -0.4% ( -15% -   16%) 0.865
       HighTermTitleBDVSort      189.02     (17.9%)      188.23     (12.7%)   -0.4% ( -26% -   36%) 0.932
                MedSpanNear      129.28      (2.6%)      129.09      (3.1%)   -0.1% (  -5% -    5%) 0.871
               OrNotHighLow      900.87      (3.4%)      900.01      (3.7%)   -0.1% (  -6% -    7%) 0.932
                  LowPhrase       61.05      (2.7%)       61.00      (3.1%)   -0.1% (  -5% -    5%) 0.918
               HighSpanNear       96.65      (3.2%)       96.63      (3.3%)   -0.0% (  -6% -    6%) 0.990
                     Fuzzy1       67.13      (6.9%)       67.15      (6.6%)    0.0% ( -12% -   14%) 0.988
               OrHighNotMed      811.67      (4.9%)      812.18      (5.6%)    0.1% (  -9% -   11%) 0.969
      BrowseMonthTaxoFacets       13.21      (2.8%)       13.22      (2.8%)    0.1% (  -5% -    5%) 0.941
                 HighPhrase       34.18      (3.1%)       34.21      (3.3%)    0.1% (  -6% -    6%) 0.939
                 AndHighLow      905.10      (4.0%)      905.96      (5.0%)    0.1% (  -8% -    9%) 0.947
                  MedPhrase       87.90      (2.8%)       88.10      (3.0%)    0.2% (  -5% -    6%) 0.811
       BrowseDateTaxoFacets       11.06      (3.9%)       11.09      (3.4%)    0.3% (  -6% -    7%) 0.811
   BrowseDayOfYearTaxoFacets       11.05      (3.8%)       11.08      (3.4%)    0.3% (  -6% -    7%) 0.801
            MedSloppyPhrase      152.46      (3.1%)      152.89      (2.7%)    0.3% (  -5% -    6%) 0.757
                   PKLookup      215.89      (2.8%)      216.86      (3.8%)    0.5% (  -5% -    7%) 0.667
                 TermDTSort      436.33     (15.6%)      438.31     (13.8%)    0.5% ( -25% -   35%) 0.922
                LowSpanNear      119.90      (2.4%)      120.46      (2.3%)    0.5% (  -4% -    5%) 0.533
           HighSloppyPhrase       28.82      (3.9%)       28.99      (2.8%)    0.6% (  -5% -    7%) 0.586
                 AndHighMed      475.36      (5.6%)      478.26      (5.8%)    0.6% ( -10% -   12%) 0.735
            LowSloppyPhrase      388.99      (3.4%)      392.32      (2.9%)    0.9% (  -5% -    7%) 0.387
               OrNotHighMed      774.61      (6.6%)      781.75      (5.6%)    0.9% ( -10% -   14%) 0.633
                   HighTerm     1268.49      (5.6%)     1290.00      (5.6%)    1.7% (  -9% -   13%) 0.340
       HighIntervalsOrdered      417.04      (3.1%)      425.09      (2.9%)    1.9% (  -3% -    8%) 0.043
                    MedTerm     1583.25      (5.4%)     1627.50      (5.5%)    2.8% (  -7% -   14%) 0.107
                 OrHighHigh       61.28      (3.6%)       64.46      (3.0%)    5.2% (  -1% -   12%) 0.000
                  OrHighMed       79.13      (2.9%)       85.68      (3.3%)    8.3% (   1% -   14%) 0.000
                  OrHighLow      231.58      (4.7%)      683.73     (16.0%)  195.2% ( 166% -  226%) 0.000
   ```
   ```
   full wikimedium5m run 2
                       TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                 OrHighHigh       97.84      (2.7%)       78.42      (2.1%)  -19.8% ( -24% -  -15%) 0.000
       HighTermTitleBDVSort      223.86     (17.8%)      217.70     (16.4%)   -2.8% ( -31% -   38%) 0.611
               OrNotHighLow      964.32      (2.6%)      945.18      (6.0%)   -2.0% ( -10% -    6%) 0.175
               OrHighNotLow      814.26      (5.8%)      799.46      (5.7%)   -1.8% ( -12% -   10%) 0.316
          HighTermMonthSort      342.78     (14.3%)      338.52     (15.6%)   -1.2% ( -27% -   33%) 0.793
      HighTermDayOfYearSort      259.90     (13.7%)      257.22     (13.8%)   -1.0% ( -25% -   30%) 0.812
                 TermDTSort      234.69     (10.9%)      232.30     (12.3%)   -1.0% ( -21% -   24%) 0.782
                AndHighHigh       93.13      (3.0%)       92.19      (3.5%)   -1.0% (  -7% -    5%) 0.326
                    MedTerm     1410.12      (3.9%)     1398.22      (2.4%)   -0.8% (  -6% -    5%) 0.408
              OrNotHighHigh      679.95      (6.4%)      674.81      (6.3%)   -0.8% ( -12% -   12%) 0.706
               OrHighNotMed      744.68      (4.4%)      739.05      (5.8%)   -0.8% ( -10% -    9%) 0.644
                 AndHighMed      451.76      (3.8%)      448.59      (3.4%)   -0.7% (  -7% -    6%) 0.540
                 AndHighLow      969.58      (5.6%)      963.88      (4.8%)   -0.6% ( -10% -   10%) 0.720
                LowSpanNear       25.23      (4.2%)       25.11      (2.9%)   -0.5% (  -7% -    6%) 0.666
                MedSpanNear       26.41      (2.4%)       26.33      (1.5%)   -0.3% (  -4% -    3%) 0.610
       HighIntervalsOrdered       37.09      (1.9%)       36.98      (2.4%)   -0.3% (  -4% -    4%) 0.669
              OrHighNotHigh      679.06      (4.3%)      677.17      (5.8%)   -0.3% (  -9% -   10%) 0.863
               HighSpanNear       32.19      (2.2%)       32.14      (2.1%)   -0.2% (  -4% -    4%) 0.822
                     IntNRQ      322.43      (2.0%)      322.04      (2.5%)   -0.1% (  -4% -    4%) 0.865
      BrowseMonthSSDVFacets       32.22      (1.7%)       32.25      (1.5%)    0.1% (  -3% -    3%) 0.896
            LowSloppyPhrase       39.45      (2.6%)       39.48      (2.4%)    0.1% (  -4% -    5%) 0.921
   BrowseDayOfYearSSDVFacets       28.20      (5.4%)       28.23      (5.2%)    0.1% (  -9% -   11%) 0.947
           HighSloppyPhrase       56.95      (2.4%)       57.03      (2.4%)    0.1% (  -4% -    4%) 0.846
                   PKLookup      217.45      (3.9%)      217.78      (4.2%)    0.2% (  -7% -    8%) 0.906
                    LowTerm     1614.00      (3.7%)     1616.52      (4.3%)    0.2% (  -7% -    8%) 0.902
            MedSloppyPhrase      335.24      (2.8%)      336.50      (2.7%)    0.4% (  -4% -    6%) 0.665
                  MedPhrase      257.34      (2.7%)      258.59      (1.9%)    0.5% (  -4% -    5%) 0.515
                 HighPhrase      100.07      (2.1%)      100.66      (1.7%)    0.6% (  -3% -    4%) 0.332
   BrowseDayOfYearTaxoFacets       11.20      (2.8%)       11.28      (2.5%)    0.7% (  -4% -    6%) 0.410
      BrowseMonthTaxoFacets       13.07      (2.4%)       13.17      (1.9%)    0.7% (  -3% -    5%) 0.283
       BrowseDateTaxoFacets       11.18      (2.9%)       11.27      (2.5%)    0.8% (  -4% -    6%) 0.369
                   Wildcard       55.50      (4.6%)       56.08      (2.9%)    1.0% (  -6% -    8%) 0.391
                  LowPhrase      501.30      (3.5%)      506.61      (3.2%)    1.1% (  -5% -    8%) 0.319
                    Prefix3      107.90      (6.5%)      109.16      (3.9%)    1.2% (  -8% -   12%) 0.491
                    Respell       73.30      (3.3%)       74.17      (2.6%)    1.2% (  -4% -    7%) 0.210
               OrNotHighMed      625.05      (4.3%)      634.75      (4.9%)    1.6% (  -7% -   11%) 0.289
                     Fuzzy2       67.34     (18.7%)       68.92     (16.8%)    2.3% ( -27% -   46%) 0.677
                   HighTerm     1559.83      (4.6%)     1608.90      (5.3%)    3.1% (  -6% -   13%) 0.044
                     Fuzzy1       74.41     (17.1%)       77.02     (13.2%)    3.5% ( -22% -   40%) 0.467
                  OrHighMed      176.89      (4.0%)      192.17      (2.7%)    8.6% (   1% -   16%) 0.000
                  OrHighLow      179.14      (3.0%)      634.97     (16.3%)  254.5% ( 228% -  282%) 0.000
   ```
   ```
   full wikimedium5m run 3
                       TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy2       78.85     (17.1%)       74.79     (15.3%)   -5.1% ( -32% -   32%) 0.315
                     Fuzzy1       73.72     (12.3%)       70.14      (9.6%)   -4.9% ( -23% -   19%) 0.164
                  OrHighMed      218.87      (3.8%)      213.12      (3.9%)   -2.6% (  -9% -    5%) 0.031
              OrHighNotHigh      710.58      (5.0%)      693.73      (4.9%)   -2.4% ( -11% -    7%) 0.130
               OrHighNotLow      766.45      (7.0%)      752.36      (5.4%)   -1.8% ( -13% -   11%) 0.351
               OrHighNotMed      788.49      (4.6%)      779.76      (4.0%)   -1.1% (  -9% -    7%) 0.415
                MedSpanNear      432.51      (2.6%)      428.61      (2.9%)   -0.9% (  -6% -    4%) 0.301
                 HighPhrase      328.27      (2.6%)      325.47      (3.1%)   -0.9% (  -6% -    4%) 0.338
                    MedTerm     1537.24      (3.9%)     1525.49      (3.9%)   -0.8% (  -8% -    7%) 0.537
                   PKLookup      224.01      (3.4%)      222.35      (3.2%)   -0.7% (  -7% -    6%) 0.478
                   HighTerm     1852.48      (6.1%)     1839.68      (6.9%)   -0.7% ( -12% -   13%) 0.737
               OrNotHighLow      872.06      (4.3%)      866.35      (3.3%)   -0.7% (  -7% -    7%) 0.589
              OrNotHighHigh      696.91      (4.9%)      694.25      (5.3%)   -0.4% ( -10% -   10%) 0.814
                 AndHighMed      399.43      (3.7%)      398.38      (3.4%)   -0.3% (  -7% -    7%) 0.818
      BrowseMonthTaxoFacets       13.35      (2.5%)       13.33      (2.8%)   -0.1% (  -5% -    5%) 0.891
      BrowseMonthSSDVFacets       31.99      (2.2%)       31.97      (2.3%)   -0.1% (  -4% -    4%) 0.917
       HighIntervalsOrdered       56.92      (1.7%)       56.89      (1.5%)   -0.1% (  -3% -    3%) 0.916
                  MedPhrase      421.85      (2.6%)      421.64      (2.4%)   -0.1% (  -4% -    5%) 0.949
                LowSpanNear      215.84      (1.5%)      215.81      (1.9%)   -0.0% (  -3% -    3%) 0.975
   BrowseDayOfYearTaxoFacets       11.13      (3.0%)       11.13      (3.2%)   -0.0% (  -6% -    6%) 0.992
   BrowseDayOfYearSSDVFacets       27.51      (8.3%)       27.52      (8.1%)    0.0% ( -15% -   17%) 0.994
               HighSpanNear       16.99      (2.2%)       16.99      (2.1%)    0.0% (  -4% -    4%) 0.968
       BrowseDateTaxoFacets       11.11      (3.0%)       11.11      (3.3%)    0.0% (  -6% -    6%) 0.977
                   Wildcard      259.96      (2.3%)      260.11      (2.7%)    0.1% (  -4% -    5%) 0.943
       HighTermTitleBDVSort      216.56      (6.9%)      216.79      (7.9%)    0.1% ( -13% -   15%) 0.964
            LowSloppyPhrase       36.16      (3.5%)       36.20      (3.8%)    0.1% (  -6% -    7%) 0.922
                    LowTerm     1653.62      (6.1%)     1656.23      (4.8%)    0.2% ( -10% -   11%) 0.928
                 TermDTSort      236.21     (14.9%)      236.69     (14.7%)    0.2% ( -25% -   34%) 0.965
               OrNotHighMed      738.85      (3.6%)      741.27      (4.7%)    0.3% (  -7% -    9%) 0.806
                     IntNRQ      122.68      (1.1%)      123.17      (0.8%)    0.4% (  -1% -    2%) 0.210
                    Respell       75.86      (2.4%)       76.22      (2.0%)    0.5% (  -3% -    5%) 0.505
           HighSloppyPhrase       80.85      (3.7%)       81.25      (4.6%)    0.5% (  -7% -    9%) 0.708
            MedSloppyPhrase       31.20      (3.5%)       31.39      (4.3%)    0.6% (  -6% -    8%) 0.628
          HighTermMonthSort      396.29      (8.2%)      398.90      (9.3%)    0.7% ( -15% -   19%) 0.812
                    Prefix3      393.10      (2.7%)      396.20      (2.5%)    0.8% (  -4% -    6%) 0.339
                AndHighHigh      105.61      (3.7%)      106.69      (4.0%)    1.0% (  -6% -    9%) 0.399
                  LowPhrase       61.52      (2.1%)       62.17      (3.2%)    1.1% (  -4% -    6%) 0.221
                 AndHighLow      915.63      (4.3%)      928.98      (3.1%)    1.5% (  -5% -    9%) 0.217
      HighTermDayOfYearSort      216.71     (14.0%)      220.00     (15.9%)    1.5% ( -24% -   36%) 0.749
                  OrHighLow      535.18      (7.4%)      571.87      (5.8%)    6.9% (  -5% -   21%) 0.001
                 OrHighHigh       51.30      (2.8%)       56.55      (2.7%)   10.2% (   4% -   16%) 0.000
   ```
   
   So far the implementation seems to perform similarly to the baseline WANDScorer, with a surprising, occasional huge speed-up for `OrHighLow` (+195.2% and +254.5% in full runs 1 and 2, but only +6.9% in run 3). Hopefully this is not caused by a bug :D . I think these performance characteristics make sense: the low-frequency, high-score-contribution term drives the iteration, and a large window size lets more docs be pruned quickly when their maxScores show they can't be competitive.
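
The pruning argument above can be illustrated with a small, self-contained sketch. This is not the code from this PR and does not use any Lucene API: `Term`, `firstEssential`, `countCandidates`, and `demoTerms` are hypothetical names, the per-term window max scores are assumed to be given, and only a single window is modelled. Terms are sorted by ascending window max score; any prefix of terms whose summed max scores stay below the minimum competitive score is treated as non-essential, so only the remaining terms need to drive iteration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class BmmWindowSketch {

  /** Hypothetical stand-in for one query term's postings inside a single window. */
  public static final class Term {
    final int[] docs;           // sorted doc IDs that fall inside the window
    final float windowMaxScore; // assumed upper bound on this term's score in the window

    public Term(int[] docs, float windowMaxScore) {
      this.docs = docs;
      this.windowMaxScore = windowMaxScore;
    }
  }

  /**
   * Expects terms sorted by ascending window max score. Returns the index of the
   * first essential term: terms before it cannot reach minCompetitiveScore even
   * when all of them match, so they never need to drive iteration.
   */
  public static int firstEssential(Term[] ascending, float minCompetitiveScore) {
    double prefix = 0;
    for (int i = 0; i < ascending.length; i++) {
      prefix += ascending[i].windowMaxScore;
      if (prefix >= minCompetitiveScore) {
        return i;
      }
    }
    return ascending.length; // no doc in this window can be competitive
  }

  /** Counts the distinct docs that the essential terms would visit in this window. */
  public static long countCandidates(Term[] terms, float minCompetitiveScore) {
    Term[] sorted = terms.clone();
    Arrays.sort(sorted, (a, b) -> Float.compare(a.windowMaxScore, b.windowMaxScore));
    int first = firstEssential(sorted, minCompetitiveScore);
    return Arrays.stream(sorted, first, sorted.length)
        .flatMapToInt(t -> Arrays.stream(t.docs))
        .distinct()
        .count();
  }

  /** Made-up OrHighLow-like data: a frequent low-scoring term and a rare high-scoring one. */
  public static Term[] demoTerms() {
    Term high = new Term(IntStream.range(0, 1000).toArray(), 1.0f);
    Term low = new Term(new int[] {17, 400, 901}, 4.0f);
    return new Term[] {high, low};
  }

  public static void main(String[] args) {
    Term[] terms = demoTerms();
    System.out.println(countCandidates(terms, 0f));   // 1000: nothing can be pruned yet
    System.out.println(countCandidates(terms, 2.5f)); // 3: only the rare term drives iteration
    System.out.println(countCandidates(terms, 6f));   // 0: the whole window is skipped
  }
}
```

With this made-up data, once the minimum competitive score exceeds the frequent term's max score, the candidate set in the window shrinks from 1000 docs to the 3 docs of the rare term. That is the kind of effect that could plausibly explain a large `OrHighLow` gain, though it says nothing about whether the measured speed-up is bug-free.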

