Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/10 03:37:45 UTC
[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-836122884
Hi @jpountz, I've ported your changes to this BulkScorer implementation as well, and ran both the 5-clause OrMed task (OrMedMedMedMedMed) and the full wikimedium5m benchmark:
```
OrMedMedMedMedMed run 1
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
OrMedMedMedMedMed 40.90 (8.5%) 39.37 (6.8%) -3.7% ( -17% - 12%) 0.126
PKLookup 228.21 (1.9%) 223.87 (2.2%) -1.9% ( -5% - 2%) 0.004
```
```
OrMedMedMedMedMed run 2
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
OrMedMedMedMedMed 39.72 (5.0%) 38.01 (7.4%) -4.3% ( -15% - 8%) 0.030
PKLookup 226.45 (2.1%) 223.28 (2.3%) -1.4% ( -5% - 3%) 0.048
```
```
OrMedMedMedMedMed run 3
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
PKLookup 226.41 (3.3%) 222.43 (2.3%) -1.8% ( -7% - 3%) 0.052
OrMedMedMedMedMed 38.83 (6.7%) 39.27 (7.1%) 1.1% ( -11% - 15%) 0.600
```
```
full wikimedium5m run 1
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Wildcard 376.63 (5.8%) 360.47 (6.2%) -4.3% ( -15% - 8%) 0.024
OrNotHighHigh 745.74 (4.5%) 730.51 (5.7%) -2.0% ( -11% - 8%) 0.208
Fuzzy2 40.89 (6.0%) 40.20 (8.5%) -1.7% ( -15% - 13%) 0.465
HighTermDayOfYearSort 354.09 (16.6%) 348.53 (13.2%) -1.6% ( -26% - 33%) 0.740
BrowseMonthSSDVFacets 31.93 (3.0%) 31.50 (6.5%) -1.3% ( -10% - 8%) 0.402
LowTerm 1978.09 (5.1%) 1956.82 (5.3%) -1.1% ( -10% - 9%) 0.514
IntNRQ 194.54 (3.6%) 193.05 (4.2%) -0.8% ( -8% - 7%) 0.537
HighTermMonthSort 330.71 (10.6%) 328.18 (9.7%) -0.8% ( -19% - 21%) 0.812
OrHighNotLow 806.97 (6.4%) 801.14 (5.6%) -0.7% ( -11% - 11%) 0.702
BrowseDayOfYearSSDVFacets 28.57 (1.7%) 28.39 (2.0%) -0.6% ( -4% - 3%) 0.294
AndHighHigh 70.54 (3.8%) 70.12 (4.6%) -0.6% ( -8% - 8%) 0.657
Respell 78.30 (2.0%) 77.93 (2.1%) -0.5% ( -4% - 3%) 0.463
OrHighNotHigh 772.33 (5.0%) 768.86 (5.8%) -0.4% ( -10% - 10%) 0.795
Prefix3 133.26 (7.3%) 132.68 (8.8%) -0.4% ( -15% - 16%) 0.865
HighTermTitleBDVSort 189.02 (17.9%) 188.23 (12.7%) -0.4% ( -26% - 36%) 0.932
MedSpanNear 129.28 (2.6%) 129.09 (3.1%) -0.1% ( -5% - 5%) 0.871
OrNotHighLow 900.87 (3.4%) 900.01 (3.7%) -0.1% ( -6% - 7%) 0.932
LowPhrase 61.05 (2.7%) 61.00 (3.1%) -0.1% ( -5% - 5%) 0.918
HighSpanNear 96.65 (3.2%) 96.63 (3.3%) -0.0% ( -6% - 6%) 0.990
Fuzzy1 67.13 (6.9%) 67.15 (6.6%) 0.0% ( -12% - 14%) 0.988
OrHighNotMed 811.67 (4.9%) 812.18 (5.6%) 0.1% ( -9% - 11%) 0.969
BrowseMonthTaxoFacets 13.21 (2.8%) 13.22 (2.8%) 0.1% ( -5% - 5%) 0.941
HighPhrase 34.18 (3.1%) 34.21 (3.3%) 0.1% ( -6% - 6%) 0.939
AndHighLow 905.10 (4.0%) 905.96 (5.0%) 0.1% ( -8% - 9%) 0.947
MedPhrase 87.90 (2.8%) 88.10 (3.0%) 0.2% ( -5% - 6%) 0.811
BrowseDateTaxoFacets 11.06 (3.9%) 11.09 (3.4%) 0.3% ( -6% - 7%) 0.811
BrowseDayOfYearTaxoFacets 11.05 (3.8%) 11.08 (3.4%) 0.3% ( -6% - 7%) 0.801
MedSloppyPhrase 152.46 (3.1%) 152.89 (2.7%) 0.3% ( -5% - 6%) 0.757
PKLookup 215.89 (2.8%) 216.86 (3.8%) 0.5% ( -5% - 7%) 0.667
TermDTSort 436.33 (15.6%) 438.31 (13.8%) 0.5% ( -25% - 35%) 0.922
LowSpanNear 119.90 (2.4%) 120.46 (2.3%) 0.5% ( -4% - 5%) 0.533
HighSloppyPhrase 28.82 (3.9%) 28.99 (2.8%) 0.6% ( -5% - 7%) 0.586
AndHighMed 475.36 (5.6%) 478.26 (5.8%) 0.6% ( -10% - 12%) 0.735
LowSloppyPhrase 388.99 (3.4%) 392.32 (2.9%) 0.9% ( -5% - 7%) 0.387
OrNotHighMed 774.61 (6.6%) 781.75 (5.6%) 0.9% ( -10% - 14%) 0.633
HighTerm 1268.49 (5.6%) 1290.00 (5.6%) 1.7% ( -9% - 13%) 0.340
HighIntervalsOrdered 417.04 (3.1%) 425.09 (2.9%) 1.9% ( -3% - 8%) 0.043
MedTerm 1583.25 (5.4%) 1627.50 (5.5%) 2.8% ( -7% - 14%) 0.107
OrHighHigh 61.28 (3.6%) 64.46 (3.0%) 5.2% ( -1% - 12%) 0.000
OrHighMed 79.13 (2.9%) 85.68 (3.3%) 8.3% ( 1% - 14%) 0.000
OrHighLow 231.58 (4.7%) 683.73 (16.0%) 195.2% ( 166% - 226%) 0.000
```
```
full wikimedium5m run 2
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
OrHighHigh 97.84 (2.7%) 78.42 (2.1%) -19.8% ( -24% - -15%) 0.000
HighTermTitleBDVSort 223.86 (17.8%) 217.70 (16.4%) -2.8% ( -31% - 38%) 0.611
OrNotHighLow 964.32 (2.6%) 945.18 (6.0%) -2.0% ( -10% - 6%) 0.175
OrHighNotLow 814.26 (5.8%) 799.46 (5.7%) -1.8% ( -12% - 10%) 0.316
HighTermMonthSort 342.78 (14.3%) 338.52 (15.6%) -1.2% ( -27% - 33%) 0.793
HighTermDayOfYearSort 259.90 (13.7%) 257.22 (13.8%) -1.0% ( -25% - 30%) 0.812
TermDTSort 234.69 (10.9%) 232.30 (12.3%) -1.0% ( -21% - 24%) 0.782
AndHighHigh 93.13 (3.0%) 92.19 (3.5%) -1.0% ( -7% - 5%) 0.326
MedTerm 1410.12 (3.9%) 1398.22 (2.4%) -0.8% ( -6% - 5%) 0.408
OrNotHighHigh 679.95 (6.4%) 674.81 (6.3%) -0.8% ( -12% - 12%) 0.706
OrHighNotMed 744.68 (4.4%) 739.05 (5.8%) -0.8% ( -10% - 9%) 0.644
AndHighMed 451.76 (3.8%) 448.59 (3.4%) -0.7% ( -7% - 6%) 0.540
AndHighLow 969.58 (5.6%) 963.88 (4.8%) -0.6% ( -10% - 10%) 0.720
LowSpanNear 25.23 (4.2%) 25.11 (2.9%) -0.5% ( -7% - 6%) 0.666
MedSpanNear 26.41 (2.4%) 26.33 (1.5%) -0.3% ( -4% - 3%) 0.610
HighIntervalsOrdered 37.09 (1.9%) 36.98 (2.4%) -0.3% ( -4% - 4%) 0.669
OrHighNotHigh 679.06 (4.3%) 677.17 (5.8%) -0.3% ( -9% - 10%) 0.863
HighSpanNear 32.19 (2.2%) 32.14 (2.1%) -0.2% ( -4% - 4%) 0.822
IntNRQ 322.43 (2.0%) 322.04 (2.5%) -0.1% ( -4% - 4%) 0.865
BrowseMonthSSDVFacets 32.22 (1.7%) 32.25 (1.5%) 0.1% ( -3% - 3%) 0.896
LowSloppyPhrase 39.45 (2.6%) 39.48 (2.4%) 0.1% ( -4% - 5%) 0.921
BrowseDayOfYearSSDVFacets 28.20 (5.4%) 28.23 (5.2%) 0.1% ( -9% - 11%) 0.947
HighSloppyPhrase 56.95 (2.4%) 57.03 (2.4%) 0.1% ( -4% - 4%) 0.846
PKLookup 217.45 (3.9%) 217.78 (4.2%) 0.2% ( -7% - 8%) 0.906
LowTerm 1614.00 (3.7%) 1616.52 (4.3%) 0.2% ( -7% - 8%) 0.902
MedSloppyPhrase 335.24 (2.8%) 336.50 (2.7%) 0.4% ( -4% - 6%) 0.665
MedPhrase 257.34 (2.7%) 258.59 (1.9%) 0.5% ( -4% - 5%) 0.515
HighPhrase 100.07 (2.1%) 100.66 (1.7%) 0.6% ( -3% - 4%) 0.332
BrowseDayOfYearTaxoFacets 11.20 (2.8%) 11.28 (2.5%) 0.7% ( -4% - 6%) 0.410
BrowseMonthTaxoFacets 13.07 (2.4%) 13.17 (1.9%) 0.7% ( -3% - 5%) 0.283
BrowseDateTaxoFacets 11.18 (2.9%) 11.27 (2.5%) 0.8% ( -4% - 6%) 0.369
Wildcard 55.50 (4.6%) 56.08 (2.9%) 1.0% ( -6% - 8%) 0.391
LowPhrase 501.30 (3.5%) 506.61 (3.2%) 1.1% ( -5% - 8%) 0.319
Prefix3 107.90 (6.5%) 109.16 (3.9%) 1.2% ( -8% - 12%) 0.491
Respell 73.30 (3.3%) 74.17 (2.6%) 1.2% ( -4% - 7%) 0.210
OrNotHighMed 625.05 (4.3%) 634.75 (4.9%) 1.6% ( -7% - 11%) 0.289
Fuzzy2 67.34 (18.7%) 68.92 (16.8%) 2.3% ( -27% - 46%) 0.677
HighTerm 1559.83 (4.6%) 1608.90 (5.3%) 3.1% ( -6% - 13%) 0.044
Fuzzy1 74.41 (17.1%) 77.02 (13.2%) 3.5% ( -22% - 40%) 0.467
OrHighMed 176.89 (4.0%) 192.17 (2.7%) 8.6% ( 1% - 16%) 0.000
OrHighLow 179.14 (3.0%) 634.97 (16.3%) 254.5% ( 228% - 282%) 0.000
```
```
full wikimedium5m run 3
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy2 78.85 (17.1%) 74.79 (15.3%) -5.1% ( -32% - 32%) 0.315
Fuzzy1 73.72 (12.3%) 70.14 (9.6%) -4.9% ( -23% - 19%) 0.164
OrHighMed 218.87 (3.8%) 213.12 (3.9%) -2.6% ( -9% - 5%) 0.031
OrHighNotHigh 710.58 (5.0%) 693.73 (4.9%) -2.4% ( -11% - 7%) 0.130
OrHighNotLow 766.45 (7.0%) 752.36 (5.4%) -1.8% ( -13% - 11%) 0.351
OrHighNotMed 788.49 (4.6%) 779.76 (4.0%) -1.1% ( -9% - 7%) 0.415
MedSpanNear 432.51 (2.6%) 428.61 (2.9%) -0.9% ( -6% - 4%) 0.301
HighPhrase 328.27 (2.6%) 325.47 (3.1%) -0.9% ( -6% - 4%) 0.338
MedTerm 1537.24 (3.9%) 1525.49 (3.9%) -0.8% ( -8% - 7%) 0.537
PKLookup 224.01 (3.4%) 222.35 (3.2%) -0.7% ( -7% - 6%) 0.478
HighTerm 1852.48 (6.1%) 1839.68 (6.9%) -0.7% ( -12% - 13%) 0.737
OrNotHighLow 872.06 (4.3%) 866.35 (3.3%) -0.7% ( -7% - 7%) 0.589
OrNotHighHigh 696.91 (4.9%) 694.25 (5.3%) -0.4% ( -10% - 10%) 0.814
AndHighMed 399.43 (3.7%) 398.38 (3.4%) -0.3% ( -7% - 7%) 0.818
BrowseMonthTaxoFacets 13.35 (2.5%) 13.33 (2.8%) -0.1% ( -5% - 5%) 0.891
BrowseMonthSSDVFacets 31.99 (2.2%) 31.97 (2.3%) -0.1% ( -4% - 4%) 0.917
HighIntervalsOrdered 56.92 (1.7%) 56.89 (1.5%) -0.1% ( -3% - 3%) 0.916
MedPhrase 421.85 (2.6%) 421.64 (2.4%) -0.1% ( -4% - 5%) 0.949
LowSpanNear 215.84 (1.5%) 215.81 (1.9%) -0.0% ( -3% - 3%) 0.975
BrowseDayOfYearTaxoFacets 11.13 (3.0%) 11.13 (3.2%) -0.0% ( -6% - 6%) 0.992
BrowseDayOfYearSSDVFacets 27.51 (8.3%) 27.52 (8.1%) 0.0% ( -15% - 17%) 0.994
HighSpanNear 16.99 (2.2%) 16.99 (2.1%) 0.0% ( -4% - 4%) 0.968
BrowseDateTaxoFacets 11.11 (3.0%) 11.11 (3.3%) 0.0% ( -6% - 6%) 0.977
Wildcard 259.96 (2.3%) 260.11 (2.7%) 0.1% ( -4% - 5%) 0.943
HighTermTitleBDVSort 216.56 (6.9%) 216.79 (7.9%) 0.1% ( -13% - 15%) 0.964
LowSloppyPhrase 36.16 (3.5%) 36.20 (3.8%) 0.1% ( -6% - 7%) 0.922
LowTerm 1653.62 (6.1%) 1656.23 (4.8%) 0.2% ( -10% - 11%) 0.928
TermDTSort 236.21 (14.9%) 236.69 (14.7%) 0.2% ( -25% - 34%) 0.965
OrNotHighMed 738.85 (3.6%) 741.27 (4.7%) 0.3% ( -7% - 9%) 0.806
IntNRQ 122.68 (1.1%) 123.17 (0.8%) 0.4% ( -1% - 2%) 0.210
Respell 75.86 (2.4%) 76.22 (2.0%) 0.5% ( -3% - 5%) 0.505
HighSloppyPhrase 80.85 (3.7%) 81.25 (4.6%) 0.5% ( -7% - 9%) 0.708
MedSloppyPhrase 31.20 (3.5%) 31.39 (4.3%) 0.6% ( -6% - 8%) 0.628
HighTermMonthSort 396.29 (8.2%) 398.90 (9.3%) 0.7% ( -15% - 19%) 0.812
Prefix3 393.10 (2.7%) 396.20 (2.5%) 0.8% ( -4% - 6%) 0.339
AndHighHigh 105.61 (3.7%) 106.69 (4.0%) 1.0% ( -6% - 9%) 0.399
LowPhrase 61.52 (2.1%) 62.17 (3.2%) 1.1% ( -4% - 6%) 0.221
AndHighLow 915.63 (4.3%) 928.98 (3.1%) 1.5% ( -5% - 9%) 0.217
HighTermDayOfYearSort 216.71 (14.0%) 220.00 (15.9%) 1.5% ( -24% - 36%) 0.749
OrHighLow 535.18 (7.4%) 571.87 (5.8%) 6.9% ( -5% - 21%) 0.001
OrHighHigh 51.30 (2.8%) 56.55 (2.7%) 10.2% ( 4% - 16%) 0.000
```
So far the implementation seems to perform similarly to the baseline WANDScorer, with a surprising, occasional huge speedup for `OrHighLow`. Hopefully this is not caused by a bug :D. I think these performance characteristics make sense: the low-frequency, high-score-contribution term would drive the iteration, and a big window size would let more docs be pruned quickly when their maxScores show they can't be competitive.
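To make the pruning intuition concrete, here is a minimal, self-contained sketch of the window-level check I have in mind. It is not the code from this PR and does not use any Lucene classes: `WindowPruningSketch`, `canSkipWindow` and `windowsToScore` are hypothetical names, and the per-window max scores are assumed to be given as plain arrays. The only idea it shows is that a window can be skipped when the sum of the per-term max scores in that window is below the current minimum competitive score.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the PR's implementation. */
public final class WindowPruningSketch {

    /**
     * Returns true when no doc in the window can be competitive, i.e. the
     * sum of each term's max score in this window is below the threshold.
     */
    public static boolean canSkipWindow(float[] termMaxScores, float minCompetitiveScore) {
        double upperBound = 0;
        for (float maxScore : termMaxScores) {
            upperBound += maxScore;
        }
        return upperBound < minCompetitiveScore;
    }

    /**
     * Returns the indices of the windows that still need to be scored.
     * maxScoresPerWindow[w][t] is the assumed max score of term t in window w.
     */
    public static List<Integer> windowsToScore(float[][] maxScoresPerWindow,
                                               float minCompetitiveScore) {
        List<Integer> remaining = new ArrayList<>();
        for (int w = 0; w < maxScoresPerWindow.length; w++) {
            if (!canSkipWindow(maxScoresPerWindow[w], minCompetitiveScore)) {
                remaining.add(w);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Column 0: rare term with a high max score (0 where it has no postings).
        // Column 1: frequent term with a low max score.
        float[][] windows = {
            {0.0f, 1.2f},
            {5.0f, 1.1f},
            {0.0f, 1.3f},
        };
        System.out.println(windowsToScore(windows, 4.0f));
    }
}
```

In this made-up `OrHighLow`-like input, only the window that contains the rare term has an upper bound above the threshold, so the other windows are skipped without scoring any doc. That is the kind of effect I would expect to grow with a bigger window size, but the numbers here are invented for illustration and are not taken from the benchmark.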