Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/04/29 03:32:35 UTC
[GitHub] [lucene] zacharymorn opened a new pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
zacharymorn opened a new pull request #113:
URL: https://github.com/apache/lucene/pull/113
Implement the BMM algorithm from "Optimizing Top-k Document Retrieval Strategies for Block-Max Indexes" by Dimopoulos, Nepomnyachiy and Suel, using the BulkScorer interface.
This BMM implementation passes all existing tests run by `./gradlew check` as well as the luceneutil benchmark.
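For readers unfamiliar with window-based bulk scoring, here is a minimal sketch of how a doc ID range [min, max) could be cut into fixed-size windows. The window size matches the `FIXED_WINDOW_SIZE = 2048` constant in the diff quoted later in this thread; the class `WindowSketch` and the method `windows` are hypothetical names for illustration only, and whether the PR splits ranges exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the PR: splits a doc ID range into fixed-size windows. */
class WindowSketch {
  // Same value as FIXED_WINDOW_SIZE in the quoted BMMBulkScorer diff.
  static final int FIXED_WINDOW_SIZE = 2048;

  /** Returns [start, end) pairs that cover [min, max) in increasing order. */
  static List<int[]> windows(int min, int max) {
    List<int[]> result = new ArrayList<>();
    for (int start = min; start < max; ) {
      // Compute the end in long arithmetic so it cannot overflow when max is near Integer.MAX_VALUE.
      int end = (int) Math.min((long) start + FIXED_WINDOW_SIZE, (long) max);
      result.add(new int[] {start, end});
      start = end;
    }
    return result;
  }

  public static void main(String[] args) {
    for (int[] w : windows(0, 5000)) {
      System.out.println("[" + w[0] + ", " + w[1] + ")");
    }
  }
}
```

Within each window, per-term score upper bounds can be looked up once and reused for every doc in that window, which is where the pruning opportunity comes from.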
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-840293637
I ran wikibigall for the BMM BulkScorer implementation with windows, following the suggestions from https://github.com/apache/lucene/pull/101#issuecomment-837909869, and got the following results:
wikibigall run 1
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy2 45.83 (9.9%) 41.93 (19.0%) -8.5% ( -34% - 22%) 0.076
Fuzzy1 56.71 (8.2%) 51.99 (14.4%) -8.3% ( -28% - 15%) 0.025
AndHighOrMedMed 43.61 (3.2%) 43.13 (2.7%) -1.1% ( -6% - 4%) 0.246
BrowseMonthTaxoFacets 8.47 (6.8%) 8.43 (6.8%) -0.4% ( -13% - 14%) 0.841
Wildcard 59.90 (2.7%) 59.66 (3.0%) -0.4% ( -5% - 5%) 0.658
Respell 49.17 (2.6%) 49.05 (3.5%) -0.3% ( -6% - 5%) 0.789
PKLookup 205.25 (5.3%) 204.79 (4.4%) -0.2% ( -9% - 10%) 0.883
BrowseDayOfYearSSDVFacets 17.57 (2.2%) 17.55 (2.3%) -0.2% ( -4% - 4%) 0.816
IntervalsOrdered 1.94 (3.8%) 1.94 (3.9%) -0.1% ( -7% - 7%) 0.913
BrowseMonthSSDVFacets 19.16 (2.2%) 19.14 (2.4%) -0.1% ( -4% - 4%) 0.866
BrowseDayOfYearTaxoFacets 7.32 (6.7%) 7.31 (6.8%) -0.1% ( -12% - 14%) 0.963
TermDateFacets 8.56 (6.5%) 8.56 (6.7%) -0.1% ( -12% - 14%) 0.974
Phrase 24.65 (3.1%) 24.64 (3.3%) -0.1% ( -6% - 6%) 0.948
BrowseDateTaxoFacets 7.64 (6.9%) 7.63 (7.0%) -0.0% ( -13% - 14%) 0.985
AndHighMed 44.58 (4.1%) 44.59 (3.8%) 0.0% ( -7% - 8%) 0.984
AndHighHigh 22.83 (4.3%) 22.84 (3.7%) 0.0% ( -7% - 8%) 0.980
AndMedOrHighHigh 29.32 (3.5%) 29.33 (4.0%) 0.0% ( -7% - 7%) 0.976
IntNRQ 122.51 (17.1%) 122.58 (17.2%) 0.1% ( -29% - 41%) 0.992
SpanNear 4.82 (3.0%) 4.83 (2.9%) 0.2% ( -5% - 6%) 0.818
Prefix3 114.88 (6.0%) 115.25 (6.7%) 0.3% ( -11% - 13%) 0.873
SloppyPhrase 1.19 (9.8%) 1.20 (9.9%) 0.3% ( -17% - 22%) 0.913
TermBGroup1M1P 24.69 (4.2%) 24.80 (4.1%) 0.4% ( -7% - 9%) 0.745
TermGroup1M 15.52 (2.8%) 15.60 (3.3%) 0.5% ( -5% - 6%) 0.587
Term 1018.96 (6.7%) 1026.53 (7.1%) 0.7% ( -12% - 15%) 0.734
TermBGroup1M 25.25 (2.9%) 25.44 (3.5%) 0.8% ( -5% - 7%) 0.454
VectorSearch 1014.64 (6.7%) 1023.16 (6.8%) 0.8% ( -11% - 15%) 0.695
TermGroup100 11.41 (4.0%) 11.51 (4.0%) 0.8% ( -6% - 9%) 0.506
TermMonthSort 63.33 (14.0%) 63.90 (14.0%) 0.9% ( -23% - 33%) 0.838
TermGroup10K 18.49 (3.0%) 18.74 (3.7%) 1.4% ( -5% - 8%) 0.196
TermTitleSort 179.96 (13.4%) 182.73 (14.1%) 1.5% ( -22% - 33%) 0.723
TermDTSort 57.89 (11.1%) 59.22 (14.1%) 2.3% ( -20% - 30%) 0.569
TermDayOfYearSort 55.40 (8.0%) 57.06 (11.4%) 3.0% ( -15% - 24%) 0.335
OrHighHigh 74.09 (3.6%) 79.35 (4.4%) 7.1% ( 0% - 15%) 0.000
OrHighMed 45.50 (3.8%) 56.92 (4.1%) 25.1% ( 16% - 34%) 0.000
```
wikibigall run 2
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy2 52.37 (9.0%) 47.63 (11.0%) -9.0% ( -26% - 12%) 0.004
Fuzzy1 60.48 (8.9%) 55.26 (10.1%) -8.6% ( -25% - 11%) 0.004
AndHighOrMedMed 24.96 (2.1%) 24.60 (3.6%) -1.4% ( -7% - 4%) 0.121
VectorSearch 819.06 (4.6%) 812.87 (4.9%) -0.8% ( -9% - 9%) 0.617
Respell 42.97 (3.9%) 42.67 (4.4%) -0.7% ( -8% - 7%) 0.599
BrowseMonthSSDVFacets 19.34 (1.3%) 19.22 (2.3%) -0.6% ( -4% - 2%) 0.290
AndMedOrHighHigh 25.32 (3.5%) 25.16 (3.8%) -0.6% ( -7% - 6%) 0.595
Phrase 21.60 (3.6%) 21.47 (3.8%) -0.6% ( -7% - 7%) 0.603
IntervalsOrdered 0.83 (5.4%) 0.82 (5.0%) -0.5% ( -10% - 10%) 0.760
TermGroup100 23.86 (4.3%) 23.75 (4.4%) -0.5% ( -8% - 8%) 0.724
TermDateFacets 10.44 (10.0%) 10.41 (9.4%) -0.4% ( -17% - 21%) 0.903
PKLookup 210.67 (5.9%) 209.94 (4.9%) -0.3% ( -10% - 11%) 0.841
TermGroup1M 10.60 (4.4%) 10.56 (4.7%) -0.3% ( -9% - 9%) 0.811
TermDayOfYearSort 47.25 (10.1%) 47.16 (10.3%) -0.2% ( -18% - 22%) 0.952
TermBGroup1M1P 15.66 (5.3%) 15.63 (5.1%) -0.2% ( -10% - 10%) 0.920
BrowseMonthTaxoFacets 8.22 (7.9%) 8.20 (7.7%) -0.2% ( -14% - 16%) 0.947
IntNRQ 271.37 (2.8%) 271.06 (3.1%) -0.1% ( -5% - 5%) 0.904
SpanNear 10.34 (2.1%) 10.33 (2.1%) -0.1% ( -4% - 4%) 0.895
TermGroup10K 10.47 (4.3%) 10.47 (4.2%) -0.0% ( -8% - 8%) 0.975
BrowseDayOfYearTaxoFacets 7.10 (7.7%) 7.09 (7.4%) -0.0% ( -14% - 16%) 0.986
BrowseDateTaxoFacets 7.41 (8.1%) 7.40 (7.7%) -0.0% ( -14% - 17%) 0.989
SloppyPhrase 5.58 (8.3%) 5.58 (7.7%) -0.0% ( -14% - 17%) 0.991
TermBGroup1M 25.52 (4.3%) 25.56 (4.5%) 0.1% ( -8% - 9%) 0.921
AndHighMed 44.59 (2.3%) 44.68 (3.2%) 0.2% ( -5% - 5%) 0.812
Wildcard 88.60 (3.9%) 88.83 (3.4%) 0.3% ( -6% - 7%) 0.818
Prefix3 48.71 (5.2%) 48.91 (4.8%) 0.4% ( -9% - 10%) 0.792
BrowseDayOfYearSSDVFacets 17.53 (2.0%) 17.63 (3.4%) 0.6% ( -4% - 6%) 0.490
AndHighHigh 31.50 (3.5%) 31.80 (4.4%) 1.0% ( -6% - 9%) 0.450
TermDTSort 163.80 (14.4%) 165.54 (14.3%) 1.1% ( -24% - 34%) 0.814
TermTitleSort 113.02 (9.9%) 114.44 (13.0%) 1.3% ( -19% - 26%) 0.730
TermMonthSort 63.15 (10.4%) 64.13 (13.3%) 1.5% ( -20% - 28%) 0.683
OrHighHigh 17.12 (2.5%) 17.65 (3.1%) 3.1% ( -2% - 8%) 0.000
Term 1038.79 (8.0%) 1077.30 (7.9%) 3.7% ( -11% - 21%) 0.142
OrHighMed 45.90 (2.4%) 56.69 (3.7%) 23.5% ( 16% - 30%) 0.000
```
wikibigall run 3
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy1 59.46 (9.9%) 53.54 (12.3%) -10.0% ( -29% - 13%) 0.005
Fuzzy2 61.37 (11.5%) 60.03 (13.4%) -2.2% ( -24% - 25%) 0.580
AndHighOrMedMed 33.91 (3.7%) 33.59 (3.6%) -1.0% ( -7% - 6%) 0.405
Prefix3 29.87 (12.4%) 29.76 (12.4%) -0.4% ( -22% - 27%) 0.924
AndMedOrHighHigh 28.14 (3.7%) 28.05 (4.2%) -0.3% ( -7% - 7%) 0.817
IntNRQ 272.56 (3.9%) 271.78 (3.1%) -0.3% ( -7% - 7%) 0.800
SpanNear 1.21 (16.7%) 1.21 (16.6%) -0.2% ( -28% - 39%) 0.966
IntervalsOrdered 2.03 (4.5%) 2.03 (4.9%) -0.2% ( -9% - 9%) 0.895
Phrase 55.81 (5.4%) 55.72 (4.9%) -0.2% ( -9% - 10%) 0.920
BrowseDayOfYearSSDVFacets 17.45 (1.8%) 17.43 (1.7%) -0.1% ( -3% - 3%) 0.821
Wildcard 37.74 (9.2%) 37.71 (9.6%) -0.1% ( -17% - 20%) 0.983
BrowseMonthSSDVFacets 19.28 (1.3%) 19.28 (1.3%) 0.0% ( -2% - 2%) 0.997
Respell 40.21 (3.9%) 40.25 (5.0%) 0.1% ( -8% - 9%) 0.940
TermBGroup1M 25.59 (4.1%) 25.65 (3.9%) 0.3% ( -7% - 8%) 0.837
BrowseDayOfYearTaxoFacets 7.06 (6.3%) 7.09 (6.2%) 0.5% ( -11% - 13%) 0.780
BrowseMonthTaxoFacets 8.14 (6.9%) 8.19 (6.7%) 0.6% ( -12% - 15%) 0.796
BrowseDateTaxoFacets 7.36 (6.5%) 7.40 (6.5%) 0.6% ( -11% - 14%) 0.785
SloppyPhrase 5.39 (4.5%) 5.42 (4.7%) 0.6% ( -8% - 10%) 0.693
TermDTSort 80.69 (8.2%) 81.24 (8.2%) 0.7% ( -14% - 18%) 0.794
AndHighHigh 18.01 (4.4%) 18.15 (4.9%) 0.8% ( -8% - 10%) 0.601
AndHighMed 56.72 (4.7%) 57.19 (5.2%) 0.8% ( -8% - 11%) 0.598
TermDateFacets 8.23 (8.0%) 8.30 (8.2%) 0.8% ( -14% - 18%) 0.742
TermMonthSort 94.14 (13.2%) 94.94 (15.8%) 0.9% ( -24% - 34%) 0.852
TermGroup1M 22.76 (3.7%) 22.96 (3.8%) 0.9% ( -6% - 8%) 0.460
TermGroup10K 10.46 (4.9%) 10.60 (5.3%) 1.3% ( -8% - 12%) 0.422
TermTitleSort 117.75 (13.5%) 119.36 (16.3%) 1.4% ( -25% - 35%) 0.772
Term 1079.43 (8.3%) 1094.78 (7.3%) 1.4% ( -13% - 18%) 0.565
TermDayOfYearSort 45.89 (8.1%) 46.58 (8.7%) 1.5% ( -14% - 19%) 0.572
PKLookup 207.04 (5.6%) 210.17 (5.3%) 1.5% ( -8% - 13%) 0.381
TermGroup100 13.92 (5.2%) 14.13 (5.6%) 1.5% ( -8% - 13%) 0.372
VectorSearch 1046.61 (6.7%) 1067.51 (6.9%) 2.0% ( -10% - 16%) 0.351
TermBGroup1M1P 18.24 (6.3%) 18.61 (6.5%) 2.0% ( -10% - 15%) 0.319
OrHighHigh 74.99 (5.3%) 81.16 (5.5%) 8.2% ( -2% - 20%) 0.000
OrHighMed 45.89 (3.2%) 57.40 (5.6%) 25.1% ( 15% - 34%) 0.000
```
wikibigall run 4
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy1 57.19 (8.5%) 52.98 (6.5%) -7.4% ( -20% - 8%) 0.002
Fuzzy2 45.44 (11.4%) 42.42 (11.4%) -6.6% ( -26% - 18%) 0.066
AndHighOrMedMed 31.75 (3.8%) 31.40 (3.7%) -1.1% ( -8% - 6%) 0.365
AndMedOrHighHigh 25.00 (3.7%) 24.77 (3.5%) -0.9% ( -7% - 6%) 0.413
SloppyPhrase 5.40 (5.3%) 5.36 (5.4%) -0.7% ( -10% - 10%) 0.674
VectorSearch 807.06 (5.0%) 802.68 (5.6%) -0.5% ( -10% - 10%) 0.747
Prefix3 184.66 (4.6%) 183.98 (4.9%) -0.4% ( -9% - 9%) 0.807
AndHighHigh 65.28 (3.3%) 65.10 (4.0%) -0.3% ( -7% - 7%) 0.815
TermGroup10K 15.61 (3.7%) 15.58 (4.0%) -0.2% ( -7% - 7%) 0.874
TermGroup1M 10.44 (3.9%) 10.42 (4.8%) -0.2% ( -8% - 8%) 0.898
TermBGroup1M 21.55 (4.2%) 21.52 (5.1%) -0.1% ( -9% - 9%) 0.919
TermDateFacets 10.89 (9.2%) 10.87 (9.7%) -0.1% ( -17% - 20%) 0.972
IntervalsOrdered 0.81 (4.6%) 0.81 (4.5%) 0.0% ( -8% - 9%) 0.994
BrowseMonthSSDVFacets 18.99 (1.5%) 18.99 (1.2%) 0.0% ( -2% - 2%) 0.973
BrowseDateTaxoFacets 7.18 (7.2%) 7.18 (7.2%) 0.0% ( -13% - 15%) 0.985
SpanNear 2.08 (3.9%) 2.08 (3.7%) 0.1% ( -7% - 8%) 0.962
TermGroup100 13.72 (4.2%) 13.73 (5.0%) 0.1% ( -8% - 9%) 0.964
BrowseMonthTaxoFacets 8.00 (7.5%) 8.01 (7.7%) 0.1% ( -14% - 16%) 0.971
BrowseDayOfYearSSDVFacets 17.14 (2.5%) 17.16 (2.7%) 0.1% ( -4% - 5%) 0.910
TermDTSort 45.05 (8.1%) 45.10 (7.5%) 0.1% ( -14% - 17%) 0.964
TermTitleSort 88.10 (12.0%) 88.20 (11.2%) 0.1% ( -20% - 26%) 0.973
BrowseDayOfYearTaxoFacets 6.87 (6.7%) 6.88 (6.9%) 0.1% ( -12% - 14%) 0.949
AndHighMed 55.06 (3.4%) 55.15 (3.1%) 0.2% ( -6% - 6%) 0.883
TermMonthSort 61.57 (12.2%) 61.72 (11.3%) 0.3% ( -20% - 26%) 0.946
Phrase 53.70 (4.9%) 53.84 (4.5%) 0.3% ( -8% - 10%) 0.863
IntNRQ 263.07 (5.8%) 263.89 (4.7%) 0.3% ( -9% - 11%) 0.851
TermBGroup1M1P 30.16 (4.3%) 30.34 (4.8%) 0.6% ( -8% - 10%) 0.691
Respell 46.63 (5.3%) 46.91 (5.0%) 0.6% ( -9% - 11%) 0.712
Wildcard 70.44 (4.3%) 70.89 (3.9%) 0.6% ( -7% - 9%) 0.627
Term 950.32 (7.6%) 957.36 (7.9%) 0.7% ( -13% - 17%) 0.763
TermDayOfYearSort 46.36 (14.2%) 46.92 (15.2%) 1.2% ( -24% - 35%) 0.797
PKLookup 200.63 (7.4%) 203.46 (6.6%) 1.4% ( -11% - 16%) 0.524
OrHighHigh 10.27 (2.7%) 10.74 (3.0%) 4.5% ( -1% - 10%) 0.000
OrHighMed 43.96 (3.1%) 54.71 (4.3%) 24.4% ( 16% - 32%) 0.000
```
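As a reading aid for the tables above: the "Pct diff" column appears to be the relative change of the modified QPS against the baseline QPS. The sketch below recomputes it from the rounded QPS values printed in the tables, so small rounding differences are possible; `PctDiffSketch` and `pctDiff` are hypothetical names, not luceneutil code, and the bracketed range and p-value columns are not reproduced here.

```java
import java.util.Locale;

/** Hypothetical helper, not luceneutil code: recomputes the "Pct diff" column. */
class PctDiffSketch {
  /** Relative change of the modified QPS against the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double modifiedQps) {
    return 100.0 * (modifiedQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // OrHighMed, wikibigall run 1: baseline 45.50, modified 56.92 (table reports 25.1%).
    System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(45.50, 56.92)));
    // Fuzzy2, wikibigall run 1: baseline 45.83, modified 41.93 (table reports -8.5%).
    System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(45.83, 41.93)));
  }
}
```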
[GitHub] [lucene] zacharymorn commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r631568912
##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+ private List<Scorer> scorers;
+ private DisiWrapper[] allScorers;
+ private Weight weight;
+ private ScoreMode scoreMode;
+ private int scalingFactor;
+ private long cost;
+ private static final int FIXED_WINDOW_SIZE = 2048;
Review comment:
I also ran wikibigall for the above changes, following the suggestions from https://github.com/apache/lucene/pull/101#issuecomment-837909869, and got the following results:
wikibigall run 1
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy1 51.08 (9.7%) 44.15 (11.5%) -13.6% ( -31% - 8%) 0.000
Fuzzy2 51.90 (12.0%) 48.39 (10.3%) -6.8% ( -25% - 17%) 0.056
TermDayOfYearSort 160.58 (12.1%) 156.99 (11.2%) -2.2% ( -22% - 23%) 0.542
TermMonthSort 58.61 (8.4%) 57.57 (8.0%) -1.8% ( -16% - 15%) 0.494
TermDTSort 104.08 (11.4%) 102.37 (9.4%) -1.6% ( -20% - 21%) 0.619
TermTitleSort 104.99 (8.3%) 103.29 (7.6%) -1.6% ( -16% - 15%) 0.519
AndHighOrMedMed 33.52 (3.1%) 33.02 (2.8%) -1.5% ( -7% - 4%) 0.114
AndHighHigh 18.08 (4.6%) 17.87 (4.1%) -1.1% ( -9% - 7%) 0.406
TermBGroup1M 14.14 (4.1%) 14.03 (3.5%) -0.8% ( -8% - 7%) 0.491
TermDateFacets 7.58 (5.5%) 7.53 (6.1%) -0.7% ( -11% - 11%) 0.714
Phrase 10.38 (1.8%) 10.32 (2.2%) -0.6% ( -4% - 3%) 0.359
AndHighMed 82.33 (4.0%) 81.87 (3.9%) -0.6% ( -8% - 7%) 0.655
SloppyPhrase 2.32 (8.3%) 2.31 (9.9%) -0.5% ( -17% - 19%) 0.855
TermGroup100 34.52 (3.7%) 34.36 (3.0%) -0.5% ( -6% - 6%) 0.650
TermBGroup1M1P 43.50 (3.8%) 43.30 (4.0%) -0.5% ( -7% - 7%) 0.700
AndMedOrHighHigh 25.62 (3.4%) 25.51 (3.1%) -0.4% ( -6% - 6%) 0.666
TermGroup1M 15.43 (3.0%) 15.37 (2.7%) -0.4% ( -5% - 5%) 0.668
VectorSearch 823.98 (1.9%) 820.96 (2.6%) -0.4% ( -4% - 4%) 0.616
PKLookup 210.69 (2.6%) 210.23 (2.5%) -0.2% ( -5% - 4%) 0.782
BrowseMonthSSDVFacets 18.90 (0.8%) 18.87 (0.9%) -0.2% ( -1% - 1%) 0.574
BrowseDayOfYearTaxoFacets 7.14 (5.3%) 7.14 (5.8%) -0.1% ( -10% - 11%) 0.943
Wildcard 38.83 (2.4%) 38.79 (2.5%) -0.1% ( -4% - 4%) 0.881
TermGroup10K 18.47 (2.9%) 18.45 (2.5%) -0.1% ( -5% - 5%) 0.910
SpanNear 4.76 (1.4%) 4.76 (1.3%) -0.1% ( -2% - 2%) 0.885
Prefix3 173.23 (6.8%) 173.13 (6.8%) -0.1% ( -12% - 14%) 0.978
BrowseDateTaxoFacets 7.46 (5.5%) 7.46 (6.1%) -0.1% ( -10% - 12%) 0.976
BrowseMonthTaxoFacets 8.27 (5.7%) 8.27 (6.4%) -0.0% ( -11% - 12%) 0.986
Respell 41.11 (2.8%) 41.12 (2.6%) 0.0% ( -5% - 5%) 0.972
BrowseDayOfYearSSDVFacets 17.13 (1.8%) 17.14 (1.7%) 0.1% ( -3% - 3%) 0.887
IntNRQ 267.98 (2.2%) 268.69 (2.5%) 0.3% ( -4% - 5%) 0.721
IntervalsOrdered 3.79 (2.1%) 3.81 (2.4%) 0.5% ( -3% - 5%) 0.448
Term 1046.89 (7.5%) 1067.03 (7.0%) 1.9% ( -11% - 17%) 0.401
OrHighMed 34.43 (3.2%) 37.66 (5.5%) 9.4% ( 0% - 18%) 0.000
OrHighHigh 16.93 (3.7%) 25.19 (4.6%) 48.8% ( 39% - 59%) 0.000
```
wikibigall run 2
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy1 50.87 (9.8%) 47.08 (13.8%) -7.5% ( -28% - 17%) 0.049
Fuzzy2 30.31 (5.6%) 28.38 (9.6%) -6.4% ( -20% - 9%) 0.011
TermMonthSort 59.51 (12.2%) 58.67 (9.4%) -1.4% ( -20% - 23%) 0.683
TermDTSort 172.78 (11.5%) 170.44 (10.1%) -1.4% ( -20% - 22%) 0.692
AndMedOrHighHigh 9.65 (3.1%) 9.55 (2.8%) -1.1% ( -6% - 4%) 0.233
TermTitleSort 59.25 (12.2%) 58.60 (9.8%) -1.1% ( -20% - 23%) 0.754
TermDateFacets 8.18 (7.5%) 8.13 (7.7%) -0.6% ( -14% - 15%) 0.789
Respell 46.60 (3.8%) 46.33 (3.6%) -0.6% ( -7% - 7%) 0.628
IntervalsOrdered 3.81 (2.7%) 3.80 (2.8%) -0.4% ( -5% - 5%) 0.674
AndHighOrMedMed 24.00 (3.0%) 23.94 (3.4%) -0.3% ( -6% - 6%) 0.792
AndHighMed 59.47 (3.1%) 59.34 (3.6%) -0.2% ( -6% - 6%) 0.837
TermGroup1M 22.27 (3.4%) 22.23 (3.5%) -0.1% ( -6% - 7%) 0.895
BrowseDateTaxoFacets 7.28 (7.7%) 7.27 (7.8%) -0.1% ( -14% - 16%) 0.956
BrowseDayOfYearTaxoFacets 6.97 (7.5%) 6.96 (7.5%) -0.1% ( -14% - 16%) 0.958
BrowseMonthTaxoFacets 8.08 (7.7%) 8.07 (7.9%) -0.1% ( -14% - 16%) 0.962
AndHighHigh 64.73 (2.8%) 64.67 (3.7%) -0.1% ( -6% - 6%) 0.921
Wildcard 70.06 (3.1%) 70.00 (3.2%) -0.1% ( -6% - 6%) 0.924
BrowseMonthSSDVFacets 18.76 (0.9%) 18.77 (0.9%) 0.0% ( -1% - 1%) 0.919
Phrase 20.88 (3.8%) 20.90 (3.2%) 0.1% ( -6% - 7%) 0.936
TermGroup10K 12.15 (3.7%) 12.16 (4.0%) 0.1% ( -7% - 8%) 0.931
TermBGroup1M1P 15.29 (5.1%) 15.31 (4.6%) 0.1% ( -9% - 10%) 0.936
Prefix3 32.94 (2.9%) 32.99 (2.9%) 0.1% ( -5% - 6%) 0.872
BrowseDayOfYearSSDVFacets 17.10 (1.7%) 17.13 (1.7%) 0.2% ( -3% - 3%) 0.768
TermGroup100 34.25 (3.8%) 34.34 (3.9%) 0.3% ( -7% - 8%) 0.829
SloppyPhrase 2.82 (7.5%) 2.83 (7.4%) 0.3% ( -13% - 16%) 0.900
TermDayOfYearSort 45.78 (11.8%) 45.93 (10.6%) 0.3% ( -19% - 25%) 0.926
SpanNear 10.00 (1.2%) 10.05 (1.2%) 0.4% ( -1% - 2%) 0.253
IntNRQ 108.69 (24.1%) 109.25 (23.7%) 0.5% ( -38% - 63%) 0.945
TermBGroup1M 11.95 (4.5%) 12.03 (5.2%) 0.7% ( -8% - 10%) 0.661
PKLookup 201.05 (6.0%) 203.48 (4.0%) 1.2% ( -8% - 11%) 0.451
Term 667.45 (5.8%) 683.87 (7.3%) 2.5% ( -10% - 16%) 0.240
VectorSearch 989.57 (5.4%) 1021.23 (5.0%) 3.2% ( -6% - 14%) 0.051
OrHighMed 58.35 (3.9%) 69.23 (5.8%) 18.6% ( 8% - 29%) 0.000
OrHighHigh 11.04 (3.4%) 16.84 (6.2%) 52.5% ( 41% - 64%) 0.000
```
wikibigall run 3
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy1 56.20 (11.1%) 49.60 (12.0%) -11.7% ( -31% - 12%) 0.001
TermMonthSort 61.43 (11.3%) 57.85 (14.1%) -5.8% ( -28% - 21%) 0.148
TermTitleSort 109.97 (11.2%) 103.85 (14.1%) -5.6% ( -27% - 22%) 0.167
TermDTSort 160.77 (10.8%) 151.92 (13.5%) -5.5% ( -26% - 21%) 0.156
TermDayOfYearSort 55.50 (7.1%) 52.92 (15.5%) -4.6% ( -25% - 19%) 0.222
TermGroup10K 10.30 (4.6%) 10.02 (7.4%) -2.7% ( -14% - 9%) 0.160
Term 1037.48 (5.2%) 1010.63 (7.6%) -2.6% ( -14% - 10%) 0.210
TermBGroup1M 21.54 (5.0%) 21.00 (7.4%) -2.5% ( -14% - 10%) 0.212
TermGroup100 18.89 (4.4%) 18.46 (7.8%) -2.3% ( -13% - 10%) 0.255
TermDateFacets 10.29 (9.2%) 10.11 (9.5%) -1.8% ( -18% - 18%) 0.536
TermBGroup1M1P 43.52 (4.9%) 42.88 (5.6%) -1.5% ( -11% - 9%) 0.373
Fuzzy2 56.25 (13.4%) 55.53 (12.5%) -1.3% ( -24% - 28%) 0.754
TermGroup1M 22.31 (3.8%) 22.04 (5.2%) -1.2% ( -9% - 8%) 0.389
AndMedOrHighHigh 28.60 (2.5%) 28.31 (2.7%) -1.0% ( -6% - 4%) 0.222
Phrase 59.81 (2.9%) 59.43 (3.1%) -0.6% ( -6% - 5%) 0.498
PKLookup 205.40 (3.8%) 204.10 (4.9%) -0.6% ( -8% - 8%) 0.648
VectorSearch 1033.68 (4.0%) 1027.88 (4.3%) -0.6% ( -8% - 8%) 0.670
BrowseDateTaxoFacets 7.27 (6.9%) 7.24 (7.0%) -0.4% ( -13% - 14%) 0.859
BrowseDayOfYearTaxoFacets 6.97 (6.6%) 6.94 (6.8%) -0.4% ( -12% - 13%) 0.854
SloppyPhrase 18.29 (2.0%) 18.22 (2.8%) -0.4% ( -5% - 4%) 0.612
BrowseMonthTaxoFacets 8.05 (6.9%) 8.02 (7.0%) -0.3% ( -13% - 14%) 0.891
AndHighOrMedMed 23.88 (2.7%) 23.83 (2.3%) -0.2% ( -5% - 4%) 0.774
IntervalsOrdered 3.83 (2.5%) 3.83 (2.6%) -0.1% ( -5% - 5%) 0.862
IntNRQ 123.08 (14.8%) 122.93 (15.0%) -0.1% ( -26% - 34%) 0.979
Wildcard 58.03 (2.7%) 57.97 (3.1%) -0.1% ( -5% - 5%) 0.901
BrowseDayOfYearSSDVFacets 16.93 (1.7%) 16.91 (1.5%) -0.1% ( -3% - 3%) 0.851
Prefix3 165.67 (10.5%) 165.54 (9.6%) -0.1% ( -18% - 22%) 0.980
SpanNear 4.76 (1.3%) 4.77 (1.0%) 0.0% ( -2% - 2%) 0.915
BrowseMonthSSDVFacets 18.78 (1.4%) 18.80 (1.3%) 0.1% ( -2% - 2%) 0.815
Respell 47.08 (4.1%) 47.19 (4.1%) 0.2% ( -7% - 8%) 0.851
AndHighHigh 17.36 (3.4%) 17.50 (3.1%) 0.8% ( -5% - 7%) 0.435
AndHighMed 32.21 (3.6%) 32.50 (3.2%) 0.9% ( -5% - 7%) 0.406
OrHighMed 33.59 (3.2%) 37.09 (3.8%) 10.4% ( 3% - 18%) 0.000
OrHighHigh 10.82 (3.7%) 17.08 (4.1%) 57.8% ( 48% - 68%) 0.000
```
[GitHub] [lucene] zacharymorn commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r629825600
##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+ private List<Scorer> scorers;
+ private DisiWrapper[] allScorers;
+ private Weight weight;
+ private ScoreMode scoreMode;
+ private int scalingFactor;
+ private long cost;
+ private static final int FIXED_WINDOW_SIZE = 2048;
Review comment:
I've pushed this change here: https://github.com/zacharymorn/lucene/commit/3bcdbb31a7d55b00cb53e4be40a4adc93b9f30db. The corresponding benchmark results are available in the git commit message.
[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-836122884
Hi @jpountz, I've ported your changes to this BulkScorer implementation as well, and ran both the five-clause OrMed task (OrMedMedMedMedMed) and the full wikimedium5m benchmark:
```
OrMedMedMedMedMed run 1
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
OrMedMedMedMedMed 40.90 (8.5%) 39.37 (6.8%) -3.7% ( -17% - 12%) 0.126
PKLookup 228.21 (1.9%) 223.87 (2.2%) -1.9% ( -5% - 2%) 0.004
```
```
OrMedMedMedMedMed run 2
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
OrMedMedMedMedMed 39.72 (5.0%) 38.01 (7.4%) -4.3% ( -15% - 8%) 0.030
PKLookup 226.45 (2.1%) 223.28 (2.3%) -1.4% ( -5% - 3%) 0.048
```
```
OrMedMedMedMedMed run 3
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
PKLookup 226.41 (3.3%) 222.43 (2.3%) -1.8% ( -7% - 3%) 0.052
OrMedMedMedMedMed 38.83 (6.7%) 39.27 (7.1%) 1.1% ( -11% - 15%) 0.600
```
```
full wikimedium5m run 1
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Wildcard 376.63 (5.8%) 360.47 (6.2%) -4.3% ( -15% - 8%) 0.024
OrNotHighHigh 745.74 (4.5%) 730.51 (5.7%) -2.0% ( -11% - 8%) 0.208
Fuzzy2 40.89 (6.0%) 40.20 (8.5%) -1.7% ( -15% - 13%) 0.465
HighTermDayOfYearSort 354.09 (16.6%) 348.53 (13.2%) -1.6% ( -26% - 33%) 0.740
BrowseMonthSSDVFacets 31.93 (3.0%) 31.50 (6.5%) -1.3% ( -10% - 8%) 0.402
LowTerm 1978.09 (5.1%) 1956.82 (5.3%) -1.1% ( -10% - 9%) 0.514
IntNRQ 194.54 (3.6%) 193.05 (4.2%) -0.8% ( -8% - 7%) 0.537
HighTermMonthSort 330.71 (10.6%) 328.18 (9.7%) -0.8% ( -19% - 21%) 0.812
OrHighNotLow 806.97 (6.4%) 801.14 (5.6%) -0.7% ( -11% - 11%) 0.702
BrowseDayOfYearSSDVFacets 28.57 (1.7%) 28.39 (2.0%) -0.6% ( -4% - 3%) 0.294
AndHighHigh 70.54 (3.8%) 70.12 (4.6%) -0.6% ( -8% - 8%) 0.657
Respell 78.30 (2.0%) 77.93 (2.1%) -0.5% ( -4% - 3%) 0.463
OrHighNotHigh 772.33 (5.0%) 768.86 (5.8%) -0.4% ( -10% - 10%) 0.795
Prefix3 133.26 (7.3%) 132.68 (8.8%) -0.4% ( -15% - 16%) 0.865
HighTermTitleBDVSort 189.02 (17.9%) 188.23 (12.7%) -0.4% ( -26% - 36%) 0.932
MedSpanNear 129.28 (2.6%) 129.09 (3.1%) -0.1% ( -5% - 5%) 0.871
OrNotHighLow 900.87 (3.4%) 900.01 (3.7%) -0.1% ( -6% - 7%) 0.932
LowPhrase 61.05 (2.7%) 61.00 (3.1%) -0.1% ( -5% - 5%) 0.918
HighSpanNear 96.65 (3.2%) 96.63 (3.3%) -0.0% ( -6% - 6%) 0.990
Fuzzy1 67.13 (6.9%) 67.15 (6.6%) 0.0% ( -12% - 14%) 0.988
OrHighNotMed 811.67 (4.9%) 812.18 (5.6%) 0.1% ( -9% - 11%) 0.969
BrowseMonthTaxoFacets 13.21 (2.8%) 13.22 (2.8%) 0.1% ( -5% - 5%) 0.941
HighPhrase 34.18 (3.1%) 34.21 (3.3%) 0.1% ( -6% - 6%) 0.939
AndHighLow 905.10 (4.0%) 905.96 (5.0%) 0.1% ( -8% - 9%) 0.947
MedPhrase 87.90 (2.8%) 88.10 (3.0%) 0.2% ( -5% - 6%) 0.811
BrowseDateTaxoFacets 11.06 (3.9%) 11.09 (3.4%) 0.3% ( -6% - 7%) 0.811
BrowseDayOfYearTaxoFacets 11.05 (3.8%) 11.08 (3.4%) 0.3% ( -6% - 7%) 0.801
MedSloppyPhrase 152.46 (3.1%) 152.89 (2.7%) 0.3% ( -5% - 6%) 0.757
PKLookup 215.89 (2.8%) 216.86 (3.8%) 0.5% ( -5% - 7%) 0.667
TermDTSort 436.33 (15.6%) 438.31 (13.8%) 0.5% ( -25% - 35%) 0.922
LowSpanNear 119.90 (2.4%) 120.46 (2.3%) 0.5% ( -4% - 5%) 0.533
HighSloppyPhrase 28.82 (3.9%) 28.99 (2.8%) 0.6% ( -5% - 7%) 0.586
AndHighMed 475.36 (5.6%) 478.26 (5.8%) 0.6% ( -10% - 12%) 0.735
LowSloppyPhrase 388.99 (3.4%) 392.32 (2.9%) 0.9% ( -5% - 7%) 0.387
OrNotHighMed 774.61 (6.6%) 781.75 (5.6%) 0.9% ( -10% - 14%) 0.633
HighTerm 1268.49 (5.6%) 1290.00 (5.6%) 1.7% ( -9% - 13%) 0.340
HighIntervalsOrdered 417.04 (3.1%) 425.09 (2.9%) 1.9% ( -3% - 8%) 0.043
MedTerm 1583.25 (5.4%) 1627.50 (5.5%) 2.8% ( -7% - 14%) 0.107
OrHighHigh 61.28 (3.6%) 64.46 (3.0%) 5.2% ( -1% - 12%) 0.000
OrHighMed 79.13 (2.9%) 85.68 (3.3%) 8.3% ( 1% - 14%) 0.000
OrHighLow 231.58 (4.7%) 683.73 (16.0%) 195.2% ( 166% - 226%) 0.000
```
```
full wikimedium5m run 2
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
OrHighHigh 97.84 (2.7%) 78.42 (2.1%) -19.8% ( -24% - -15%) 0.000
HighTermTitleBDVSort 223.86 (17.8%) 217.70 (16.4%) -2.8% ( -31% - 38%) 0.611
OrNotHighLow 964.32 (2.6%) 945.18 (6.0%) -2.0% ( -10% - 6%) 0.175
OrHighNotLow 814.26 (5.8%) 799.46 (5.7%) -1.8% ( -12% - 10%) 0.316
HighTermMonthSort 342.78 (14.3%) 338.52 (15.6%) -1.2% ( -27% - 33%) 0.793
HighTermDayOfYearSort 259.90 (13.7%) 257.22 (13.8%) -1.0% ( -25% - 30%) 0.812
TermDTSort 234.69 (10.9%) 232.30 (12.3%) -1.0% ( -21% - 24%) 0.782
AndHighHigh 93.13 (3.0%) 92.19 (3.5%) -1.0% ( -7% - 5%) 0.326
MedTerm 1410.12 (3.9%) 1398.22 (2.4%) -0.8% ( -6% - 5%) 0.408
OrNotHighHigh 679.95 (6.4%) 674.81 (6.3%) -0.8% ( -12% - 12%) 0.706
OrHighNotMed 744.68 (4.4%) 739.05 (5.8%) -0.8% ( -10% - 9%) 0.644
AndHighMed 451.76 (3.8%) 448.59 (3.4%) -0.7% ( -7% - 6%) 0.540
AndHighLow 969.58 (5.6%) 963.88 (4.8%) -0.6% ( -10% - 10%) 0.720
LowSpanNear 25.23 (4.2%) 25.11 (2.9%) -0.5% ( -7% - 6%) 0.666
MedSpanNear 26.41 (2.4%) 26.33 (1.5%) -0.3% ( -4% - 3%) 0.610
HighIntervalsOrdered 37.09 (1.9%) 36.98 (2.4%) -0.3% ( -4% - 4%) 0.669
OrHighNotHigh 679.06 (4.3%) 677.17 (5.8%) -0.3% ( -9% - 10%) 0.863
HighSpanNear 32.19 (2.2%) 32.14 (2.1%) -0.2% ( -4% - 4%) 0.822
IntNRQ 322.43 (2.0%) 322.04 (2.5%) -0.1% ( -4% - 4%) 0.865
BrowseMonthSSDVFacets 32.22 (1.7%) 32.25 (1.5%) 0.1% ( -3% - 3%) 0.896
LowSloppyPhrase 39.45 (2.6%) 39.48 (2.4%) 0.1% ( -4% - 5%) 0.921
BrowseDayOfYearSSDVFacets 28.20 (5.4%) 28.23 (5.2%) 0.1% ( -9% - 11%) 0.947
HighSloppyPhrase 56.95 (2.4%) 57.03 (2.4%) 0.1% ( -4% - 4%) 0.846
PKLookup 217.45 (3.9%) 217.78 (4.2%) 0.2% ( -7% - 8%) 0.906
LowTerm 1614.00 (3.7%) 1616.52 (4.3%) 0.2% ( -7% - 8%) 0.902
MedSloppyPhrase 335.24 (2.8%) 336.50 (2.7%) 0.4% ( -4% - 6%) 0.665
MedPhrase 257.34 (2.7%) 258.59 (1.9%) 0.5% ( -4% - 5%) 0.515
HighPhrase 100.07 (2.1%) 100.66 (1.7%) 0.6% ( -3% - 4%) 0.332
BrowseDayOfYearTaxoFacets 11.20 (2.8%) 11.28 (2.5%) 0.7% ( -4% - 6%) 0.410
BrowseMonthTaxoFacets 13.07 (2.4%) 13.17 (1.9%) 0.7% ( -3% - 5%) 0.283
BrowseDateTaxoFacets 11.18 (2.9%) 11.27 (2.5%) 0.8% ( -4% - 6%) 0.369
Wildcard 55.50 (4.6%) 56.08 (2.9%) 1.0% ( -6% - 8%) 0.391
LowPhrase 501.30 (3.5%) 506.61 (3.2%) 1.1% ( -5% - 8%) 0.319
Prefix3 107.90 (6.5%) 109.16 (3.9%) 1.2% ( -8% - 12%) 0.491
Respell 73.30 (3.3%) 74.17 (2.6%) 1.2% ( -4% - 7%) 0.210
OrNotHighMed 625.05 (4.3%) 634.75 (4.9%) 1.6% ( -7% - 11%) 0.289
Fuzzy2 67.34 (18.7%) 68.92 (16.8%) 2.3% ( -27% - 46%) 0.677
HighTerm 1559.83 (4.6%) 1608.90 (5.3%) 3.1% ( -6% - 13%) 0.044
Fuzzy1 74.41 (17.1%) 77.02 (13.2%) 3.5% ( -22% - 40%) 0.467
OrHighMed 176.89 (4.0%) 192.17 (2.7%) 8.6% ( 1% - 16%) 0.000
OrHighLow 179.14 (3.0%) 634.97 (16.3%) 254.5% ( 228% - 282%) 0.000
```
```
full wikimedium5m run 3
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
Fuzzy2 78.85 (17.1%) 74.79 (15.3%) -5.1% ( -32% - 32%) 0.315
Fuzzy1 73.72 (12.3%) 70.14 (9.6%) -4.9% ( -23% - 19%) 0.164
OrHighMed 218.87 (3.8%) 213.12 (3.9%) -2.6% ( -9% - 5%) 0.031
OrHighNotHigh 710.58 (5.0%) 693.73 (4.9%) -2.4% ( -11% - 7%) 0.130
OrHighNotLow 766.45 (7.0%) 752.36 (5.4%) -1.8% ( -13% - 11%) 0.351
OrHighNotMed 788.49 (4.6%) 779.76 (4.0%) -1.1% ( -9% - 7%) 0.415
MedSpanNear 432.51 (2.6%) 428.61 (2.9%) -0.9% ( -6% - 4%) 0.301
HighPhrase 328.27 (2.6%) 325.47 (3.1%) -0.9% ( -6% - 4%) 0.338
MedTerm 1537.24 (3.9%) 1525.49 (3.9%) -0.8% ( -8% - 7%) 0.537
PKLookup 224.01 (3.4%) 222.35 (3.2%) -0.7% ( -7% - 6%) 0.478
HighTerm 1852.48 (6.1%) 1839.68 (6.9%) -0.7% ( -12% - 13%) 0.737
OrNotHighLow 872.06 (4.3%) 866.35 (3.3%) -0.7% ( -7% - 7%) 0.589
OrNotHighHigh 696.91 (4.9%) 694.25 (5.3%) -0.4% ( -10% - 10%) 0.814
AndHighMed 399.43 (3.7%) 398.38 (3.4%) -0.3% ( -7% - 7%) 0.818
BrowseMonthTaxoFacets 13.35 (2.5%) 13.33 (2.8%) -0.1% ( -5% - 5%) 0.891
BrowseMonthSSDVFacets 31.99 (2.2%) 31.97 (2.3%) -0.1% ( -4% - 4%) 0.917
HighIntervalsOrdered 56.92 (1.7%) 56.89 (1.5%) -0.1% ( -3% - 3%) 0.916
MedPhrase 421.85 (2.6%) 421.64 (2.4%) -0.1% ( -4% - 5%) 0.949
LowSpanNear 215.84 (1.5%) 215.81 (1.9%) -0.0% ( -3% - 3%) 0.975
BrowseDayOfYearTaxoFacets 11.13 (3.0%) 11.13 (3.2%) -0.0% ( -6% - 6%) 0.992
BrowseDayOfYearSSDVFacets 27.51 (8.3%) 27.52 (8.1%) 0.0% ( -15% - 17%) 0.994
HighSpanNear 16.99 (2.2%) 16.99 (2.1%) 0.0% ( -4% - 4%) 0.968
BrowseDateTaxoFacets 11.11 (3.0%) 11.11 (3.3%) 0.0% ( -6% - 6%) 0.977
Wildcard 259.96 (2.3%) 260.11 (2.7%) 0.1% ( -4% - 5%) 0.943
HighTermTitleBDVSort 216.56 (6.9%) 216.79 (7.9%) 0.1% ( -13% - 15%) 0.964
LowSloppyPhrase 36.16 (3.5%) 36.20 (3.8%) 0.1% ( -6% - 7%) 0.922
LowTerm 1653.62 (6.1%) 1656.23 (4.8%) 0.2% ( -10% - 11%) 0.928
TermDTSort 236.21 (14.9%) 236.69 (14.7%) 0.2% ( -25% - 34%) 0.965
OrNotHighMed 738.85 (3.6%) 741.27 (4.7%) 0.3% ( -7% - 9%) 0.806
IntNRQ 122.68 (1.1%) 123.17 (0.8%) 0.4% ( -1% - 2%) 0.210
Respell 75.86 (2.4%) 76.22 (2.0%) 0.5% ( -3% - 5%) 0.505
HighSloppyPhrase 80.85 (3.7%) 81.25 (4.6%) 0.5% ( -7% - 9%) 0.708
MedSloppyPhrase 31.20 (3.5%) 31.39 (4.3%) 0.6% ( -6% - 8%) 0.628
HighTermMonthSort 396.29 (8.2%) 398.90 (9.3%) 0.7% ( -15% - 19%) 0.812
Prefix3 393.10 (2.7%) 396.20 (2.5%) 0.8% ( -4% - 6%) 0.339
AndHighHigh 105.61 (3.7%) 106.69 (4.0%) 1.0% ( -6% - 9%) 0.399
LowPhrase 61.52 (2.1%) 62.17 (3.2%) 1.1% ( -4% - 6%) 0.221
AndHighLow 915.63 (4.3%) 928.98 (3.1%) 1.5% ( -5% - 9%) 0.217
HighTermDayOfYearSort 216.71 (14.0%) 220.00 (15.9%) 1.5% ( -24% - 36%) 0.749
OrHighLow 535.18 (7.4%) 571.87 (5.8%) 6.9% ( -5% - 21%) 0.001
OrHighHigh 51.30 (2.8%) 56.55 (2.7%) 10.2% ( 4% - 16%) 0.000
```
So far the implementation seems to perform similarly to the baseline WANDScorer, with a surprisingly large occasional speedup for `OrHighLow`. Hopefully this is not caused by a bug :D . I think these performance characteristics make sense, as the low-frequency / high-score-contribution term would drive the iteration, and a big window size would let more docs be pruned quickly when their maxScores show they can't be competitive.
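To make the pruning argument concrete, here is a minimal sketch of the maxScore-based partitioning idea for a single window. It is not the code from this PR: the class and method names are hypothetical, and it assumes the per-term max scores for the window are already known. Terms are sorted by max score, and the longest prefix whose summed max scores stays below the minimum competitive score is treated as non-essential, since a doc matching only those terms cannot be competitive.

```java
import java.util.Arrays;

/** Hypothetical illustration of maxScore partitioning; not the PR's BMMBulkScorer. */
final class MaxScorePartitionSketch {

  /**
   * Returns how many terms are non-essential for the current window: the largest
   * number of lowest-max-score terms whose summed max scores is still below
   * minCompetitiveScore. Only the remaining (essential) terms need to drive iteration.
   */
  static int countNonEssential(float[] windowMaxScores, float minCompetitiveScore) {
    float[] sorted = windowMaxScores.clone();
    Arrays.sort(sorted); // ascending: weakest contributors first
    double sum = 0;
    int nonEssential = 0;
    for (float maxScore : sorted) {
      if (sum + maxScore >= minCompetitiveScore) {
        break; // adding this term could produce a competitive doc
      }
      sum += maxScore;
      nonEssential++;
    }
    return nonEssential;
  }
}
```

With max scores {0.5, 1.0, 3.0} and a minimum competitive score of 2.0, the first two terms are non-essential (0.5 + 1.0 < 2.0), so only the high-scoring term needs to lead the iteration in that window.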
[GitHub] [lucene] zacharymorn commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r629794667
##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+ private List<Scorer> scorers;
+ private DisiWrapper[] allScorers;
+ private Weight weight;
+ private ScoreMode scoreMode;
+ private int scalingFactor;
+ private long cost;
+ private static final int FIXED_WINDOW_SIZE = 2048;
Review comment:
Hmm, I thought we would want a window here so that we only need to update maxScore for the scorers at larger interval checkpoints (the other implementation checks and updates maxScore more frequently, as it takes the min of the block boundaries of all scorers). But anyway, by taking out the window here, I assume you would like the BMM scorer to run directly through BulkScorer? I can give that a try as well!
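As a rough illustration of the trade-off described above, the sketch below counts how often maxScore would be recomputed over a doc ID range under the two strategies. Everything here is a simplifying assumption for illustration: the names are hypothetical, and each scorer's blocks are modelled as spanning a fixed number of doc IDs, which is not how real postings blocks behave.

```java
/** Hypothetical comparison of maxScore update frequency; not code from the PR. */
final class WindowBoundarySketch {

  /** Updates needed when maxScore is refreshed once per fixed-size window. */
  static int countFixedWindowUpdates(int min, int max, int windowSize) {
    return (max - min + windowSize - 1) / windowSize; // ceiling division
  }

  /**
   * Updates needed when the next checkpoint is the minimum of every scorer's next
   * block boundary. blockSpans[i] is the assumed doc ID span of scorer i's blocks.
   */
  static int countBlockBoundaryUpdates(int min, int max, int[] blockSpans) {
    int updates = 0;
    int upTo = min;
    while (upTo < max) {
      int next = max;
      for (int span : blockSpans) {
        int boundary = (upTo / span + 1) * span; // first boundary strictly after upTo
        next = Math.min(next, boundary);
      }
      upTo = next;
      updates++;
    }
    return updates;
  }
}
```

For the range [0, 4096) with two scorers whose blocks span 512 and 768 doc IDs, the block-boundary strategy refreshes maxScore 11 times, while a fixed window of 2048 refreshes it twice. Fewer updates come at the cost of looser max scores per window.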
[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-837753531
I've also tried out smaller window sizes in the latest 2 commits (benchmark results are in the git commit messages), and it appears that a window size of 1024 might perform better than 2048 for OrMedMedMedMedMed queries.
[GitHub] [lucene] jpountz commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface
Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r629112286
##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+ private List<Scorer> scorers;
+ private DisiWrapper[] allScorers;
+ private Weight weight;
+ private ScoreMode scoreMode;
+ private int scalingFactor;
+ private long cost;
+ private static final int FIXED_WINDOW_SIZE = 2048;
Review comment:
The reason why BooleanScorer has such a window is to be able to collect hits into a bitset, which we're not doing here. Do the numbers get better if we get rid of this window?
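For context, the bitset-per-window idea mentioned above can be sketched as follows. This is a simplified illustration with hypothetical names, not BooleanScorer's actual code: each clause marks its matches in a fixed-size bitset in any order, and the window is then replayed in doc ID order by scanning the set bits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of collecting hits into a per-window bitset. */
final class WindowBitsetSketch {
  static final int SIZE = 2048; // window size in doc IDs (assumed for illustration)

  /** Returns the deduplicated, ordered doc IDs in [windowBase, windowBase + SIZE). */
  static List<Integer> collectWindow(int windowBase, int[][] clauseDocs) {
    long[] matching = new long[SIZE >>> 6]; // one bit per doc in the window
    for (int[] docs : clauseDocs) {
      for (int doc : docs) {
        if (doc >= windowBase && doc < windowBase + SIZE) {
          int i = doc - windowBase;
          matching[i >>> 6] |= 1L << (i & 63); // clauses may mark docs in any order
        }
      }
    }
    List<Integer> hits = new ArrayList<>();
    for (int word = 0; word < matching.length; word++) {
      long bits = matching[word];
      while (bits != 0L) {
        int bit = Long.numberOfTrailingZeros(bits);
        hits.add(windowBase + (word << 6) + bit);
        bits &= bits - 1; // clear the lowest set bit
      }
    }
    return hits;
  }
}
```

The window exists to bound the bitset size. A scorer that emits hits directly in doc ID order has no bitset to bound, which is the point of the question above.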