
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/04/29 03:32:35 UTC

[GitHub] [lucene] zacharymorn opened a new pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

zacharymorn opened a new pull request #113:
URL: https://github.com/apache/lucene/pull/113


   Implement the BMM algorithm from "Optimizing Top-k Document Retrieval Strategies for Block-Max Indexes" by Dimopoulos, Nepomnyachiy and Suel, using the BulkScorer interface.
   
   This BMM implementation passes all existing tests run by `./gradlew check` as well as the luceneutil benchmark.
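
For readers unfamiliar with the paper, the following is a rough sketch of the term-partitioning step that MaxScore-style algorithms such as BMM (Block-Max MaxScore) build on: terms whose score upper bounds sum to less than the current threshold cannot produce a competitive hit on their own and are treated as non-essential. This is a simplified illustration and not the code in this PR; `MaxScorePartitionSketch`, `countNonEssential` and the sample values are hypothetical, and the strict `<` comparison is a conservative assumption.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not taken from BMMBulkScorer. */
final class MaxScorePartitionSketch {

  /**
   * Returns how many of the lowest-scoring terms are non-essential: the longest prefix of the
   * ascending-sorted upper bounds whose sum stays strictly below the threshold.
   */
  static int countNonEssential(double[] maxScores, double minCompetitiveScore) {
    double[] sorted = maxScores.clone();
    Arrays.sort(sorted);
    double prefixSum = 0;
    int count = 0;
    for (double upperBound : sorted) {
      // Stop once documents matching only these terms could reach the threshold.
      if (prefixSum + upperBound >= minCompetitiveScore) {
        break;
      }
      prefixSum += upperBound;
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    // Made-up upper bounds for four terms and a made-up threshold.
    double[] maxScores = {3.0, 0.5, 1.2, 2.0};
    System.out.println(countNonEssential(maxScores, 2.0));
  }
}
```

Documents then only need to be driven by the remaining (essential) terms; in a block-max variant the upper bounds would come from the current block or window rather than from the whole posting list.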
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

Posted by GitBox <gi...@apache.org>.

zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-840293637


   I ran wikibigall for the BMM BulkScorer implementation with a window, following the suggestions from https://github.com/apache/lucene/pull/101#issuecomment-837909869, and got the following results:
   
   wikibigall run 1
   ```
                       Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy2       45.83      (9.9%)       41.93     (19.0%)   -8.5% ( -34% -   22%) 0.076
                     Fuzzy1       56.71      (8.2%)       51.99     (14.4%)   -8.3% ( -28% -   15%) 0.025
            AndHighOrMedMed       43.61      (3.2%)       43.13      (2.7%)   -1.1% (  -6% -    4%) 0.246
      BrowseMonthTaxoFacets        8.47      (6.8%)        8.43      (6.8%)   -0.4% ( -13% -   14%) 0.841
                   Wildcard       59.90      (2.7%)       59.66      (3.0%)   -0.4% (  -5% -    5%) 0.658
                    Respell       49.17      (2.6%)       49.05      (3.5%)   -0.3% (  -6% -    5%) 0.789
                   PKLookup      205.25      (5.3%)      204.79      (4.4%)   -0.2% (  -9% -   10%) 0.883
   BrowseDayOfYearSSDVFacets       17.57      (2.2%)       17.55      (2.3%)   -0.2% (  -4% -    4%) 0.816
           IntervalsOrdered        1.94      (3.8%)        1.94      (3.9%)   -0.1% (  -7% -    7%) 0.913
      BrowseMonthSSDVFacets       19.16      (2.2%)       19.14      (2.4%)   -0.1% (  -4% -    4%) 0.866
   BrowseDayOfYearTaxoFacets        7.32      (6.7%)        7.31      (6.8%)   -0.1% ( -12% -   14%) 0.963
             TermDateFacets        8.56      (6.5%)        8.56      (6.7%)   -0.1% ( -12% -   14%) 0.974
                     Phrase       24.65      (3.1%)       24.64      (3.3%)   -0.1% (  -6% -    6%) 0.948
       BrowseDateTaxoFacets        7.64      (6.9%)        7.63      (7.0%)   -0.0% ( -13% -   14%) 0.985
                 AndHighMed       44.58      (4.1%)       44.59      (3.8%)    0.0% (  -7% -    8%) 0.984
                AndHighHigh       22.83      (4.3%)       22.84      (3.7%)    0.0% (  -7% -    8%) 0.980
           AndMedOrHighHigh       29.32      (3.5%)       29.33      (4.0%)    0.0% (  -7% -    7%) 0.976
                     IntNRQ      122.51     (17.1%)      122.58     (17.2%)    0.1% ( -29% -   41%) 0.992
                   SpanNear        4.82      (3.0%)        4.83      (2.9%)    0.2% (  -5% -    6%) 0.818
                    Prefix3      114.88      (6.0%)      115.25      (6.7%)    0.3% ( -11% -   13%) 0.873
               SloppyPhrase        1.19      (9.8%)        1.20      (9.9%)    0.3% ( -17% -   22%) 0.913
             TermBGroup1M1P       24.69      (4.2%)       24.80      (4.1%)    0.4% (  -7% -    9%) 0.745
                TermGroup1M       15.52      (2.8%)       15.60      (3.3%)    0.5% (  -5% -    6%) 0.587
                       Term     1018.96      (6.7%)     1026.53      (7.1%)    0.7% ( -12% -   15%) 0.734
               TermBGroup1M       25.25      (2.9%)       25.44      (3.5%)    0.8% (  -5% -    7%) 0.454
               VectorSearch     1014.64      (6.7%)     1023.16      (6.8%)    0.8% ( -11% -   15%) 0.695
               TermGroup100       11.41      (4.0%)       11.51      (4.0%)    0.8% (  -6% -    9%) 0.506
              TermMonthSort       63.33     (14.0%)       63.90     (14.0%)    0.9% ( -23% -   33%) 0.838
               TermGroup10K       18.49      (3.0%)       18.74      (3.7%)    1.4% (  -5% -    8%) 0.196
              TermTitleSort      179.96     (13.4%)      182.73     (14.1%)    1.5% ( -22% -   33%) 0.723
                 TermDTSort       57.89     (11.1%)       59.22     (14.1%)    2.3% ( -20% -   30%) 0.569
          TermDayOfYearSort       55.40      (8.0%)       57.06     (11.4%)    3.0% ( -15% -   24%) 0.335
                 OrHighHigh       74.09      (3.6%)       79.35      (4.4%)    7.1% (   0% -   15%) 0.000
                  OrHighMed       45.50      (3.8%)       56.92      (4.1%)   25.1% (  16% -   34%) 0.000
   ```
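
As a reading aid for the table above: the `Pct diff` column appears to be the relative change in mean QPS from the baseline to the modified version. The sketch below recomputes it for two rows of run 1. The class and method names are made up for illustration and are not part of luceneutil.

```java
import java.util.Locale;

/** Hypothetical helper, not part of luceneutil: recomputes the "Pct diff" column. */
final class PctDiffExample {

  /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the OrHighMed and Fuzzy2 rows of run 1.
    System.out.println(String.format(Locale.ROOT, "OrHighMed: %.1f%%", pctDiff(45.50, 56.92)));
    System.out.println(String.format(Locale.ROOT, "Fuzzy2: %.1f%%", pctDiff(45.83, 41.93)));
  }
}
```

The two results round to the 25.1% and -8.5% reported in those rows.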
   
   wikibigall run 2
   ```
                       Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy2       52.37      (9.0%)       47.63     (11.0%)   -9.0% ( -26% -   12%) 0.004
                     Fuzzy1       60.48      (8.9%)       55.26     (10.1%)   -8.6% ( -25% -   11%) 0.004
            AndHighOrMedMed       24.96      (2.1%)       24.60      (3.6%)   -1.4% (  -7% -    4%) 0.121
               VectorSearch      819.06      (4.6%)      812.87      (4.9%)   -0.8% (  -9% -    9%) 0.617
                    Respell       42.97      (3.9%)       42.67      (4.4%)   -0.7% (  -8% -    7%) 0.599
      BrowseMonthSSDVFacets       19.34      (1.3%)       19.22      (2.3%)   -0.6% (  -4% -    2%) 0.290
           AndMedOrHighHigh       25.32      (3.5%)       25.16      (3.8%)   -0.6% (  -7% -    6%) 0.595
                     Phrase       21.60      (3.6%)       21.47      (3.8%)   -0.6% (  -7% -    7%) 0.603
           IntervalsOrdered        0.83      (5.4%)        0.82      (5.0%)   -0.5% ( -10% -   10%) 0.760
               TermGroup100       23.86      (4.3%)       23.75      (4.4%)   -0.5% (  -8% -    8%) 0.724
             TermDateFacets       10.44     (10.0%)       10.41      (9.4%)   -0.4% ( -17% -   21%) 0.903
                   PKLookup      210.67      (5.9%)      209.94      (4.9%)   -0.3% ( -10% -   11%) 0.841
                TermGroup1M       10.60      (4.4%)       10.56      (4.7%)   -0.3% (  -9% -    9%) 0.811
          TermDayOfYearSort       47.25     (10.1%)       47.16     (10.3%)   -0.2% ( -18% -   22%) 0.952
             TermBGroup1M1P       15.66      (5.3%)       15.63      (5.1%)   -0.2% ( -10% -   10%) 0.920
      BrowseMonthTaxoFacets        8.22      (7.9%)        8.20      (7.7%)   -0.2% ( -14% -   16%) 0.947
                     IntNRQ      271.37      (2.8%)      271.06      (3.1%)   -0.1% (  -5% -    5%) 0.904
                   SpanNear       10.34      (2.1%)       10.33      (2.1%)   -0.1% (  -4% -    4%) 0.895
               TermGroup10K       10.47      (4.3%)       10.47      (4.2%)   -0.0% (  -8% -    8%) 0.975
   BrowseDayOfYearTaxoFacets        7.10      (7.7%)        7.09      (7.4%)   -0.0% ( -14% -   16%) 0.986
       BrowseDateTaxoFacets        7.41      (8.1%)        7.40      (7.7%)   -0.0% ( -14% -   17%) 0.989
               SloppyPhrase        5.58      (8.3%)        5.58      (7.7%)   -0.0% ( -14% -   17%) 0.991
               TermBGroup1M       25.52      (4.3%)       25.56      (4.5%)    0.1% (  -8% -    9%) 0.921
                 AndHighMed       44.59      (2.3%)       44.68      (3.2%)    0.2% (  -5% -    5%) 0.812
                   Wildcard       88.60      (3.9%)       88.83      (3.4%)    0.3% (  -6% -    7%) 0.818
                    Prefix3       48.71      (5.2%)       48.91      (4.8%)    0.4% (  -9% -   10%) 0.792
   BrowseDayOfYearSSDVFacets       17.53      (2.0%)       17.63      (3.4%)    0.6% (  -4% -    6%) 0.490
                AndHighHigh       31.50      (3.5%)       31.80      (4.4%)    1.0% (  -6% -    9%) 0.450
                 TermDTSort      163.80     (14.4%)      165.54     (14.3%)    1.1% ( -24% -   34%) 0.814
              TermTitleSort      113.02      (9.9%)      114.44     (13.0%)    1.3% ( -19% -   26%) 0.730
              TermMonthSort       63.15     (10.4%)       64.13     (13.3%)    1.5% ( -20% -   28%) 0.683
                 OrHighHigh       17.12      (2.5%)       17.65      (3.1%)    3.1% (  -2% -    8%) 0.000
                       Term     1038.79      (8.0%)     1077.30      (7.9%)    3.7% ( -11% -   21%) 0.142
                  OrHighMed       45.90      (2.4%)       56.69      (3.7%)   23.5% (  16% -   30%) 0.000
   ```
   wikibigall run 3
   ```
                       Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy1       59.46      (9.9%)       53.54     (12.3%)  -10.0% ( -29% -   13%) 0.005
                     Fuzzy2       61.37     (11.5%)       60.03     (13.4%)   -2.2% ( -24% -   25%) 0.580
            AndHighOrMedMed       33.91      (3.7%)       33.59      (3.6%)   -1.0% (  -7% -    6%) 0.405
                    Prefix3       29.87     (12.4%)       29.76     (12.4%)   -0.4% ( -22% -   27%) 0.924
           AndMedOrHighHigh       28.14      (3.7%)       28.05      (4.2%)   -0.3% (  -7% -    7%) 0.817
                     IntNRQ      272.56      (3.9%)      271.78      (3.1%)   -0.3% (  -7% -    7%) 0.800
                   SpanNear        1.21     (16.7%)        1.21     (16.6%)   -0.2% ( -28% -   39%) 0.966
           IntervalsOrdered        2.03      (4.5%)        2.03      (4.9%)   -0.2% (  -9% -    9%) 0.895
                     Phrase       55.81      (5.4%)       55.72      (4.9%)   -0.2% (  -9% -   10%) 0.920
   BrowseDayOfYearSSDVFacets       17.45      (1.8%)       17.43      (1.7%)   -0.1% (  -3% -    3%) 0.821
                   Wildcard       37.74      (9.2%)       37.71      (9.6%)   -0.1% ( -17% -   20%) 0.983
      BrowseMonthSSDVFacets       19.28      (1.3%)       19.28      (1.3%)    0.0% (  -2% -    2%) 0.997
                    Respell       40.21      (3.9%)       40.25      (5.0%)    0.1% (  -8% -    9%) 0.940
               TermBGroup1M       25.59      (4.1%)       25.65      (3.9%)    0.3% (  -7% -    8%) 0.837
   BrowseDayOfYearTaxoFacets        7.06      (6.3%)        7.09      (6.2%)    0.5% ( -11% -   13%) 0.780
      BrowseMonthTaxoFacets        8.14      (6.9%)        8.19      (6.7%)    0.6% ( -12% -   15%) 0.796
       BrowseDateTaxoFacets        7.36      (6.5%)        7.40      (6.5%)    0.6% ( -11% -   14%) 0.785
               SloppyPhrase        5.39      (4.5%)        5.42      (4.7%)    0.6% (  -8% -   10%) 0.693
                 TermDTSort       80.69      (8.2%)       81.24      (8.2%)    0.7% ( -14% -   18%) 0.794
                AndHighHigh       18.01      (4.4%)       18.15      (4.9%)    0.8% (  -8% -   10%) 0.601
                 AndHighMed       56.72      (4.7%)       57.19      (5.2%)    0.8% (  -8% -   11%) 0.598
             TermDateFacets        8.23      (8.0%)        8.30      (8.2%)    0.8% ( -14% -   18%) 0.742
              TermMonthSort       94.14     (13.2%)       94.94     (15.8%)    0.9% ( -24% -   34%) 0.852
                TermGroup1M       22.76      (3.7%)       22.96      (3.8%)    0.9% (  -6% -    8%) 0.460
               TermGroup10K       10.46      (4.9%)       10.60      (5.3%)    1.3% (  -8% -   12%) 0.422
              TermTitleSort      117.75     (13.5%)      119.36     (16.3%)    1.4% ( -25% -   35%) 0.772
                       Term     1079.43      (8.3%)     1094.78      (7.3%)    1.4% ( -13% -   18%) 0.565
          TermDayOfYearSort       45.89      (8.1%)       46.58      (8.7%)    1.5% ( -14% -   19%) 0.572
                   PKLookup      207.04      (5.6%)      210.17      (5.3%)    1.5% (  -8% -   13%) 0.381
               TermGroup100       13.92      (5.2%)       14.13      (5.6%)    1.5% (  -8% -   13%) 0.372
               VectorSearch     1046.61      (6.7%)     1067.51      (6.9%)    2.0% ( -10% -   16%) 0.351
             TermBGroup1M1P       18.24      (6.3%)       18.61      (6.5%)    2.0% ( -10% -   15%) 0.319
                 OrHighHigh       74.99      (5.3%)       81.16      (5.5%)    8.2% (  -2% -   20%) 0.000
                  OrHighMed       45.89      (3.2%)       57.40      (5.6%)   25.1% (  15% -   34%) 0.000
   ```
   
   wikibigall run 4
   ```
                       Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy1       57.19      (8.5%)       52.98      (6.5%)   -7.4% ( -20% -    8%) 0.002
                     Fuzzy2       45.44     (11.4%)       42.42     (11.4%)   -6.6% ( -26% -   18%) 0.066
            AndHighOrMedMed       31.75      (3.8%)       31.40      (3.7%)   -1.1% (  -8% -    6%) 0.365
           AndMedOrHighHigh       25.00      (3.7%)       24.77      (3.5%)   -0.9% (  -7% -    6%) 0.413
               SloppyPhrase        5.40      (5.3%)        5.36      (5.4%)   -0.7% ( -10% -   10%) 0.674
               VectorSearch      807.06      (5.0%)      802.68      (5.6%)   -0.5% ( -10% -   10%) 0.747
                    Prefix3      184.66      (4.6%)      183.98      (4.9%)   -0.4% (  -9% -    9%) 0.807
                AndHighHigh       65.28      (3.3%)       65.10      (4.0%)   -0.3% (  -7% -    7%) 0.815
               TermGroup10K       15.61      (3.7%)       15.58      (4.0%)   -0.2% (  -7% -    7%) 0.874
                TermGroup1M       10.44      (3.9%)       10.42      (4.8%)   -0.2% (  -8% -    8%) 0.898
               TermBGroup1M       21.55      (4.2%)       21.52      (5.1%)   -0.1% (  -9% -    9%) 0.919
             TermDateFacets       10.89      (9.2%)       10.87      (9.7%)   -0.1% ( -17% -   20%) 0.972
           IntervalsOrdered        0.81      (4.6%)        0.81      (4.5%)    0.0% (  -8% -    9%) 0.994
      BrowseMonthSSDVFacets       18.99      (1.5%)       18.99      (1.2%)    0.0% (  -2% -    2%) 0.973
       BrowseDateTaxoFacets        7.18      (7.2%)        7.18      (7.2%)    0.0% ( -13% -   15%) 0.985
                   SpanNear        2.08      (3.9%)        2.08      (3.7%)    0.1% (  -7% -    8%) 0.962
               TermGroup100       13.72      (4.2%)       13.73      (5.0%)    0.1% (  -8% -    9%) 0.964
      BrowseMonthTaxoFacets        8.00      (7.5%)        8.01      (7.7%)    0.1% ( -14% -   16%) 0.971
   BrowseDayOfYearSSDVFacets       17.14      (2.5%)       17.16      (2.7%)    0.1% (  -4% -    5%) 0.910
                 TermDTSort       45.05      (8.1%)       45.10      (7.5%)    0.1% ( -14% -   17%) 0.964
              TermTitleSort       88.10     (12.0%)       88.20     (11.2%)    0.1% ( -20% -   26%) 0.973
   BrowseDayOfYearTaxoFacets        6.87      (6.7%)        6.88      (6.9%)    0.1% ( -12% -   14%) 0.949
                 AndHighMed       55.06      (3.4%)       55.15      (3.1%)    0.2% (  -6% -    6%) 0.883
              TermMonthSort       61.57     (12.2%)       61.72     (11.3%)    0.3% ( -20% -   26%) 0.946
                     Phrase       53.70      (4.9%)       53.84      (4.5%)    0.3% (  -8% -   10%) 0.863
                     IntNRQ      263.07      (5.8%)      263.89      (4.7%)    0.3% (  -9% -   11%) 0.851
             TermBGroup1M1P       30.16      (4.3%)       30.34      (4.8%)    0.6% (  -8% -   10%) 0.691
                    Respell       46.63      (5.3%)       46.91      (5.0%)    0.6% (  -9% -   11%) 0.712
                   Wildcard       70.44      (4.3%)       70.89      (3.9%)    0.6% (  -7% -    9%) 0.627
                       Term      950.32      (7.6%)      957.36      (7.9%)    0.7% ( -13% -   17%) 0.763
          TermDayOfYearSort       46.36     (14.2%)       46.92     (15.2%)    1.2% ( -24% -   35%) 0.797
                   PKLookup      200.63      (7.4%)      203.46      (6.6%)    1.4% ( -11% -   16%) 0.524
                 OrHighHigh       10.27      (2.7%)       10.74      (3.0%)    4.5% (  -1% -   10%) 0.000
                  OrHighMed       43.96      (3.1%)       54.71      (4.3%)   24.4% (  16% -   32%) 0.000
   ```
   
   



[GitHub] [lucene] zacharymorn commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

Posted by GitBox <gi...@apache.org>.

zacharymorn commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r631568912



##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+  private List<Scorer> scorers;
+  private DisiWrapper[] allScorers;
+  private Weight weight;
+  private ScoreMode scoreMode;
+  private int scalingFactor;
+  private long cost;
+  private static final int FIXED_WINDOW_SIZE = 2048;

Review comment:
       I also ran wikibigall for the above changes, following the suggestions from https://github.com/apache/lucene/pull/101#issuecomment-837909869, and got the following results:
   
   wikibigall run 1
   ```
                       Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy1       51.08      (9.7%)       44.15     (11.5%)  -13.6% ( -31% -    8%) 0.000
                     Fuzzy2       51.90     (12.0%)       48.39     (10.3%)   -6.8% ( -25% -   17%) 0.056
          TermDayOfYearSort      160.58     (12.1%)      156.99     (11.2%)   -2.2% ( -22% -   23%) 0.542
              TermMonthSort       58.61      (8.4%)       57.57      (8.0%)   -1.8% ( -16% -   15%) 0.494
                 TermDTSort      104.08     (11.4%)      102.37      (9.4%)   -1.6% ( -20% -   21%) 0.619
              TermTitleSort      104.99      (8.3%)      103.29      (7.6%)   -1.6% ( -16% -   15%) 0.519
            AndHighOrMedMed       33.52      (3.1%)       33.02      (2.8%)   -1.5% (  -7% -    4%) 0.114
                AndHighHigh       18.08      (4.6%)       17.87      (4.1%)   -1.1% (  -9% -    7%) 0.406
               TermBGroup1M       14.14      (4.1%)       14.03      (3.5%)   -0.8% (  -8% -    7%) 0.491
             TermDateFacets        7.58      (5.5%)        7.53      (6.1%)   -0.7% ( -11% -   11%) 0.714
                     Phrase       10.38      (1.8%)       10.32      (2.2%)   -0.6% (  -4% -    3%) 0.359
                 AndHighMed       82.33      (4.0%)       81.87      (3.9%)   -0.6% (  -8% -    7%) 0.655
               SloppyPhrase        2.32      (8.3%)        2.31      (9.9%)   -0.5% ( -17% -   19%) 0.855
               TermGroup100       34.52      (3.7%)       34.36      (3.0%)   -0.5% (  -6% -    6%) 0.650
             TermBGroup1M1P       43.50      (3.8%)       43.30      (4.0%)   -0.5% (  -7% -    7%) 0.700
           AndMedOrHighHigh       25.62      (3.4%)       25.51      (3.1%)   -0.4% (  -6% -    6%) 0.666
                TermGroup1M       15.43      (3.0%)       15.37      (2.7%)   -0.4% (  -5% -    5%) 0.668
               VectorSearch      823.98      (1.9%)      820.96      (2.6%)   -0.4% (  -4% -    4%) 0.616
                   PKLookup      210.69      (2.6%)      210.23      (2.5%)   -0.2% (  -5% -    4%) 0.782
      BrowseMonthSSDVFacets       18.90      (0.8%)       18.87      (0.9%)   -0.2% (  -1% -    1%) 0.574
   BrowseDayOfYearTaxoFacets        7.14      (5.3%)        7.14      (5.8%)   -0.1% ( -10% -   11%) 0.943
                   Wildcard       38.83      (2.4%)       38.79      (2.5%)   -0.1% (  -4% -    4%) 0.881
               TermGroup10K       18.47      (2.9%)       18.45      (2.5%)   -0.1% (  -5% -    5%) 0.910
                   SpanNear        4.76      (1.4%)        4.76      (1.3%)   -0.1% (  -2% -    2%) 0.885
                    Prefix3      173.23      (6.8%)      173.13      (6.8%)   -0.1% ( -12% -   14%) 0.978
       BrowseDateTaxoFacets        7.46      (5.5%)        7.46      (6.1%)   -0.1% ( -10% -   12%) 0.976
      BrowseMonthTaxoFacets        8.27      (5.7%)        8.27      (6.4%)   -0.0% ( -11% -   12%) 0.986
                    Respell       41.11      (2.8%)       41.12      (2.6%)    0.0% (  -5% -    5%) 0.972
   BrowseDayOfYearSSDVFacets       17.13      (1.8%)       17.14      (1.7%)    0.1% (  -3% -    3%) 0.887
                     IntNRQ      267.98      (2.2%)      268.69      (2.5%)    0.3% (  -4% -    5%) 0.721
           IntervalsOrdered        3.79      (2.1%)        3.81      (2.4%)    0.5% (  -3% -    5%) 0.448
                       Term     1046.89      (7.5%)     1067.03      (7.0%)    1.9% ( -11% -   17%) 0.401
                  OrHighMed       34.43      (3.2%)       37.66      (5.5%)    9.4% (   0% -   18%) 0.000
                 OrHighHigh       16.93      (3.7%)       25.19      (4.6%)   48.8% (  39% -   59%) 0.000
   ```
   
   wikibigall run 2
   ```
                       Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy1       50.87      (9.8%)       47.08     (13.8%)   -7.5% ( -28% -   17%) 0.049
                     Fuzzy2       30.31      (5.6%)       28.38      (9.6%)   -6.4% ( -20% -    9%) 0.011
              TermMonthSort       59.51     (12.2%)       58.67      (9.4%)   -1.4% ( -20% -   23%) 0.683
                 TermDTSort      172.78     (11.5%)      170.44     (10.1%)   -1.4% ( -20% -   22%) 0.692
           AndMedOrHighHigh        9.65      (3.1%)        9.55      (2.8%)   -1.1% (  -6% -    4%) 0.233
              TermTitleSort       59.25     (12.2%)       58.60      (9.8%)   -1.1% ( -20% -   23%) 0.754
             TermDateFacets        8.18      (7.5%)        8.13      (7.7%)   -0.6% ( -14% -   15%) 0.789
                    Respell       46.60      (3.8%)       46.33      (3.6%)   -0.6% (  -7% -    7%) 0.628
           IntervalsOrdered        3.81      (2.7%)        3.80      (2.8%)   -0.4% (  -5% -    5%) 0.674
            AndHighOrMedMed       24.00      (3.0%)       23.94      (3.4%)   -0.3% (  -6% -    6%) 0.792
                 AndHighMed       59.47      (3.1%)       59.34      (3.6%)   -0.2% (  -6% -    6%) 0.837
                TermGroup1M       22.27      (3.4%)       22.23      (3.5%)   -0.1% (  -6% -    7%) 0.895
       BrowseDateTaxoFacets        7.28      (7.7%)        7.27      (7.8%)   -0.1% ( -14% -   16%) 0.956
   BrowseDayOfYearTaxoFacets        6.97      (7.5%)        6.96      (7.5%)   -0.1% ( -14% -   16%) 0.958
      BrowseMonthTaxoFacets        8.08      (7.7%)        8.07      (7.9%)   -0.1% ( -14% -   16%) 0.962
                AndHighHigh       64.73      (2.8%)       64.67      (3.7%)   -0.1% (  -6% -    6%) 0.921
                   Wildcard       70.06      (3.1%)       70.00      (3.2%)   -0.1% (  -6% -    6%) 0.924
      BrowseMonthSSDVFacets       18.76      (0.9%)       18.77      (0.9%)    0.0% (  -1% -    1%) 0.919
                     Phrase       20.88      (3.8%)       20.90      (3.2%)    0.1% (  -6% -    7%) 0.936
               TermGroup10K       12.15      (3.7%)       12.16      (4.0%)    0.1% (  -7% -    8%) 0.931
             TermBGroup1M1P       15.29      (5.1%)       15.31      (4.6%)    0.1% (  -9% -   10%) 0.936
                    Prefix3       32.94      (2.9%)       32.99      (2.9%)    0.1% (  -5% -    6%) 0.872
   BrowseDayOfYearSSDVFacets       17.10      (1.7%)       17.13      (1.7%)    0.2% (  -3% -    3%) 0.768
               TermGroup100       34.25      (3.8%)       34.34      (3.9%)    0.3% (  -7% -    8%) 0.829
               SloppyPhrase        2.82      (7.5%)        2.83      (7.4%)    0.3% ( -13% -   16%) 0.900
          TermDayOfYearSort       45.78     (11.8%)       45.93     (10.6%)    0.3% ( -19% -   25%) 0.926
                   SpanNear       10.00      (1.2%)       10.05      (1.2%)    0.4% (  -1% -    2%) 0.253
                     IntNRQ      108.69     (24.1%)      109.25     (23.7%)    0.5% ( -38% -   63%) 0.945
               TermBGroup1M       11.95      (4.5%)       12.03      (5.2%)    0.7% (  -8% -   10%) 0.661
                   PKLookup      201.05      (6.0%)      203.48      (4.0%)    1.2% (  -8% -   11%) 0.451
                       Term      667.45      (5.8%)      683.87      (7.3%)    2.5% ( -10% -   16%) 0.240
               VectorSearch      989.57      (5.4%)     1021.23      (5.0%)    3.2% (  -6% -   14%) 0.051
                  OrHighMed       58.35      (3.9%)       69.23      (5.8%)   18.6% (   8% -   29%) 0.000
                 OrHighHigh       11.04      (3.4%)       16.84      (6.2%)   52.5% (  41% -   64%) 0.000
   ```
   wikibigall run 3
   ```
                       Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy1       56.20     (11.1%)       49.60     (12.0%)  -11.7% ( -31% -   12%) 0.001
              TermMonthSort       61.43     (11.3%)       57.85     (14.1%)   -5.8% ( -28% -   21%) 0.148
              TermTitleSort      109.97     (11.2%)      103.85     (14.1%)   -5.6% ( -27% -   22%) 0.167
                 TermDTSort      160.77     (10.8%)      151.92     (13.5%)   -5.5% ( -26% -   21%) 0.156
          TermDayOfYearSort       55.50      (7.1%)       52.92     (15.5%)   -4.6% ( -25% -   19%) 0.222
               TermGroup10K       10.30      (4.6%)       10.02      (7.4%)   -2.7% ( -14% -    9%) 0.160
                       Term     1037.48      (5.2%)     1010.63      (7.6%)   -2.6% ( -14% -   10%) 0.210
               TermBGroup1M       21.54      (5.0%)       21.00      (7.4%)   -2.5% ( -14% -   10%) 0.212
               TermGroup100       18.89      (4.4%)       18.46      (7.8%)   -2.3% ( -13% -   10%) 0.255
             TermDateFacets       10.29      (9.2%)       10.11      (9.5%)   -1.8% ( -18% -   18%) 0.536
             TermBGroup1M1P       43.52      (4.9%)       42.88      (5.6%)   -1.5% ( -11% -    9%) 0.373
                     Fuzzy2       56.25     (13.4%)       55.53     (12.5%)   -1.3% ( -24% -   28%) 0.754
                TermGroup1M       22.31      (3.8%)       22.04      (5.2%)   -1.2% (  -9% -    8%) 0.389
           AndMedOrHighHigh       28.60      (2.5%)       28.31      (2.7%)   -1.0% (  -6% -    4%) 0.222
                     Phrase       59.81      (2.9%)       59.43      (3.1%)   -0.6% (  -6% -    5%) 0.498
                   PKLookup      205.40      (3.8%)      204.10      (4.9%)   -0.6% (  -8% -    8%) 0.648
               VectorSearch     1033.68      (4.0%)     1027.88      (4.3%)   -0.6% (  -8% -    8%) 0.670
       BrowseDateTaxoFacets        7.27      (6.9%)        7.24      (7.0%)   -0.4% ( -13% -   14%) 0.859
   BrowseDayOfYearTaxoFacets        6.97      (6.6%)        6.94      (6.8%)   -0.4% ( -12% -   13%) 0.854
               SloppyPhrase       18.29      (2.0%)       18.22      (2.8%)   -0.4% (  -5% -    4%) 0.612
      BrowseMonthTaxoFacets        8.05      (6.9%)        8.02      (7.0%)   -0.3% ( -13% -   14%) 0.891
            AndHighOrMedMed       23.88      (2.7%)       23.83      (2.3%)   -0.2% (  -5% -    4%) 0.774
           IntervalsOrdered        3.83      (2.5%)        3.83      (2.6%)   -0.1% (  -5% -    5%) 0.862
                     IntNRQ      123.08     (14.8%)      122.93     (15.0%)   -0.1% ( -26% -   34%) 0.979
                   Wildcard       58.03      (2.7%)       57.97      (3.1%)   -0.1% (  -5% -    5%) 0.901
   BrowseDayOfYearSSDVFacets       16.93      (1.7%)       16.91      (1.5%)   -0.1% (  -3% -    3%) 0.851
                    Prefix3      165.67     (10.5%)      165.54      (9.6%)   -0.1% ( -18% -   22%) 0.980
                   SpanNear        4.76      (1.3%)        4.77      (1.0%)    0.0% (  -2% -    2%) 0.915
      BrowseMonthSSDVFacets       18.78      (1.4%)       18.80      (1.3%)    0.1% (  -2% -    2%) 0.815
                    Respell       47.08      (4.1%)       47.19      (4.1%)    0.2% (  -7% -    8%) 0.851
                AndHighHigh       17.36      (3.4%)       17.50      (3.1%)    0.8% (  -5% -    7%) 0.435
                 AndHighMed       32.21      (3.6%)       32.50      (3.2%)    0.9% (  -5% -    7%) 0.406
                  OrHighMed       33.59      (3.2%)       37.09      (3.8%)   10.4% (   3% -   18%) 0.000
                 OrHighHigh       10.82      (3.7%)       17.08      (4.1%)   57.8% (  48% -   68%) 0.000
   ```





[GitHub] [lucene] zacharymorn commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

Posted by GitBox <gi...@apache.org>.

zacharymorn commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r629825600



##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+  private List<Scorer> scorers;
+  private DisiWrapper[] allScorers;
+  private Weight weight;
+  private ScoreMode scoreMode;
+  private int scalingFactor;
+  private long cost;
+  private static final int FIXED_WINDOW_SIZE = 2048;

Review comment:
       I've pushed this change in https://github.com/zacharymorn/lucene/commit/3bcdbb31a7d55b00cb53e4be40a4adc93b9f30db; the corresponding benchmark results are in the git commit message.





[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

Posted by GitBox <gi...@apache.org>.

zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-836122884


   Hi @jpountz, I've ported your changes to this BulkScorer implementation as well, and ran both the five-clause OrMed task (`OrMedMedMedMedMed`) and the full wikimedium5m benchmark:
   
   ```
   OrMedMedMedMedMed run 1
                       Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
          OrMedMedMedMedMed       40.90      (8.5%)       39.37      (6.8%)   -3.7% ( -17% -   12%) 0.126
                   PKLookup      228.21      (1.9%)      223.87      (2.2%)   -1.9% (  -5% -    2%) 0.004
   ```
   ```
   OrMedMedMedMedMed run 2
                       Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
          OrMedMedMedMedMed       39.72      (5.0%)       38.01      (7.4%)   -4.3% ( -15% -    8%) 0.030
                   PKLookup      226.45      (2.1%)      223.28      (2.3%)   -1.4% (  -5% -    3%) 0.048
   ```
   ```
   OrMedMedMedMedMed run 3
                       Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                   PKLookup      226.41      (3.3%)      222.43      (2.3%)   -1.8% (  -7% -    3%) 0.052
          OrMedMedMedMedMed       38.83      (6.7%)       39.27      (7.1%)    1.1% ( -11% -   15%) 0.600
   ```
   ```
   full wikimedium5m run 1
                       Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                   Wildcard      376.63      (5.8%)      360.47      (6.2%)   -4.3% ( -15% -    8%) 0.024
              OrNotHighHigh      745.74      (4.5%)      730.51      (5.7%)   -2.0% ( -11% -    8%) 0.208
                     Fuzzy2       40.89      (6.0%)       40.20      (8.5%)   -1.7% ( -15% -   13%) 0.465
      HighTermDayOfYearSort      354.09     (16.6%)      348.53     (13.2%)   -1.6% ( -26% -   33%) 0.740
      BrowseMonthSSDVFacets       31.93      (3.0%)       31.50      (6.5%)   -1.3% ( -10% -    8%) 0.402
                    LowTerm     1978.09      (5.1%)     1956.82      (5.3%)   -1.1% ( -10% -    9%) 0.514
                     IntNRQ      194.54      (3.6%)      193.05      (4.2%)   -0.8% (  -8% -    7%) 0.537
          HighTermMonthSort      330.71     (10.6%)      328.18      (9.7%)   -0.8% ( -19% -   21%) 0.812
               OrHighNotLow      806.97      (6.4%)      801.14      (5.6%)   -0.7% ( -11% -   11%) 0.702
   BrowseDayOfYearSSDVFacets       28.57      (1.7%)       28.39      (2.0%)   -0.6% (  -4% -    3%) 0.294
                AndHighHigh       70.54      (3.8%)       70.12      (4.6%)   -0.6% (  -8% -    8%) 0.657
                    Respell       78.30      (2.0%)       77.93      (2.1%)   -0.5% (  -4% -    3%) 0.463
              OrHighNotHigh      772.33      (5.0%)      768.86      (5.8%)   -0.4% ( -10% -   10%) 0.795
                    Prefix3      133.26      (7.3%)      132.68      (8.8%)   -0.4% ( -15% -   16%) 0.865
       HighTermTitleBDVSort      189.02     (17.9%)      188.23     (12.7%)   -0.4% ( -26% -   36%) 0.932
                MedSpanNear      129.28      (2.6%)      129.09      (3.1%)   -0.1% (  -5% -    5%) 0.871
               OrNotHighLow      900.87      (3.4%)      900.01      (3.7%)   -0.1% (  -6% -    7%) 0.932
                  LowPhrase       61.05      (2.7%)       61.00      (3.1%)   -0.1% (  -5% -    5%) 0.918
               HighSpanNear       96.65      (3.2%)       96.63      (3.3%)   -0.0% (  -6% -    6%) 0.990
                     Fuzzy1       67.13      (6.9%)       67.15      (6.6%)    0.0% ( -12% -   14%) 0.988
               OrHighNotMed      811.67      (4.9%)      812.18      (5.6%)    0.1% (  -9% -   11%) 0.969
      BrowseMonthTaxoFacets       13.21      (2.8%)       13.22      (2.8%)    0.1% (  -5% -    5%) 0.941
                 HighPhrase       34.18      (3.1%)       34.21      (3.3%)    0.1% (  -6% -    6%) 0.939
                 AndHighLow      905.10      (4.0%)      905.96      (5.0%)    0.1% (  -8% -    9%) 0.947
                  MedPhrase       87.90      (2.8%)       88.10      (3.0%)    0.2% (  -5% -    6%) 0.811
       BrowseDateTaxoFacets       11.06      (3.9%)       11.09      (3.4%)    0.3% (  -6% -    7%) 0.811
   BrowseDayOfYearTaxoFacets       11.05      (3.8%)       11.08      (3.4%)    0.3% (  -6% -    7%) 0.801
            MedSloppyPhrase      152.46      (3.1%)      152.89      (2.7%)    0.3% (  -5% -    6%) 0.757
                   PKLookup      215.89      (2.8%)      216.86      (3.8%)    0.5% (  -5% -    7%) 0.667
                 TermDTSort      436.33     (15.6%)      438.31     (13.8%)    0.5% ( -25% -   35%) 0.922
                LowSpanNear      119.90      (2.4%)      120.46      (2.3%)    0.5% (  -4% -    5%) 0.533
           HighSloppyPhrase       28.82      (3.9%)       28.99      (2.8%)    0.6% (  -5% -    7%) 0.586
                 AndHighMed      475.36      (5.6%)      478.26      (5.8%)    0.6% ( -10% -   12%) 0.735
            LowSloppyPhrase      388.99      (3.4%)      392.32      (2.9%)    0.9% (  -5% -    7%) 0.387
               OrNotHighMed      774.61      (6.6%)      781.75      (5.6%)    0.9% ( -10% -   14%) 0.633
                   HighTerm     1268.49      (5.6%)     1290.00      (5.6%)    1.7% (  -9% -   13%) 0.340
       HighIntervalsOrdered      417.04      (3.1%)      425.09      (2.9%)    1.9% (  -3% -    8%) 0.043
                    MedTerm     1583.25      (5.4%)     1627.50      (5.5%)    2.8% (  -7% -   14%) 0.107
                 OrHighHigh       61.28      (3.6%)       64.46      (3.0%)    5.2% (  -1% -   12%) 0.000
                  OrHighMed       79.13      (2.9%)       85.68      (3.3%)    8.3% (   1% -   14%) 0.000
                  OrHighLow      231.58      (4.7%)      683.73     (16.0%)  195.2% ( 166% -  226%) 0.000
   ```
   ```
   full wikimedium5m run 2
                       Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                 OrHighHigh       97.84      (2.7%)       78.42      (2.1%)  -19.8% ( -24% -  -15%) 0.000
       HighTermTitleBDVSort      223.86     (17.8%)      217.70     (16.4%)   -2.8% ( -31% -   38%) 0.611
               OrNotHighLow      964.32      (2.6%)      945.18      (6.0%)   -2.0% ( -10% -    6%) 0.175
               OrHighNotLow      814.26      (5.8%)      799.46      (5.7%)   -1.8% ( -12% -   10%) 0.316
          HighTermMonthSort      342.78     (14.3%)      338.52     (15.6%)   -1.2% ( -27% -   33%) 0.793
      HighTermDayOfYearSort      259.90     (13.7%)      257.22     (13.8%)   -1.0% ( -25% -   30%) 0.812
                 TermDTSort      234.69     (10.9%)      232.30     (12.3%)   -1.0% ( -21% -   24%) 0.782
                AndHighHigh       93.13      (3.0%)       92.19      (3.5%)   -1.0% (  -7% -    5%) 0.326
                    MedTerm     1410.12      (3.9%)     1398.22      (2.4%)   -0.8% (  -6% -    5%) 0.408
              OrNotHighHigh      679.95      (6.4%)      674.81      (6.3%)   -0.8% ( -12% -   12%) 0.706
               OrHighNotMed      744.68      (4.4%)      739.05      (5.8%)   -0.8% ( -10% -    9%) 0.644
                 AndHighMed      451.76      (3.8%)      448.59      (3.4%)   -0.7% (  -7% -    6%) 0.540
                 AndHighLow      969.58      (5.6%)      963.88      (4.8%)   -0.6% ( -10% -   10%) 0.720
                LowSpanNear       25.23      (4.2%)       25.11      (2.9%)   -0.5% (  -7% -    6%) 0.666
                MedSpanNear       26.41      (2.4%)       26.33      (1.5%)   -0.3% (  -4% -    3%) 0.610
       HighIntervalsOrdered       37.09      (1.9%)       36.98      (2.4%)   -0.3% (  -4% -    4%) 0.669
              OrHighNotHigh      679.06      (4.3%)      677.17      (5.8%)   -0.3% (  -9% -   10%) 0.863
               HighSpanNear       32.19      (2.2%)       32.14      (2.1%)   -0.2% (  -4% -    4%) 0.822
                     IntNRQ      322.43      (2.0%)      322.04      (2.5%)   -0.1% (  -4% -    4%) 0.865
      BrowseMonthSSDVFacets       32.22      (1.7%)       32.25      (1.5%)    0.1% (  -3% -    3%) 0.896
            LowSloppyPhrase       39.45      (2.6%)       39.48      (2.4%)    0.1% (  -4% -    5%) 0.921
   BrowseDayOfYearSSDVFacets       28.20      (5.4%)       28.23      (5.2%)    0.1% (  -9% -   11%) 0.947
           HighSloppyPhrase       56.95      (2.4%)       57.03      (2.4%)    0.1% (  -4% -    4%) 0.846
                   PKLookup      217.45      (3.9%)      217.78      (4.2%)    0.2% (  -7% -    8%) 0.906
                    LowTerm     1614.00      (3.7%)     1616.52      (4.3%)    0.2% (  -7% -    8%) 0.902
            MedSloppyPhrase      335.24      (2.8%)      336.50      (2.7%)    0.4% (  -4% -    6%) 0.665
                  MedPhrase      257.34      (2.7%)      258.59      (1.9%)    0.5% (  -4% -    5%) 0.515
                 HighPhrase      100.07      (2.1%)      100.66      (1.7%)    0.6% (  -3% -    4%) 0.332
   BrowseDayOfYearTaxoFacets       11.20      (2.8%)       11.28      (2.5%)    0.7% (  -4% -    6%) 0.410
      BrowseMonthTaxoFacets       13.07      (2.4%)       13.17      (1.9%)    0.7% (  -3% -    5%) 0.283
       BrowseDateTaxoFacets       11.18      (2.9%)       11.27      (2.5%)    0.8% (  -4% -    6%) 0.369
                   Wildcard       55.50      (4.6%)       56.08      (2.9%)    1.0% (  -6% -    8%) 0.391
                  LowPhrase      501.30      (3.5%)      506.61      (3.2%)    1.1% (  -5% -    8%) 0.319
                    Prefix3      107.90      (6.5%)      109.16      (3.9%)    1.2% (  -8% -   12%) 0.491
                    Respell       73.30      (3.3%)       74.17      (2.6%)    1.2% (  -4% -    7%) 0.210
               OrNotHighMed      625.05      (4.3%)      634.75      (4.9%)    1.6% (  -7% -   11%) 0.289
                     Fuzzy2       67.34     (18.7%)       68.92     (16.8%)    2.3% ( -27% -   46%) 0.677
                   HighTerm     1559.83      (4.6%)     1608.90      (5.3%)    3.1% (  -6% -   13%) 0.044
                     Fuzzy1       74.41     (17.1%)       77.02     (13.2%)    3.5% ( -22% -   40%) 0.467
                  OrHighMed      176.89      (4.0%)      192.17      (2.7%)    8.6% (   1% -   16%) 0.000
                  OrHighLow      179.14      (3.0%)      634.97     (16.3%)  254.5% ( 228% -  282%) 0.000
   ```
   ```
   full wikimedium5m run 3
                       Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                     Fuzzy2       78.85     (17.1%)       74.79     (15.3%)   -5.1% ( -32% -   32%) 0.315
                     Fuzzy1       73.72     (12.3%)       70.14      (9.6%)   -4.9% ( -23% -   19%) 0.164
                  OrHighMed      218.87      (3.8%)      213.12      (3.9%)   -2.6% (  -9% -    5%) 0.031
              OrHighNotHigh      710.58      (5.0%)      693.73      (4.9%)   -2.4% ( -11% -    7%) 0.130
               OrHighNotLow      766.45      (7.0%)      752.36      (5.4%)   -1.8% ( -13% -   11%) 0.351
               OrHighNotMed      788.49      (4.6%)      779.76      (4.0%)   -1.1% (  -9% -    7%) 0.415
                MedSpanNear      432.51      (2.6%)      428.61      (2.9%)   -0.9% (  -6% -    4%) 0.301
                 HighPhrase      328.27      (2.6%)      325.47      (3.1%)   -0.9% (  -6% -    4%) 0.338
                    MedTerm     1537.24      (3.9%)     1525.49      (3.9%)   -0.8% (  -8% -    7%) 0.537
                   PKLookup      224.01      (3.4%)      222.35      (3.2%)   -0.7% (  -7% -    6%) 0.478
                   HighTerm     1852.48      (6.1%)     1839.68      (6.9%)   -0.7% ( -12% -   13%) 0.737
               OrNotHighLow      872.06      (4.3%)      866.35      (3.3%)   -0.7% (  -7% -    7%) 0.589
              OrNotHighHigh      696.91      (4.9%)      694.25      (5.3%)   -0.4% ( -10% -   10%) 0.814
                 AndHighMed      399.43      (3.7%)      398.38      (3.4%)   -0.3% (  -7% -    7%) 0.818
      BrowseMonthTaxoFacets       13.35      (2.5%)       13.33      (2.8%)   -0.1% (  -5% -    5%) 0.891
      BrowseMonthSSDVFacets       31.99      (2.2%)       31.97      (2.3%)   -0.1% (  -4% -    4%) 0.917
       HighIntervalsOrdered       56.92      (1.7%)       56.89      (1.5%)   -0.1% (  -3% -    3%) 0.916
                  MedPhrase      421.85      (2.6%)      421.64      (2.4%)   -0.1% (  -4% -    5%) 0.949
                LowSpanNear      215.84      (1.5%)      215.81      (1.9%)   -0.0% (  -3% -    3%) 0.975
   BrowseDayOfYearTaxoFacets       11.13      (3.0%)       11.13      (3.2%)   -0.0% (  -6% -    6%) 0.992
   BrowseDayOfYearSSDVFacets       27.51      (8.3%)       27.52      (8.1%)    0.0% ( -15% -   17%) 0.994
               HighSpanNear       16.99      (2.2%)       16.99      (2.1%)    0.0% (  -4% -    4%) 0.968
       BrowseDateTaxoFacets       11.11      (3.0%)       11.11      (3.3%)    0.0% (  -6% -    6%) 0.977
                   Wildcard      259.96      (2.3%)      260.11      (2.7%)    0.1% (  -4% -    5%) 0.943
       HighTermTitleBDVSort      216.56      (6.9%)      216.79      (7.9%)    0.1% ( -13% -   15%) 0.964
            LowSloppyPhrase       36.16      (3.5%)       36.20      (3.8%)    0.1% (  -6% -    7%) 0.922
                    LowTerm     1653.62      (6.1%)     1656.23      (4.8%)    0.2% ( -10% -   11%) 0.928
                 TermDTSort      236.21     (14.9%)      236.69     (14.7%)    0.2% ( -25% -   34%) 0.965
               OrNotHighMed      738.85      (3.6%)      741.27      (4.7%)    0.3% (  -7% -    9%) 0.806
                     IntNRQ      122.68      (1.1%)      123.17      (0.8%)    0.4% (  -1% -    2%) 0.210
                    Respell       75.86      (2.4%)       76.22      (2.0%)    0.5% (  -3% -    5%) 0.505
           HighSloppyPhrase       80.85      (3.7%)       81.25      (4.6%)    0.5% (  -7% -    9%) 0.708
            MedSloppyPhrase       31.20      (3.5%)       31.39      (4.3%)    0.6% (  -6% -    8%) 0.628
          HighTermMonthSort      396.29      (8.2%)      398.90      (9.3%)    0.7% ( -15% -   19%) 0.812
                    Prefix3      393.10      (2.7%)      396.20      (2.5%)    0.8% (  -4% -    6%) 0.339
                AndHighHigh      105.61      (3.7%)      106.69      (4.0%)    1.0% (  -6% -    9%) 0.399
                  LowPhrase       61.52      (2.1%)       62.17      (3.2%)    1.1% (  -4% -    6%) 0.221
                 AndHighLow      915.63      (4.3%)      928.98      (3.1%)    1.5% (  -5% -    9%) 0.217
      HighTermDayOfYearSort      216.71     (14.0%)      220.00     (15.9%)    1.5% ( -24% -   36%) 0.749
                  OrHighLow      535.18      (7.4%)      571.87      (5.8%)    6.9% (  -5% -   21%) 0.001
                 OrHighHigh       51.30      (2.8%)       56.55      (2.7%)   10.2% (   4% -   16%) 0.000
   ```
   
   So far the implementation seems to perform similarly to the baseline WANDScorer, with a surprising, occasionally huge speedup for `OrHighLow`. Hopefully this is not caused by a bug :D . I think these performance characteristics make sense: the low-frequency, high-score-contribution term drives the iteration, and a big window size lets more docs be pruned quickly when their maxScores show they can't be competitive.
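
   The pruning argument above can be sketched in isolation. The following is a minimal, hypothetical illustration and not the code in this PR: `MaxScorePartitionSketch` and `firstEssential` are made-up names, and the per-clause score upper bounds are plain floats rather than values read from Lucene's impacts. It assumes the bounds are sorted in ascending order and finds the first clause that a competitive document must match.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Lucene or of this PR. */
final class MaxScorePartitionSketch {

  /**
   * Given per-clause score upper bounds sorted in ascending order, returns the index of the
   * first "essential" clause. The clauses before that index cannot reach minCompetitiveScore
   * even when they all match, so a document matching only those clauses can be skipped.
   */
  static int firstEssential(float[] sortedMaxScores, float minCompetitiveScore) {
    double prefixSum = 0;
    for (int i = 0; i < sortedMaxScores.length; i++) {
      prefixSum += sortedMaxScores[i];
      if (prefixSum >= minCompetitiveScore) {
        return i;
      }
    }
    // Even the sum of all bounds is below the threshold: nothing here can be competitive.
    return sortedMaxScores.length;
  }

  public static void main(String[] args) {
    // Assumed bounds: two frequent, low-scoring clauses and one rare, high-scoring clause.
    float[] maxScores = {2.5f, 0.4f, 0.7f};
    Arrays.sort(maxScores);
    System.out.println(firstEssential(maxScores, 1.5f));
  }
}
```

   With the assumed bounds 0.4, 0.7 and 2.5 and a threshold of 1.5, the method returns 2: only the rare, high-scoring clause is essential, so iteration only needs to visit its postings, which is consistent with the `OrHighLow` reasoning above.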



[GitHub] [lucene] zacharymorn commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

Posted by GitBox <gi...@apache.org>.

zacharymorn commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r629794667



##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+  private List<Scorer> scorers;
+  private DisiWrapper[] allScorers;
+  private Weight weight;
+  private ScoreMode scoreMode;
+  private int scalingFactor;
+  private long cost;
+  private static final int FIXED_WINDOW_SIZE = 2048;

Review comment:
       Hmm, I thought we would want a window here so that we only need to update maxScore for the scorers at larger interval checkpoints (the other implementation checks and updates maxScore more frequently, as it takes the min of the block boundaries of all scorers). But anyway, by taking out the window here I assume you would like the BMM scorer to run directly through BulkScorer? I can give that a try as well!
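
       The trade-off described here can be made concrete with a small counting sketch. It is a hypothetical illustration, not the PR's code: `WindowBoundsSketch` and both method names are invented, and the block boundaries are made-up exclusive end doc IDs rather than values taken from an index. It counts how often the score upper bounds would be refreshed over a doc ID range under each strategy.

```java
/** Hypothetical helper for illustration only; not part of Lucene or of this PR. */
final class WindowBoundsSketch {

  /** Refresh count when bounds are recomputed once per fixed-size window (windowSize > 0). */
  static int refreshesWithFixedWindow(int minDoc, int maxDoc, int windowSize) {
    int refreshes = 0;
    int windowMin = minDoc;
    while (windowMin < maxDoc) {
      // Use long arithmetic so that windowMin + windowSize cannot overflow.
      int windowMax = (int) Math.min((long) windowMin + windowSize, (long) maxDoc);
      refreshes++;
      windowMin = windowMax;
    }
    return refreshes;
  }

  /**
   * Refresh count when each window ends at the nearest block boundary of any clause.
   * blockEnds[i] holds the sorted, exclusive block end doc IDs of clause i.
   */
  static int refreshesAtBlockBoundaries(int minDoc, int maxDoc, int[][] blockEnds) {
    int refreshes = 0;
    int windowMin = minDoc;
    while (windowMin < maxDoc) {
      int windowMax = maxDoc;
      for (int[] ends : blockEnds) {
        for (int end : ends) {
          if (end > windowMin) {
            windowMax = Math.min(windowMax, end);
            break; // ends are sorted, so this is the clause's next boundary
          }
        }
      }
      refreshes++;
      windowMin = windowMax;
    }
    return refreshes;
  }

  public static void main(String[] args) {
    int[][] blockEnds = {{1000, 2000, 3000, 4096}, {1500, 3500, 4096}};
    System.out.println(refreshesWithFixedWindow(0, 4096, 2048));
    System.out.println(refreshesAtBlockBoundaries(0, 4096, blockEnds));
  }
}
```

       For the assumed boundaries, the fixed 2048-doc window refreshes twice over [0, 4096), while ending each window at the minimum block boundary refreshes six times. Fewer refreshes come at the cost of looser upper bounds, since a bound computed over a wider range can only be larger.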





[GitHub] [lucene] zacharymorn commented on pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

Posted by GitBox <gi...@apache.org>.

zacharymorn commented on pull request #113:
URL: https://github.com/apache/lucene/pull/113#issuecomment-837753531


   I've also tried out smaller window sizes in the latest 2 commits (benchmark results in the git commit messages), and it appears that a window size of 1024 might perform better than 2048 for OrMedMedMedMedMed queries.



[GitHub] [lucene] jpountz commented on a change in pull request #113: LUCENE-9335: [Discussion Only] Implement BMM with BulkScorer interface

Posted by GitBox <gi...@apache.org>.

jpountz commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r629112286



##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+  private List<Scorer> scorers;
+  private DisiWrapper[] allScorers;
+  private Weight weight;
+  private ScoreMode scoreMode;
+  private int scalingFactor;
+  private long cost;
+  private static final int FIXED_WINDOW_SIZE = 2048;

Review comment:
       The reason why BooleanScorer has such a window is to be able to collect hits into a bitset, which we're not doing here. Do the numbers get better if we get rid of this window?
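
       For readers unfamiliar with the bitset approach mentioned here, a heavily simplified sketch follows. It is not BooleanScorer's actual code: `BitsetWindowSketch` and `collectWindow` are hypothetical names, clauses are plain sorted int arrays instead of scorers, and score accumulation is left out entirely. It only shows why a bounded window is useful when hits are gathered into a bitset: each clause is consumed independently, and the bitset then yields the union in doc ID order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper for illustration only; not BooleanScorer's implementation. */
final class BitsetWindowSketch {
  static final int WINDOW_SIZE = 2048;

  /**
   * Returns the union of the clauses' doc IDs that fall in
   * [windowBase, windowBase + WINDOW_SIZE), in increasing doc ID order.
   */
  static List<Integer> collectWindow(int[][] sortedDocIdsPerClause, int windowBase) {
    BitSet matching = new BitSet(WINDOW_SIZE);
    for (int[] docIds : sortedDocIdsPerClause) {
      for (int doc : docIds) {
        if (doc >= windowBase + WINDOW_SIZE) {
          break; // doc IDs are sorted, so the rest belongs to later windows
        }
        if (doc >= windowBase) {
          matching.set(doc - windowBase);
        }
      }
    }
    // Walking the set bits replays the hits in order without merging the clauses.
    List<Integer> hits = new ArrayList<>();
    for (int bit = matching.nextSetBit(0); bit >= 0; bit = matching.nextSetBit(bit + 1)) {
      hits.add(windowBase + bit);
    }
    return hits;
  }

  public static void main(String[] args) {
    int[][] clauses = {{1, 5, 3000}, {5, 7, 2047, 2048}};
    System.out.println(collectWindow(clauses, 0));
    System.out.println(collectWindow(clauses, WINDOW_SIZE));
  }
}
```

       A scorer that walks documents in order and prunes by score upper bounds never builds such a bitset, which is the basis for asking whether the fixed window is still needed here.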



