Posted to issues@lucene.apache.org by "fudongyingluck (via GitHub)" <gi...@apache.org> on 2023/05/30 02:37:59 UTC

[GitHub] [lucene] fudongyingluck opened a new pull request, #12339: feat: soft delete optimize

fudongyingluck opened a new pull request, #12339:
URL: https://github.com/apache/lucene/pull/12339

   As mentioned in this Elasticsearch [issue](https://github.com/elastic/elasticsearch/issues/75675), when soft deletes are enabled the `numDeletesToMerge` function is very time-consuming. As the following picture shows, the same value is actually calculated more than once.
   ![image](https://github.com/apache/lucene/assets/30896830/d92303d3-1ebc-4a5d-8ae0-143bfb3d4660)
   
   This change reuses the `numDeletesToMerge` result to reduce the time spent.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize

Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1584035487

   Lucene benchmark result, `python3.10 src/python/localrun.py -source wikimediumall`
   ```
                               Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
          BrowseDayOfYearSSDVFacets        6.29      (7.5%)        6.16      (9.4%)   -2.1% ( -17% -   15%) 0.428
               HighTermTitleBDVSort       10.68      (8.0%)       10.51      (4.3%)   -1.6% ( -12% -   11%) 0.442
                             IntNRQ       47.94      (1.8%)       47.23      (8.1%)   -1.5% ( -11% -    8%) 0.422
             OrHighMedDayTaxoFacets        6.09      (6.9%)        6.02      (5.2%)   -1.1% ( -12% -   11%) 0.563
              BrowseMonthSSDVFacets        6.56      (8.0%)        6.50      (8.3%)   -0.9% ( -15% -   16%) 0.719
                MedIntervalsOrdered       27.91      (4.9%)       27.67      (5.1%)   -0.9% ( -10% -    9%) 0.579
                   HighSloppyPhrase        3.61      (4.3%)        3.58      (4.4%)   -0.7% (  -9% -    8%) 0.596
          BrowseDayOfYearTaxoFacets        7.06      (6.1%)        7.04      (5.2%)   -0.3% ( -10% -   11%) 0.850
                          OrHighLow      351.52      (2.9%)      350.34      (3.4%)   -0.3% (  -6% -    6%) 0.737
                         OrHighHigh       20.17      (3.4%)       20.11      (3.7%)   -0.3% (  -7% -    7%) 0.776
                    MedSloppyPhrase        4.99      (4.0%)        4.97      (4.7%)   -0.3% (  -8% -    8%) 0.822
        BrowseRandomLabelTaxoFacets        6.20      (7.1%)        6.19      (5.3%)   -0.2% ( -11% -   13%) 0.914
                          OrHighMed      106.18      (3.5%)      105.99      (3.4%)   -0.2% (  -6% -    6%) 0.866
               BrowseDateTaxoFacets        7.00      (5.9%)        6.99      (4.8%)   -0.1% ( -10% -   11%) 0.947
                             Fuzzy1       88.03      (3.3%)       87.96      (1.9%)   -0.1% (  -5% -    5%) 0.925
               HighIntervalsOrdered       12.93      (4.5%)       12.92      (4.4%)   -0.0% (  -8% -    9%) 0.979
                            Prefix3      224.61      (2.8%)      224.69      (2.3%)    0.0% (  -4% -    5%) 0.966
                    LowSloppyPhrase       22.07      (4.2%)       22.10      (4.1%)    0.1% (  -7% -    8%) 0.928
                       OrNotHighMed      403.85      (2.4%)      404.49      (2.9%)    0.2% (  -4% -    5%) 0.851
                       OrHighNotLow      468.49      (5.1%)      469.62      (4.9%)    0.2% (  -9% -   10%) 0.879
                  HighTermMonthSort     3512.17      (6.9%)     3523.19      (7.5%)    0.3% ( -13% -   15%) 0.890
                       OrHighNotMed      532.57      (4.3%)      534.39      (3.6%)    0.3% (  -7% -    8%) 0.786
                            MedTerm     1019.27      (4.5%)     1022.80      (4.3%)    0.3% (  -8% -    9%) 0.805
           AndHighHighDayTaxoFacets        7.35      (2.8%)        7.38      (1.7%)    0.4% (  -4% -    5%) 0.633
                        AndHighHigh       32.60      (3.8%)       32.72      (4.2%)    0.4% (  -7% -    8%) 0.776
                         AndHighLow      662.12      (3.5%)      664.62      (3.9%)    0.4% (  -6% -    7%) 0.745
                             Fuzzy2       91.31      (4.0%)       91.66      (2.3%)    0.4% (  -5% -    6%) 0.709
                      OrNotHighHigh      675.72      (3.3%)      679.20      (3.6%)    0.5% (  -6% -    7%) 0.636
                         AndHighMed       96.86      (5.9%)       97.41      (6.3%)    0.6% ( -11% -   13%) 0.771
                           PKLookup      281.91      (3.6%)      283.54      (2.6%)    0.6% (  -5% -    7%) 0.566
                           Wildcard      183.21      (4.9%)      184.35      (2.8%)    0.6% (  -6% -    8%) 0.619
                      OrHighNotHigh      500.76      (3.4%)      504.24      (3.4%)    0.7% (  -5% -    7%) 0.513
                          LowPhrase      183.76      (2.7%)      185.24      (2.5%)    0.8% (  -4% -    6%) 0.326
                            LowTerm      732.82      (3.0%)      738.99      (3.0%)    0.8% (  -4% -    6%) 0.368
               MedTermDayTaxoFacets       38.20      (2.9%)       38.53      (1.9%)    0.9% (  -3% -    5%) 0.273
                          MedPhrase       85.79      (2.5%)       86.54      (2.3%)    0.9% (  -3% -    5%) 0.250
                           HighTerm      678.62      (4.8%)      684.64      (4.4%)    0.9% (  -7% -   10%) 0.544
            AndHighMedDayTaxoFacets       34.42      (2.5%)       34.73      (1.5%)    0.9% (  -2% -    4%) 0.164
                LowIntervalsOrdered       16.93      (3.6%)       17.09      (3.2%)    1.0% (  -5% -    7%) 0.373
                        MedSpanNear       25.65      (3.4%)       25.89      (4.3%)    1.0% (  -6% -    9%) 0.440
                       HighSpanNear        9.16      (3.9%)        9.25      (4.7%)    1.0% (  -7% -    9%) 0.473
                         HighPhrase      136.81      (2.8%)      138.28      (2.8%)    1.1% (  -4% -    6%) 0.231
                            Respell       67.25      (4.6%)       68.00      (3.4%)    1.1% (  -6% -    9%) 0.377
        BrowseRandomLabelSSDVFacets        5.26      (7.4%)        5.32      (7.2%)    1.1% ( -12% -   16%) 0.627
                        LowSpanNear        7.90      (3.7%)        7.99      (3.9%)    1.1% (  -6% -    9%) 0.347
              HighTermDayOfYearSort      400.43      (2.6%)      405.41      (2.7%)    1.2% (  -3% -    6%) 0.137
                       OrNotHighLow      818.63      (3.1%)      828.86      (3.0%)    1.2% (  -4% -    7%) 0.199
                  HighTermTitleSort       62.96      (2.5%)       63.77      (3.1%)    1.3% (  -4% -    7%) 0.149
              BrowseMonthTaxoFacets       10.34     (33.0%)       10.47     (33.7%)    1.3% ( -49% -  101%) 0.902
                         TermDTSort      239.19      (4.0%)      242.85      (6.9%)    1.5% (  -8% -   12%) 0.390
               BrowseDateSSDVFacets        1.49     (12.4%)        1.54     (11.4%)    3.4% ( -18% -   31%) 0.362
   ```




[GitHub] [lucene] jpountz commented on pull request #12339: feat: soft delete optimize

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1584102566

   I'll note that there is still room for improvement, as this change doesn't cache the number of soft deletes across calls to `findMerges`. But the fix is so simple and contained, this looks to me like a good case of progress over perfection.
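
To illustrate the cache scope this comment refers to, here is a minimal Java sketch. It does not use Lucene's classes: `countDeletes` and `findMergesOnce` are hypothetical stand-ins for the expensive soft-delete count and for a single `findMerges` invocation. The cache lives only for one invocation, so repeated lookups inside a call are cheap, while a second call recomputes every segment's count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class PerCallCacheSketch {
  // Number of times the expensive path ran (for demonstration only).
  static int expensiveCalls = 0;

  // Hypothetical placeholder for the costly soft-delete count of one segment.
  static int countDeletes(String segment) {
    expensiveCalls++;
    return segment.length();
  }

  // Hypothetical stand-in for one merge-policy invocation that asks for
  // every segment's delete count twice.
  static int findMergesOnce(List<String> segments) {
    // A fresh cache per invocation: nothing survives across calls.
    Map<String, Integer> cache = new HashMap<>();
    ToIntFunction<String> cached =
        s -> cache.computeIfAbsent(s, PerCallCacheSketch::countDeletes);
    int total = 0;
    for (String s : segments) {
      total += cached.applyAsInt(s); // first pass: computed
    }
    for (String s : segments) {
      total += cached.applyAsInt(s); // second pass: served from the cache
    }
    return total;
  }

  public static void main(String[] args) {
    List<String> segments = List.of("_0", "_1", "_2");
    findMergesOnce(segments);
    findMergesOnce(segments);
    // Three segments, two invocations, one computation per segment per invocation.
    System.out.println("expensive calls: " + expensiveCalls); // prints "expensive calls: 6"
  }
}
```

Caching across invocations would bring the count down further, but it would also need an invalidation rule for segments whose deletes change between calls, which this sketch does not attempt.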




[GitHub] [lucene] fudongyingluck closed pull request #12339: feat: soft delete optimize

Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck closed pull request #12339: feat: soft delete optimize
URL: https://github.com/apache/lucene/pull/12339




[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize

Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1576330887

   This is the esrally result. The command is like `esrally race --track=http_logs --target-hosts=*:9201  --pipeline=benchmark-only --offline  --user-tag=softdelete:baseline --challenge=update`

   |                                                        Metric |   Task |        Baseline |       Contender |        Diff |   Unit |   Diff % |
   |--------------------------------------------------------------:|-------:|----------------:|----------------:|------------:|-------:|---------:|
   |                    Cumulative indexing time of primary shards |        |   515.49        |   504.15        |   -11.3398  |    min |   -2.20% |
   |             Min cumulative indexing time across primary shard |        |     0           |     0           |     0       |    min |    0.00% |
   |          Median cumulative indexing time across primary shard |        |    17.7529      |    17.9699      |     0.2169  |    min |   +1.22% |
   |             Max cumulative indexing time across primary shard |        |   404.723       |   393.369       |   -11.3536  |    min |   -2.81% |
   |           Cumulative indexing throttle time of primary shards |        |     0           |     0           |     0       |    min |    0.00% |
   |    Min cumulative indexing throttle time across primary shard |        |     0           |     0           |     0       |    min |    0.00% |
   | Median cumulative indexing throttle time across primary shard |        |     0           |     0           |     0       |    min |    0.00% |
   |    Max cumulative indexing throttle time across primary shard |        |     0           |     0           |     0       |    min |    0.00% |
   |                       Cumulative merge time of primary shards |        |   133.81        |   127.489       |    -6.32017 |    min |   -4.72% |
   |                      Cumulative merge count of primary shards |        |   173           |   172           |    -1       |        |   -0.58% |
   |                Min cumulative merge time across primary shard |        |     0           |     0           |     0       |    min |    0.00% |
   |             Median cumulative merge time across primary shard |        |     2.61536     |     2.96084     |     0.34548 |    min |  +13.21% |
   |                Max cumulative merge time across primary shard |        |   118.648       |   110.923       |    -7.7245  |    min |   -6.51% |
   |              Cumulative merge throttle time of primary shards |        |    57.0305      |    55.1042      |    -1.92633 |    min |   -3.38% |
   |       Min cumulative merge throttle time across primary shard |        |     0           |     0           |     0       |    min |    0.00% |
   |    Median cumulative merge throttle time across primary shard |        |     0.215533    |     0.307242    |     0.09171 |    min |  +42.55% |
   |       Max cumulative merge throttle time across primary shard |        |    55.2842      |    53.1749      |    -2.10932 |    min |   -3.82% |
   |                     Cumulative refresh time of primary shards |        |    21.5803      |    20.5713      |    -1.009   |    min |   -4.68% |
   |                    Cumulative refresh count of primary shards |        |   668           |   674           |     6       |        |   +0.90% |
   |              Min cumulative refresh time across primary shard |        |     0           |     0           |     0       |    min |    0.00% |
   |           Median cumulative refresh time across primary shard |        |     0.542333    |     0.508642    |    -0.03369 |    min |   -6.21% |
   |              Max cumulative refresh time across primary shard |        |    18.1363      |    17.4352      |    -0.70113 |    min |   -3.87% |
   |                       Cumulative flush time of primary shards |        |     9.37332     |    10.4646      |     1.09132 |    min |  +11.64% |
   |                      Cumulative flush count of primary shards |        |    63           |    64           |     1       |        |   +1.59% |
   |                Min cumulative flush time across primary shard |        |     0.00296667  |     0.0001      |    -0.00287 |    min |  -96.63% |
   |             Median cumulative flush time across primary shard |        |     0.0971583   |     0.0769667   |    -0.02019 |    min |  -20.78% |
   |                Max cumulative flush time across primary shard |        |     8.6855      |     9.83638     |     1.15088 |    min |  +13.25% |
   |                                       Total Young Gen GC time |        |  1070.97        |  1065.08        |    -5.889   |      s |   -0.55% |
   |                                      Total Young Gen GC count |        |  8254           |  8187           |   -67       |        |   -0.81% |
   |                                         Total Old Gen GC time |        |     0.586       |     0           |    -0.586   |      s | -100.00% |
   |                                        Total Old Gen GC count |        |     3           |     0           |    -3       |        | -100.00% |
   |                                                    Store size |        |    17.0535      |    16.9082      |    -0.14531 |     GB |   -0.85% |
   |                                                 Translog size |        |     4.09782e-07 |     4.09782e-07 |     0       |     GB |    0.00% |
   |                                        Heap used for segments |        |     0           |     0           |     0       |     MB |    0.00% |
   |                                      Heap used for doc values |        |     0           |     0           |     0       |     MB |    0.00% |
   |                                           Heap used for terms |        |     0           |     0           |     0       |     MB |    0.00% |
   |                                           Heap used for norms |        |     0           |     0           |     0       |     MB |    0.00% |
   |                                          Heap used for points |        |     0           |     0           |     0       |     MB |    0.00% |
   |                                   Heap used for stored fields |        |     0           |     0           |     0       |     MB |    0.00% |
   |                                                 Segment count |        |   158           |   163           |     5       |        |   +3.16% |
   |                                   Total Ingest Pipeline count |        |     0           |     0           |     0       |        |    0.00% |
   |                                    Total Ingest Pipeline time |        |     0           |     0           |     0       |     ms |    0.00% |
   |                                  Total Ingest Pipeline failed |        |     0           |     0           |     0       |        |    0.00% |
   |                                                Min Throughput | update | 23056.7         | 23029.1         |   -27.5735  | docs/s |   -0.12% |
   |                                               Mean Throughput | update | 29585.3         | 29794           |   208.699   | docs/s |   +0.71% |
   |                                             Median Throughput | update | 28990.2         | 29011.7         |    21.4849  | docs/s |   +0.07% |
   |                                                Max Throughput | update | 36131.5         | 36197.3         |    65.8749  | docs/s |   +0.18% |
   |                                       50th percentile latency | update |  1421.89        |  1437.74        |    15.8507  |     ms |   +1.11% |
   |                                       90th percentile latency | update |  2410.13        |  2420.23        |    10.1008  |     ms |   +0.42% |
   |                                       99th percentile latency | update |  7076.3         |  7045.81        |   -30.4936  |     ms |   -0.43% |
   |                                     99.9th percentile latency | update | 11033.5         | 10406.9         |  -626.525   |     ms |   -5.68% |
   |                                    99.99th percentile latency | update | 14342.9         | 13304.1         | -1038.85    |     ms |   -7.24% |
   |                                      100th percentile latency | update | 21652.9         | 21399.9         |  -253       |     ms |   -1.17% |
   |                                  50th percentile service time | update |  1421.89        |  1437.74        |    15.8507  |     ms |   +1.11% |
   |                                  90th percentile service time | update |  2410.13        |  2420.23        |    10.1008  |     ms |   +0.42% |
   |                                  99th percentile service time | update |  7076.3         |  7045.81        |   -30.4936  |     ms |   -0.43% |
   |                                99.9th percentile service time | update | 11033.5         | 10406.9         |  -626.525   |     ms |   -5.68% |
   |                               99.99th percentile service time | update | 14342.9         | 13304.1         | -1038.85    |     ms |   -7.24% |
   |                                 100th percentile service time | update | 21652.9         | 21399.9         |  -253       |     ms |   -1.17% |
   |                                                    error rate | update |     0           |     0           |     0       |      % |    0.00% |
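
As a sanity check on the `Diff %` column, the sketch below assumes it is computed as (contender - baseline) / baseline * 100. That formula is an assumption about how the tool derives the column, not something stated in this thread, and `percentDiff` is a hypothetical helper. Under that assumption, the "Cumulative merge time of primary shards" row reproduces.

```java
import java.util.Locale;

public class PercentDiff {
  // Relative change of the contender against the baseline, in percent.
  static double percentDiff(double baseline, double contender) {
    return (contender - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Values copied from the "Cumulative merge time of primary shards" row.
    double diff = percentDiff(133.81, 127.489);
    System.out.println(String.format(Locale.ROOT, "%.2f%%", diff)); // prints "-4.72%"
  }
}
```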




[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize

Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1577903918

   lucene benchmark result, `python3.10 src/python/localrun.py -source wikimediumall`
   ```
                               Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
               BrowseDateSSDVFacets        1.54     (11.4%)        1.46     (16.1%)   -5.2% ( -29% -   25%) 0.242
             OrHighMedDayTaxoFacets        5.38      (5.6%)        5.24      (5.0%)   -2.6% ( -12% -    8%) 0.127
                           PKLookup      279.48      (3.0%)      273.06      (3.1%)   -2.3% (  -8% -    3%) 0.018
               MedTermDayTaxoFacets       35.78      (2.2%)       35.10      (1.8%)   -1.9% (  -5% -    2%) 0.002
               BrowseDateTaxoFacets        7.23     (22.3%)        7.10     (23.8%)   -1.8% ( -39% -   56%) 0.802
               HighIntervalsOrdered       10.59      (8.9%)       10.42      (8.6%)   -1.6% ( -17% -   17%) 0.568
          BrowseDayOfYearTaxoFacets        7.30     (21.8%)        7.19     (23.9%)   -1.6% ( -38% -   56%) 0.829
                LowIntervalsOrdered        4.55      (7.1%)        4.48      (7.1%)   -1.5% ( -14% -   13%) 0.495
                MedIntervalsOrdered        6.90      (8.1%)        6.81      (7.3%)   -1.4% ( -15% -   15%) 0.565
                             Fuzzy2      118.84      (2.2%)      117.28      (2.5%)   -1.3% (  -5% -    3%) 0.078
                            Respell       82.74      (3.1%)       81.79      (4.0%)   -1.2% (  -7% -    6%) 0.308
                  HighTermMonthSort     3093.29      (5.8%)     3057.85      (6.7%)   -1.1% ( -12% -   12%) 0.562
        BrowseRandomLabelTaxoFacets        6.40     (38.8%)        6.33     (40.9%)   -1.1% ( -58% -  128%) 0.930
                           HighTerm      791.45      (5.1%)      783.46      (4.7%)   -1.0% ( -10% -    9%) 0.517
                         HighPhrase       30.44      (2.3%)       30.16      (2.2%)   -0.9% (  -5% -    3%) 0.190
                             Fuzzy1      108.68      (2.7%)      107.67      (3.6%)   -0.9% (  -7% -    5%) 0.359
                       OrHighNotMed      320.94      (6.6%)      318.02      (5.3%)   -0.9% ( -11% -   11%) 0.629
                      OrNotHighHigh      468.36      (5.3%)      464.33      (4.2%)   -0.9% (  -9% -    9%) 0.568
                    LowSloppyPhrase       34.97      (4.1%)       34.69      (4.2%)   -0.8% (  -8% -    7%) 0.534
                          MedPhrase      242.27      (2.5%)      240.32      (1.9%)   -0.8% (  -5% -    3%) 0.248
                         AndHighMed       77.34      (6.0%)       76.76      (5.7%)   -0.8% ( -11% -   11%) 0.686
                       OrHighNotLow      744.00      (6.5%)      738.66      (5.8%)   -0.7% ( -12% -   12%) 0.711
                         AndHighLow      586.58      (3.5%)      582.51      (4.2%)   -0.7% (  -8% -    7%) 0.573
                   HighSloppyPhrase        3.91      (4.5%)        3.89      (3.9%)   -0.6% (  -8% -    8%) 0.670
                        MedSpanNear       37.46      (2.1%)       37.26      (2.5%)   -0.6% (  -5% -    4%) 0.441
                          LowPhrase      153.02      (2.2%)      152.17      (2.1%)   -0.6% (  -4% -    3%) 0.417
                       OrNotHighLow     1030.00      (3.2%)     1025.40      (3.5%)   -0.4% (  -6% -    6%) 0.675
                           Wildcard       35.75      (3.2%)       35.59      (4.5%)   -0.4% (  -7% -    7%) 0.723
                            MedTerm      761.12      (5.8%)      757.86      (6.0%)   -0.4% ( -11% -   12%) 0.819
                        AndHighHigh       22.42      (6.5%)       22.33      (5.7%)   -0.4% ( -11% -   12%) 0.830
                            LowTerm      689.41      (3.9%)      686.65      (4.6%)   -0.4% (  -8% -    8%) 0.768
                       HighSpanNear        2.47      (4.2%)        2.46      (5.0%)   -0.4% (  -9% -    9%) 0.789
           AndHighHighDayTaxoFacets        7.97      (1.6%)        7.94      (1.9%)   -0.4% (  -3% -    3%) 0.522
                      OrHighNotHigh      352.84      (6.6%)      351.68      (4.9%)   -0.3% ( -11% -   11%) 0.859
            AndHighMedDayTaxoFacets       48.80      (1.6%)       48.65      (2.3%)   -0.3% (  -4% -    3%) 0.611
                    MedSloppyPhrase       24.12      (2.4%)       24.04      (2.5%)   -0.3% (  -5% -    4%) 0.684
                          OrHighMed       37.82      (6.3%)       37.72      (5.5%)   -0.3% ( -11% -   12%) 0.891
               HighTermTitleBDVSort        7.13      (8.7%)        7.11      (8.1%)   -0.2% ( -15% -   18%) 0.927
                        LowSpanNear       26.13      (3.7%)       26.08      (3.3%)   -0.2% (  -6% -    7%) 0.866
                            Prefix3      408.84      (1.3%)      408.62      (2.1%)   -0.1% (  -3% -    3%) 0.923
                       OrNotHighMed      469.82      (4.2%)      470.09      (3.6%)    0.1% (  -7% -    8%) 0.963
                  HighTermTitleSort      105.40      (2.8%)      105.54      (4.6%)    0.1% (  -7% -    7%) 0.914
                         OrHighHigh       13.48      (5.0%)       13.51      (4.7%)    0.2% (  -9% -   10%) 0.905
                         TermDTSort      241.07      (3.6%)      242.46      (4.9%)    0.6% (  -7% -    9%) 0.671
                          OrHighLow      235.33      (5.1%)      237.04      (5.2%)    0.7% (  -9% -   11%) 0.655
        BrowseRandomLabelSSDVFacets        4.96      (3.8%)        5.00     (11.6%)    0.9% ( -13% -   16%) 0.746
              HighTermDayOfYearSort      290.03      (3.5%)      292.75      (3.7%)    0.9% (  -6% -    8%) 0.408
                             IntNRQ       52.81     (18.2%)       54.52     (15.7%)    3.2% ( -25% -   45%) 0.546
          BrowseDayOfYearSSDVFacets        6.11      (4.2%)        6.32     (10.6%)    3.4% ( -10% -   19%) 0.186
              BrowseMonthTaxoFacets        9.69     (33.2%)       10.17     (34.1%)    5.0% ( -46% -  108%) 0.641
              BrowseMonthSSDVFacets        6.35      (2.9%)        6.68     (10.3%)    5.2% (  -7% -   18%) 0.030
   ```
   
   Here is part of the CPU profile result:
   ```
   CPU merged search profile for my_modified_version:
   PERCENT       CPU SAMPLES   STACK
   4.71%         49024         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
   3.88%         40306         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
   3.50%         36379         java.nio.Buffer#scope()
   3.35%         34860         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
   3.18%         33041         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   
   CPU merged search profile for baseline:
   PERCENT       CPU SAMPLES   STACK
   6.19%         63449         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
   3.63%         37149         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   3.58%         36660         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
   3.46%         35483         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
   3.19%         32707         org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
   ```
   




[GitHub] [lucene] jpountz commented on a diff in pull request #12339: feat: soft delete optimize

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12339:
URL: https://github.com/apache/lucene/pull/12339#discussion_r1223923140


##########
lucene/core/src/java/org/apache/lucene/index/CachingMergeContext.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Set;
+import org.apache.lucene.util.InfoStream;
+
+/**
+ * a wrapper of IndexWriter MergeContext. Try to cache the {@link
+ * #numDeletesToMerge(SegmentCommitInfo)} result in merge phase, to avoid duplicate calculation
+ */
+public class CachingMergeContext implements MergePolicy.MergeContext {

Review Comment:
   Can you make it pkg-private instead of public?





[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize

Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1583939101

   Thanks @jpountz for your time. I really think this is a good idea, much better than mine. I wonder whether the newest commit implements your idea.




[GitHub] [lucene] jpountz merged pull request #12339: feat: soft delete optimize

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz merged PR #12339:
URL: https://github.com/apache/lucene/pull/12339




[GitHub] [lucene] fudongyingluck commented on a diff in pull request #12339: feat: soft delete optimize

Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on code in PR #12339:
URL: https://github.com/apache/lucene/pull/12339#discussion_r1223996217


##########
lucene/core/src/java/org/apache/lucene/index/CachingMergeContext.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Set;
+import org.apache.lucene.util.InfoStream;
+
+/**
+ * a wrapper of IndexWriter MergeContext. Try to cache the {@link
+ * #numDeletesToMerge(SegmentCommitInfo)} result in merge phase, to avoid duplicate calculation
+ */
+public class CachingMergeContext implements MergePolicy.MergeContext {

Review Comment:
   Yes, I've done this ~ 





[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize

Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1583936303

   > Not computing the number of deletes twice makes sense to me. What I'm not super happy about is that it's a bit trappy for merge policies, they need to be very careful to call the right methods to not compute it twice. E.g. I believe that `LogMergePolicy` needs a similar fix to the one that you made to `TieredMergePolicy`.
   > 
   > As a potential alternative, I wonder if `IndexWriter` could use a wrapper around the `MergeContext` which would memoize the number of deletes of every `SegmentCommitInfo` in a hash map when calling the merge policy. This way, if you happen to call `numDeletesToMerge` twice on the same `SegmentCommitInfo`, the second one would be served from the cache?
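
The memoizing wrapper suggested in the quoted comment could look roughly like the following Java sketch. It is not the code from this PR and does not use Lucene's types: `DeleteCounter` is a hypothetical one-method stand-in for `MergePolicy.MergeContext`, segments are identified by plain strings instead of `SegmentCommitInfo`, and `CountingCounter` exists only to show how often the expensive path runs.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoizingContextSketch {
  /** Hypothetical stand-in for the single method of the merge context discussed here. */
  interface DeleteCounter {
    int numDeletesToMerge(String segmentName);
  }

  /** Wraps another DeleteCounter and remembers each segment's result in a hash map. */
  static final class CachingCounter implements DeleteCounter {
    private final DeleteCounter delegate;
    private final Map<String, Integer> cache = new HashMap<>();

    CachingCounter(DeleteCounter delegate) {
      this.delegate = delegate;
    }

    @Override
    public int numDeletesToMerge(String segmentName) {
      Integer cached = cache.get(segmentName);
      if (cached != null) {
        return cached; // second and later calls are served from the cache
      }
      int computed = delegate.numDeletesToMerge(segmentName);
      cache.put(segmentName, computed);
      return computed;
    }
  }

  /** Demonstration delegate that counts how many times it is asked to compute. */
  static final class CountingCounter implements DeleteCounter {
    int calls = 0;

    @Override
    public int numDeletesToMerge(String segmentName) {
      calls++;
      return segmentName.length(); // placeholder for the real soft-delete count
    }
  }

  public static void main(String[] args) {
    CountingCounter expensive = new CountingCounter();
    DeleteCounter cached = new CachingCounter(expensive);
    cached.numDeletesToMerge("_0");
    cached.numDeletesToMerge("_0"); // cache hit, delegate is not called
    cached.numDeletesToMerge("_1");
    System.out.println("expensive calls: " + expensive.calls); // prints "expensive calls: 2"
  }
}
```

With a wrapper of this shape, a merge policy does not need to track which values it has already asked for, which is the trap the quoted comment describes.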
   
   

