Posted to issues@lucene.apache.org by "fudongyingluck (via GitHub)" <gi...@apache.org> on 2023/05/30 02:37:59 UTC
[GitHub] [lucene] fudongyingluck opened a new pull request, #12339: feat: soft delete optimize
fudongyingluck opened a new pull request, #12339:
URL: https://github.com/apache/lucene/pull/12339
As mentioned in this Elasticsearch [issue](https://github.com/elastic/elasticsearch/issues/75675), when soft deletes are enabled, the `numDeletesToMerge` function is a very time-consuming part of merging. As the following picture shows, the same value is actually computed more than once.
![image](https://github.com/apache/lucene/assets/30896830/d92303d3-1ebc-4a5d-8ae0-143bfb3d4660)
This change reuses the `numDeletesToMerge` result to reduce the time spent.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize
Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1584035487
lucene benchmark result, `python3.10 src/python/localrun.py -source wikimediumall`
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
BrowseDayOfYearSSDVFacets 6.29 (7.5%) 6.16 (9.4%) -2.1% ( -17% - 15%) 0.428
HighTermTitleBDVSort 10.68 (8.0%) 10.51 (4.3%) -1.6% ( -12% - 11%) 0.442
IntNRQ 47.94 (1.8%) 47.23 (8.1%) -1.5% ( -11% - 8%) 0.422
OrHighMedDayTaxoFacets 6.09 (6.9%) 6.02 (5.2%) -1.1% ( -12% - 11%) 0.563
BrowseMonthSSDVFacets 6.56 (8.0%) 6.50 (8.3%) -0.9% ( -15% - 16%) 0.719
MedIntervalsOrdered 27.91 (4.9%) 27.67 (5.1%) -0.9% ( -10% - 9%) 0.579
HighSloppyPhrase 3.61 (4.3%) 3.58 (4.4%) -0.7% ( -9% - 8%) 0.596
BrowseDayOfYearTaxoFacets 7.06 (6.1%) 7.04 (5.2%) -0.3% ( -10% - 11%) 0.850
OrHighLow 351.52 (2.9%) 350.34 (3.4%) -0.3% ( -6% - 6%) 0.737
OrHighHigh 20.17 (3.4%) 20.11 (3.7%) -0.3% ( -7% - 7%) 0.776
MedSloppyPhrase 4.99 (4.0%) 4.97 (4.7%) -0.3% ( -8% - 8%) 0.822
BrowseRandomLabelTaxoFacets 6.20 (7.1%) 6.19 (5.3%) -0.2% ( -11% - 13%) 0.914
OrHighMed 106.18 (3.5%) 105.99 (3.4%) -0.2% ( -6% - 6%) 0.866
BrowseDateTaxoFacets 7.00 (5.9%) 6.99 (4.8%) -0.1% ( -10% - 11%) 0.947
Fuzzy1 88.03 (3.3%) 87.96 (1.9%) -0.1% ( -5% - 5%) 0.925
HighIntervalsOrdered 12.93 (4.5%) 12.92 (4.4%) -0.0% ( -8% - 9%) 0.979
Prefix3 224.61 (2.8%) 224.69 (2.3%) 0.0% ( -4% - 5%) 0.966
LowSloppyPhrase 22.07 (4.2%) 22.10 (4.1%) 0.1% ( -7% - 8%) 0.928
OrNotHighMed 403.85 (2.4%) 404.49 (2.9%) 0.2% ( -4% - 5%) 0.851
OrHighNotLow 468.49 (5.1%) 469.62 (4.9%) 0.2% ( -9% - 10%) 0.879
HighTermMonthSort 3512.17 (6.9%) 3523.19 (7.5%) 0.3% ( -13% - 15%) 0.890
OrHighNotMed 532.57 (4.3%) 534.39 (3.6%) 0.3% ( -7% - 8%) 0.786
MedTerm 1019.27 (4.5%) 1022.80 (4.3%) 0.3% ( -8% - 9%) 0.805
AndHighHighDayTaxoFacets 7.35 (2.8%) 7.38 (1.7%) 0.4% ( -4% - 5%) 0.633
AndHighHigh 32.60 (3.8%) 32.72 (4.2%) 0.4% ( -7% - 8%) 0.776
AndHighLow 662.12 (3.5%) 664.62 (3.9%) 0.4% ( -6% - 7%) 0.745
Fuzzy2 91.31 (4.0%) 91.66 (2.3%) 0.4% ( -5% - 6%) 0.709
OrNotHighHigh 675.72 (3.3%) 679.20 (3.6%) 0.5% ( -6% - 7%) 0.636
AndHighMed 96.86 (5.9%) 97.41 (6.3%) 0.6% ( -11% - 13%) 0.771
PKLookup 281.91 (3.6%) 283.54 (2.6%) 0.6% ( -5% - 7%) 0.566
Wildcard 183.21 (4.9%) 184.35 (2.8%) 0.6% ( -6% - 8%) 0.619
OrHighNotHigh 500.76 (3.4%) 504.24 (3.4%) 0.7% ( -5% - 7%) 0.513
LowPhrase 183.76 (2.7%) 185.24 (2.5%) 0.8% ( -4% - 6%) 0.326
LowTerm 732.82 (3.0%) 738.99 (3.0%) 0.8% ( -4% - 6%) 0.368
MedTermDayTaxoFacets 38.20 (2.9%) 38.53 (1.9%) 0.9% ( -3% - 5%) 0.273
MedPhrase 85.79 (2.5%) 86.54 (2.3%) 0.9% ( -3% - 5%) 0.250
HighTerm 678.62 (4.8%) 684.64 (4.4%) 0.9% ( -7% - 10%) 0.544
AndHighMedDayTaxoFacets 34.42 (2.5%) 34.73 (1.5%) 0.9% ( -2% - 4%) 0.164
LowIntervalsOrdered 16.93 (3.6%) 17.09 (3.2%) 1.0% ( -5% - 7%) 0.373
MedSpanNear 25.65 (3.4%) 25.89 (4.3%) 1.0% ( -6% - 9%) 0.440
HighSpanNear 9.16 (3.9%) 9.25 (4.7%) 1.0% ( -7% - 9%) 0.473
HighPhrase 136.81 (2.8%) 138.28 (2.8%) 1.1% ( -4% - 6%) 0.231
Respell 67.25 (4.6%) 68.00 (3.4%) 1.1% ( -6% - 9%) 0.377
BrowseRandomLabelSSDVFacets 5.26 (7.4%) 5.32 (7.2%) 1.1% ( -12% - 16%) 0.627
LowSpanNear 7.90 (3.7%) 7.99 (3.9%) 1.1% ( -6% - 9%) 0.347
HighTermDayOfYearSort 400.43 (2.6%) 405.41 (2.7%) 1.2% ( -3% - 6%) 0.137
OrNotHighLow 818.63 (3.1%) 828.86 (3.0%) 1.2% ( -4% - 7%) 0.199
HighTermTitleSort 62.96 (2.5%) 63.77 (3.1%) 1.3% ( -4% - 7%) 0.149
BrowseMonthTaxoFacets 10.34 (33.0%) 10.47 (33.7%) 1.3% ( -49% - 101%) 0.902
TermDTSort 239.19 (4.0%) 242.85 (6.9%) 1.5% ( -8% - 12%) 0.390
BrowseDateSSDVFacets 1.49 (12.4%) 1.54 (11.4%) 3.4% ( -18% - 31%) 0.362
```
[GitHub] [lucene] jpountz commented on pull request #12339: feat: soft delete optimize
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1584102566
I'll note that there is still room for improvement, as this change doesn't cache the number of soft deletes across calls to `findMerges`. But the fix is so simple and contained, this looks to me like a good case of progress over perfection.
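To illustrate the scope limitation noted above, here is a minimal Java sketch, assuming the cache is created fresh for each merge-selection pass. `PerCallCacheDemo`, `expensiveCount`, and `findMergesOnce` are hypothetical names for illustration only; they are not Lucene APIs and this is not the code from the PR. Segments are represented by plain strings to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PerCallCacheDemo {
  // Counts how many times the (simulated) costly computation actually runs.
  static int computations = 0;

  // Hypothetical placeholder for the costly soft-delete count; not Lucene code.
  static int expensiveCount(String segmentName) {
    computations++;
    return segmentName.length();
  }

  // Hypothetical merge-selection pass. The cache lives only for this call, so it
  // removes duplicate lookups within one pass but is discarded afterwards.
  static int findMergesOnce(List<String> segments) {
    Map<String, Integer> cache = new HashMap<>();
    int total = 0;
    for (int pass = 0; pass < 2; pass++) { // assume each segment is inspected twice
      for (String segment : segments) {
        total += cache.computeIfAbsent(segment, PerCallCacheDemo::expensiveCount);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<String> segments = List.of("_0", "_1", "_2");
    findMergesOnce(segments);
    findMergesOnce(segments);
    // Each call recomputes every segment once, because nothing is shared between calls.
    System.out.println("computations = " + computations);
  }
}
```

Under these assumptions, duplicate work disappears within a single pass, while every new pass still pays for each segment once. Sharing the map across passes would remove that cost too, but would then need an invalidation rule for segments whose deletes change in between.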
[GitHub] [lucene] fudongyingluck closed pull request #12339: feat: soft delete optimize
Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck closed pull request #12339: feat: soft delete optimize
URL: https://github.com/apache/lucene/pull/12339
[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize
Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1576330887
This is the esrally result. The command was along the lines of `esrally race --track=http_logs --target-hosts=*:9201 --pipeline=benchmark-only --offline --user-tag=softdelete:baseline --challenge=update`
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|--------------------------------------------------------------:|-------:|----------------:|----------------:|------------:|-------:|---------:|
| Cumulative indexing time of primary shards | | 515.49 | 504.15 | -11.3398 | min | -2.20% |
| Min cumulative indexing time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative indexing time across primary shard | | 17.7529 | 17.9699 | 0.2169 | min | +1.22% |
| Max cumulative indexing time across primary shard | | 404.723 | 393.369 | -11.3536 | min | -2.81% |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min | 0.00% |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Cumulative merge time of primary shards | | 133.81 | 127.489 | -6.32017 | min | -4.72% |
| Cumulative merge count of primary shards | | 173 | 172 | -1 | | -0.58% |
| Min cumulative merge time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative merge time across primary shard | | 2.61536 | 2.96084 | 0.34548 | min | +13.21% |
| Max cumulative merge time across primary shard | | 118.648 | 110.923 | -7.7245 | min | -6.51% |
| Cumulative merge throttle time of primary shards | | 57.0305 | 55.1042 | -1.92633 | min | -3.38% |
| Min cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative merge throttle time across primary shard | | 0.215533 | 0.307242 | 0.09171 | min | +42.55% |
| Max cumulative merge throttle time across primary shard | | 55.2842 | 53.1749 | -2.10932 | min | -3.82% |
| Cumulative refresh time of primary shards | | 21.5803 | 20.5713 | -1.009 | min | -4.68% |
| Cumulative refresh count of primary shards | | 668 | 674 | 6 | | +0.90% |
| Min cumulative refresh time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative refresh time across primary shard | | 0.542333 | 0.508642 | -0.03369 | min | -6.21% |
| Max cumulative refresh time across primary shard | | 18.1363 | 17.4352 | -0.70113 | min | -3.87% |
| Cumulative flush time of primary shards | | 9.37332 | 10.4646 | 1.09132 | min | +11.64% |
| Cumulative flush count of primary shards | | 63 | 64 | 1 | | +1.59% |
| Min cumulative flush time across primary shard | | 0.00296667 | 0.0001 | -0.00287 | min | -96.63% |
| Median cumulative flush time across primary shard | | 0.0971583 | 0.0769667 | -0.02019 | min | -20.78% |
| Max cumulative flush time across primary shard | | 8.6855 | 9.83638 | 1.15088 | min | +13.25% |
| Total Young Gen GC time | | 1070.97 | 1065.08 | -5.889 | s | -0.55% |
| Total Young Gen GC count | | 8254 | 8187 | -67 | | -0.81% |
| Total Old Gen GC time | | 0.586 | 0 | -0.586 | s | -100.00% |
| Total Old Gen GC count | | 3 | 0 | -3 | | -100.00% |
| Store size | | 17.0535 | 16.9082 | -0.14531 | GB | -0.85% |
| Translog size | | 4.09782e-07 | 4.09782e-07 | 0 | GB | 0.00% |
| Heap used for segments | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for doc values | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for terms | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for norms | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for points | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for stored fields | | 0 | 0 | 0 | MB | 0.00% |
| Segment count | | 158 | 163 | 5 | | +3.16% |
| Total Ingest Pipeline count | | 0 | 0 | 0 | | 0.00% |
| Total Ingest Pipeline time | | 0 | 0 | 0 | ms | 0.00% |
| Total Ingest Pipeline failed | | 0 | 0 | 0 | | 0.00% |
| Min Throughput | update | 23056.7 | 23029.1 | -27.5735 | docs/s | -0.12% |
| Mean Throughput | update | 29585.3 | 29794 | 208.699 | docs/s | +0.71% |
| Median Throughput | update | 28990.2 | 29011.7 | 21.4849 | docs/s | +0.07% |
| Max Throughput | update | 36131.5 | 36197.3 | 65.8749 | docs/s | +0.18% |
| 50th percentile latency | update | 1421.89 | 1437.74 | 15.8507 | ms | +1.11% |
| 90th percentile latency | update | 2410.13 | 2420.23 | 10.1008 | ms | +0.42% |
| 99th percentile latency | update | 7076.3 | 7045.81 | -30.4936 | ms | -0.43% |
| 99.9th percentile latency | update | 11033.5 | 10406.9 | -626.525 | ms | -5.68% |
| 99.99th percentile latency | update | 14342.9 | 13304.1 | -1038.85 | ms | -7.24% |
| 100th percentile latency | update | 21652.9 | 21399.9 | -253 | ms | -1.17% |
| 50th percentile service time | update | 1421.89 | 1437.74 | 15.8507 | ms | +1.11% |
| 90th percentile service time | update | 2410.13 | 2420.23 | 10.1008 | ms | +0.42% |
| 99th percentile service time | update | 7076.3 | 7045.81 | -30.4936 | ms | -0.43% |
| 99.9th percentile service time | update | 11033.5 | 10406.9 | -626.525 | ms | -5.68% |
| 99.99th percentile service time | update | 14342.9 | 13304.1 | -1038.85 | ms | -7.24% |
| 100th percentile service time | update | 21652.9 | 21399.9 | -253 | ms | -1.17% |
| error rate | update | 0 | 0 | 0 | % | 0.00% |
[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize
Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1577903918
lucene benchmark result, `python3.10 src/python/localrun.py -source wikimediumall`
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
BrowseDateSSDVFacets 1.54 (11.4%) 1.46 (16.1%) -5.2% ( -29% - 25%) 0.242
OrHighMedDayTaxoFacets 5.38 (5.6%) 5.24 (5.0%) -2.6% ( -12% - 8%) 0.127
PKLookup 279.48 (3.0%) 273.06 (3.1%) -2.3% ( -8% - 3%) 0.018
MedTermDayTaxoFacets 35.78 (2.2%) 35.10 (1.8%) -1.9% ( -5% - 2%) 0.002
BrowseDateTaxoFacets 7.23 (22.3%) 7.10 (23.8%) -1.8% ( -39% - 56%) 0.802
HighIntervalsOrdered 10.59 (8.9%) 10.42 (8.6%) -1.6% ( -17% - 17%) 0.568
BrowseDayOfYearTaxoFacets 7.30 (21.8%) 7.19 (23.9%) -1.6% ( -38% - 56%) 0.829
LowIntervalsOrdered 4.55 (7.1%) 4.48 (7.1%) -1.5% ( -14% - 13%) 0.495
MedIntervalsOrdered 6.90 (8.1%) 6.81 (7.3%) -1.4% ( -15% - 15%) 0.565
Fuzzy2 118.84 (2.2%) 117.28 (2.5%) -1.3% ( -5% - 3%) 0.078
Respell 82.74 (3.1%) 81.79 (4.0%) -1.2% ( -7% - 6%) 0.308
HighTermMonthSort 3093.29 (5.8%) 3057.85 (6.7%) -1.1% ( -12% - 12%) 0.562
BrowseRandomLabelTaxoFacets 6.40 (38.8%) 6.33 (40.9%) -1.1% ( -58% - 128%) 0.930
HighTerm 791.45 (5.1%) 783.46 (4.7%) -1.0% ( -10% - 9%) 0.517
HighPhrase 30.44 (2.3%) 30.16 (2.2%) -0.9% ( -5% - 3%) 0.190
Fuzzy1 108.68 (2.7%) 107.67 (3.6%) -0.9% ( -7% - 5%) 0.359
OrHighNotMed 320.94 (6.6%) 318.02 (5.3%) -0.9% ( -11% - 11%) 0.629
OrNotHighHigh 468.36 (5.3%) 464.33 (4.2%) -0.9% ( -9% - 9%) 0.568
LowSloppyPhrase 34.97 (4.1%) 34.69 (4.2%) -0.8% ( -8% - 7%) 0.534
MedPhrase 242.27 (2.5%) 240.32 (1.9%) -0.8% ( -5% - 3%) 0.248
AndHighMed 77.34 (6.0%) 76.76 (5.7%) -0.8% ( -11% - 11%) 0.686
OrHighNotLow 744.00 (6.5%) 738.66 (5.8%) -0.7% ( -12% - 12%) 0.711
AndHighLow 586.58 (3.5%) 582.51 (4.2%) -0.7% ( -8% - 7%) 0.573
HighSloppyPhrase 3.91 (4.5%) 3.89 (3.9%) -0.6% ( -8% - 8%) 0.670
MedSpanNear 37.46 (2.1%) 37.26 (2.5%) -0.6% ( -5% - 4%) 0.441
LowPhrase 153.02 (2.2%) 152.17 (2.1%) -0.6% ( -4% - 3%) 0.417
OrNotHighLow 1030.00 (3.2%) 1025.40 (3.5%) -0.4% ( -6% - 6%) 0.675
Wildcard 35.75 (3.2%) 35.59 (4.5%) -0.4% ( -7% - 7%) 0.723
MedTerm 761.12 (5.8%) 757.86 (6.0%) -0.4% ( -11% - 12%) 0.819
AndHighHigh 22.42 (6.5%) 22.33 (5.7%) -0.4% ( -11% - 12%) 0.830
LowTerm 689.41 (3.9%) 686.65 (4.6%) -0.4% ( -8% - 8%) 0.768
HighSpanNear 2.47 (4.2%) 2.46 (5.0%) -0.4% ( -9% - 9%) 0.789
AndHighHighDayTaxoFacets 7.97 (1.6%) 7.94 (1.9%) -0.4% ( -3% - 3%) 0.522
OrHighNotHigh 352.84 (6.6%) 351.68 (4.9%) -0.3% ( -11% - 11%) 0.859
AndHighMedDayTaxoFacets 48.80 (1.6%) 48.65 (2.3%) -0.3% ( -4% - 3%) 0.611
MedSloppyPhrase 24.12 (2.4%) 24.04 (2.5%) -0.3% ( -5% - 4%) 0.684
OrHighMed 37.82 (6.3%) 37.72 (5.5%) -0.3% ( -11% - 12%) 0.891
HighTermTitleBDVSort 7.13 (8.7%) 7.11 (8.1%) -0.2% ( -15% - 18%) 0.927
LowSpanNear 26.13 (3.7%) 26.08 (3.3%) -0.2% ( -6% - 7%) 0.866
Prefix3 408.84 (1.3%) 408.62 (2.1%) -0.1% ( -3% - 3%) 0.923
OrNotHighMed 469.82 (4.2%) 470.09 (3.6%) 0.1% ( -7% - 8%) 0.963
HighTermTitleSort 105.40 (2.8%) 105.54 (4.6%) 0.1% ( -7% - 7%) 0.914
OrHighHigh 13.48 (5.0%) 13.51 (4.7%) 0.2% ( -9% - 10%) 0.905
TermDTSort 241.07 (3.6%) 242.46 (4.9%) 0.6% ( -7% - 9%) 0.671
OrHighLow 235.33 (5.1%) 237.04 (5.2%) 0.7% ( -9% - 11%) 0.655
BrowseRandomLabelSSDVFacets 4.96 (3.8%) 5.00 (11.6%) 0.9% ( -13% - 16%) 0.746
HighTermDayOfYearSort 290.03 (3.5%) 292.75 (3.7%) 0.9% ( -6% - 8%) 0.408
IntNRQ 52.81 (18.2%) 54.52 (15.7%) 3.2% ( -25% - 45%) 0.546
BrowseDayOfYearSSDVFacets 6.11 (4.2%) 6.32 (10.6%) 3.4% ( -10% - 19%) 0.186
BrowseMonthTaxoFacets 9.69 (33.2%) 10.17 (34.1%) 5.0% ( -46% - 108%) 0.641
BrowseMonthSSDVFacets 6.35 (2.9%) 6.68 (10.3%) 5.2% ( -7% - 18%) 0.030
```
And here is part of the CPU profile result:
```
CPU merged search profile for my_modified_version:
PERCENT CPU SAMPLES STACK
4.71% 49024 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
3.88% 40306 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
3.50% 36379 java.nio.Buffer#scope()
3.35% 34860 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
3.18% 33041 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
CPU merged search profile for baseline:
PERCENT CPU SAMPLES STACK
6.19% 63449 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
3.63% 37149 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
3.58% 36660 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
3.46% 35483 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
3.19% 32707 org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
```
[GitHub] [lucene] jpountz commented on a diff in pull request #12339: feat: soft delete optimize
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12339:
URL: https://github.com/apache/lucene/pull/12339#discussion_r1223923140
##########
lucene/core/src/java/org/apache/lucene/index/CachingMergeContext.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Set;
+import org.apache.lucene.util.InfoStream;
+
+/**
+ * a wrapper of IndexWriter MergeContext. Try to cache the {@link
+ * #numDeletesToMerge(SegmentCommitInfo)} result in merge phase, to avoid duplicate calculation
+ */
+public class CachingMergeContext implements MergePolicy.MergeContext {
Review Comment:
Can you make it pkg-private instead of public?
[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize
Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1583939101
Thanks @jpountz for your time. I really think this is a good idea, much better than mine. I wonder whether the newest commit implements your idea.
[GitHub] [lucene] jpountz merged pull request #12339: feat: soft delete optimize
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz merged PR #12339:
URL: https://github.com/apache/lucene/pull/12339
[GitHub] [lucene] fudongyingluck commented on a diff in pull request #12339: feat: soft delete optimize
Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on code in PR #12339:
URL: https://github.com/apache/lucene/pull/12339#discussion_r1223996217
##########
lucene/core/src/java/org/apache/lucene/index/CachingMergeContext.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Set;
+import org.apache.lucene.util.InfoStream;
+
+/**
+ * a wrapper of IndexWriter MergeContext. Try to cache the {@link
+ * #numDeletesToMerge(SegmentCommitInfo)} result in merge phase, to avoid duplicate calculation
+ */
+public class CachingMergeContext implements MergePolicy.MergeContext {
Review Comment:
Yes, I've done this ~
[GitHub] [lucene] fudongyingluck commented on pull request #12339: feat: soft delete optimize
Posted by "fudongyingluck (via GitHub)" <gi...@apache.org>.
fudongyingluck commented on PR #12339:
URL: https://github.com/apache/lucene/pull/12339#issuecomment-1583936303
> Not computing the number of deletes twice makes sense to me. What I'm not super happy about is that it's a bit trappy for merge policies: they need to be very careful to call the right methods so as not to compute it twice. E.g. I believe that `LogMergePolicy` needs a fix similar to the one that you made to `TieredMergePolicy`.
>
> As a potential alternative, I wonder if `IndexWriter` could use a wrapper around the `MergeContext` which would memoize the number of deletes of every `SegmentCommitInfo` in a hash map when calling the merge policy. This way, if you happen to call `numDeletesToMerge` twice on the same `SegmentCommitInfo`, the second one would be served from the cache?
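The memoizing wrapper suggested in the quote could look roughly like the following Java sketch. It is only an illustration under simplifying assumptions: `Segment`, `DeleteCounter`, `SlowDeleteCounter`, and `MemoizingDeleteCounter` are hypothetical stand-ins, not Lucene's `SegmentCommitInfo` or `MergePolicy.MergeContext`, and the real `numDeletesToMerge` also declares `IOException`, which is omitted here for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a segment; NOT Lucene's SegmentCommitInfo.
// It relies on identity equals/hashCode, so each instance is its own cache key.
final class Segment {
  final String name;

  Segment(String name) {
    this.name = name;
  }
}

// Hypothetical, simplified stand-in for the merge context interface.
interface DeleteCounter {
  int numDeletesToMerge(Segment segment);
}

// Simulates an expensive delete count and records how often it really runs.
final class SlowDeleteCounter implements DeleteCounter {
  int computations = 0;

  @Override
  public int numDeletesToMerge(Segment segment) {
    computations++;
    return segment.name.length(); // placeholder for the real, costly computation
  }
}

// Memoizing wrapper: the first call per segment delegates, later calls hit the map.
final class MemoizingDeleteCounter implements DeleteCounter {
  private final DeleteCounter delegate;
  private final Map<Segment, Integer> cache = new HashMap<>();

  MemoizingDeleteCounter(DeleteCounter delegate) {
    this.delegate = delegate;
  }

  @Override
  public int numDeletesToMerge(Segment segment) {
    Integer cached = cache.get(segment);
    if (cached != null) {
      return cached;
    }
    int computed = delegate.numDeletesToMerge(segment);
    cache.put(segment, computed);
    return computed;
  }
}

final class MemoizeDemo {
  public static void main(String[] args) {
    SlowDeleteCounter slow = new SlowDeleteCounter();
    DeleteCounter memo = new MemoizingDeleteCounter(slow);
    Segment segment = new Segment("_0");
    memo.numDeletesToMerge(segment);
    memo.numDeletesToMerge(segment); // served from the cache
    System.out.println("computations = " + slow.computations);
  }
}
```

The appeal of this design is that merge policies stay unaware of the cache: they keep calling `numDeletesToMerge` as before, and the caller decides how long the wrapper (and therefore the cached values) lives.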