Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/03/30 21:45:39 UTC
[GitHub] [lucene] Yuti-G opened a new pull request #777: LUCENE-10488: Optimize Facets#getTopDims across Facets implementations
Yuti-G opened a new pull request #777:
URL: https://github.com/apache/lucene/pull/777
# Description
This change overrides getTopDims in ConcurrentSortedSetDocValuesFacetCounts so that dimCount is computed more efficiently, and so that a FacetResult is built and child paths are resolved for only the requested dims.
# Solution
* Override getTopDims and refactor the getPathResult function in SortedSetDocValuesFacetCounts to get dimCount (the aggregated count of a dim's values) more efficiently: first check whether dimCount was already populated at indexing time, which applies to a dim that is hierarchical, or multiValued && requireDimCount, and only otherwise aggregate dimCount by iterating over the dim's child ordinals.
* Use a priority queue to hold the requested top n dims, then call getPathResult to populate labels and return a FacetResult for those dims only.
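
The two steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in this PR and not Lucene's API: `TopDimsSketch`, `DimCount`, `precomputed` and `childCounts` are hypothetical names, the maps stand in for the per-ordinal counts, and the tie-break on dim name is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative only: a plain-Java stand-in, not Lucene's facet classes. */
final class TopDimsSketch {

  /** A dimension name paired with its aggregated count. */
  static final class DimCount {
    final String dim;
    final int count;

    DimCount(String dim, int count) {
      this.dim = dim;
      this.count = count;
    }
  }

  /**
   * Returns the count for a dim. Uses the stored value when one exists
   * (hypothetical stand-in for a count populated at indexing time);
   * otherwise falls back to summing the counts of the dim's children.
   */
  static int dimCount(String dim, Map<String, Integer> precomputed, Map<String, int[]> childCounts) {
    Integer stored = precomputed.get(dim);
    if (stored != null) {
      return stored;
    }
    int sum = 0;
    for (int c : childCounts.getOrDefault(dim, new int[0])) {
      sum += c;
    }
    return sum;
  }

  /** Keeps only the topN dims in a bounded min-heap, then returns them best first. */
  static List<DimCount> topDims(int topN, Map<String, Integer> precomputed, Map<String, int[]> childCounts) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be > 0");
    }
    // Weakest entry sits at the head: lowest count, ties broken so the later name is weaker (assumption).
    Comparator<DimCount> weakestFirst = (a, b) -> {
      int byCount = Integer.compare(a.count, b.count);
      return byCount != 0 ? byCount : b.dim.compareTo(a.dim);
    };
    PriorityQueue<DimCount> pq = new PriorityQueue<>(weakestFirst);
    for (String dim : childCounts.keySet()) {
      int count = dimCount(dim, precomputed, childCounts);
      if (count == 0) {
        continue; // skip dims without hits (assumption)
      }
      pq.offer(new DimCount(dim, count));
      if (pq.size() > topN) {
        pq.poll(); // evict the current weakest so the heap never exceeds topN
      }
    }
    // Labels and child paths would be resolved here, for the surviving dims only.
    List<DimCount> result = new ArrayList<>(pq.size());
    while (!pq.isEmpty()) {
      result.add(pq.poll());
    }
    Collections.reverse(result); // heap drains weakest first, so flip to best first
    return result;
  }

  public static void main(String[] args) {
    Map<String, int[]> childCounts = new HashMap<>();
    childCounts.put("a", new int[] {1, 2});
    childCounts.put("b", new int[] {3, 2});
    childCounts.put("c", new int[] {10});
    Map<String, Integer> precomputed = new HashMap<>();
    precomputed.put("b", 4); // stored count wins over the child sum
    for (DimCount d : topDims(2, precomputed, childCounts)) {
      System.out.println(d.dim + " -> " + d.count);
    }
  }
}
```

The bounded heap keeps at most n entries, so the expensive label and child-path resolution runs for n dims rather than for every dim in the index.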
# Tests
Added new tests for the overridden implementation of getTopDims in ConcurrentSortedSetDocValuesFacetCounts.
# Checklist
Please review the following and check all that apply:
- [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [X] I have created a Jira issue and added the issue ID to my pull request title.
- [X] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [X] I have developed this patch against the `main` branch.
- [X] I have run `./gradlew check`.
- [X] I have added tests for my changes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] Yuti-G edited a comment on pull request #777: LUCENE-10488: Optimize Facets#getTopDims across Facets implementations
Posted by GitBox <gi...@apache.org>.
Yuti-G edited a comment on pull request #777:
URL: https://github.com/apache/lucene/pull/777#issuecomment-1083703784
The changes in this PR are very similar to the changes to SortedSetDocValuesFacetCounts in [#747](https://github.com/apache/lucene/pull/747). Please see the benchmark results below, which show no significant regression (every p-value is above 0.05). Thanks!
| Task | QPS baseline | StdDev | QPS candidate | StdDev | Pct diff | p-value |
| --- | --- | --- | --- | --- | --- | --- |
| BrowseDateSSDVFacets | 1.90 | (6.9%) | 1.85 | (3.5%) | -3.1% ( -12% - 7%) | 0.076 |
| HighPhrase | 207.32 | (3.0%) | 204.24 | (2.0%) | -1.5% ( -6% - 3%) | 0.067 |
| LowSpanNear | 42.32 | (3.2%) | 41.72 | (3.0%) | -1.4% ( -7% - 5%) | 0.154 |
| TermDTSort | 94.21 | (14.5%) | 92.89 | (15.1%) | -1.4% ( -27% - 33%) | 0.764 |
| HighTermDayOfYearSort | 97.96 | (14.0%) | 96.83 | (19.3%) | -1.2% ( -30% - 37%) | 0.828 |
| OrHighMed | 89.21 | (5.3%) | 88.20 | (4.4%) | -1.1% ( -10% - 9%) | 0.463 |
| HighSpanNear | 4.43 | (4.3%) | 4.39 | (3.9%) | -1.1% ( -8% - 7%) | 0.408 |
| OrHighHigh | 21.72 | (5.1%) | 21.50 | (4.8%) | -1.0% ( -10% - 9%) | 0.514 |
| HighTermMonthSort | 103.04 | (16.1%) | 102.08 | (17.0%) | -0.9% ( -29% - 38%) | 0.859 |
| LowIntervalsOrdered | 8.73 | (3.9%) | 8.65 | (4.1%) | -0.9% ( -8% - 7%) | 0.491 |
| HighIntervalsOrdered | 7.41 | (5.0%) | 7.35 | (5.1%) | -0.8% ( -10% - 9%) | 0.601 |
| OrHighLow | 295.66 | (3.6%) | 293.18 | (4.7%) | -0.8% ( -8% - 7%) | 0.522 |
| MedSpanNear | 34.06 | (3.8%) | 33.83 | (3.7%) | -0.7% ( -7% - 7%) | 0.563 |
| HighTermTitleBDVSort | 83.74 | (18.6%) | 83.25 | (20.3%) | -0.6% ( -33% - 47%) | 0.925 |
| OrHighMedDayTaxoFacets | 11.74 | (3.3%) | 11.68 | (4.7%) | -0.5% ( -8% - 7%) | 0.688 |
| MedIntervalsOrdered | 45.79 | (4.4%) | 45.62 | (3.9%) | -0.4% ( -8% - 8%) | 0.769 |
| MedTerm | 1288.46 | (5.2%) | 1283.91 | (5.9%) | -0.4% ( -10% - 11%) | 0.841 |
| Fuzzy1 | 72.64 | (1.8%) | 72.39 | (2.6%) | -0.4% ( -4% - 4%) | 0.618 |
| BrowseDateTaxoFacets | 18.67 | (13.7%) | 18.63 | (11.3%) | -0.2% ( -22% - 28%) | 0.963 |
| Fuzzy2 | 27.57 | (1.6%) | 27.52 | (2.0%) | -0.2% ( -3% - 3%) | 0.774 |
| OrHighNotHigh | 637.05 | (5.1%) | 636.46 | (4.7%) | -0.1% ( -9% - 10%) | 0.952 |
| Respell | 57.70 | (2.0%) | 57.65 | (3.0%) | -0.1% ( -4% - 4%) | 0.918 |
| HighSloppyPhrase | 13.48 | (4.9%) | 13.48 | (3.8%) | -0.0% ( -8% - 9%) | 0.992 |
| Wildcard | 86.31 | (4.4%) | 86.32 | (4.0%) | 0.0% ( -7% - 8%) | 0.993 |
| AndHighMedDayTaxoFacets | 81.66 | (2.0%) | 81.68 | (2.2%) | 0.0% ( -4% - 4%) | 0.971 |
| OrHighNotLow | 793.57 | (5.6%) | 794.11 | (4.9%) | 0.1% ( -9% - 11%) | 0.968 |
| OrNotHighLow | 1158.99 | (4.0%) | 1160.89 | (4.1%) | 0.2% ( -7% - 8%) | 0.898 |
| BrowseDayOfYearTaxoFacets | 19.67 | (14.7%) | 19.71 | (12.1%) | 0.2% ( -23% - 31%) | 0.963 |
| IntNRQ | 77.22 | (0.9%) | 77.39 | (1.1%) | 0.2% ( -1% - 2%) | 0.494 |
| OrNotHighMed | 703.59 | (3.7%) | 705.40 | (3.4%) | 0.3% ( -6% - 7%) | 0.820 |
| LowTerm | 1194.55 | (5.2%) | 1197.63 | (5.7%) | 0.3% ( -10% - 11%) | 0.881 |
| AndHighHigh | 51.59 | (6.8%) | 51.74 | (6.8%) | 0.3% ( -12% - 15%) | 0.889 |
| OrHighNotMed | 673.79 | (5.7%) | 676.44 | (5.4%) | 0.4% ( -10% - 12%) | 0.824 |
| BrowseDayOfYearSSDVFacets | 9.94 | (17.0%) | 9.99 | (19.3%) | 0.4% ( -30% - 44%) | 0.943 |
| MedPhrase | 195.63 | (2.8%) | 196.44 | (1.8%) | 0.4% ( -4% - 5%) | 0.578 |
| AndHighHighDayTaxoFacets | 14.90 | (2.2%) | 14.97 | (2.1%) | 0.4% ( -3% - 4%) | 0.510 |
| MedSloppyPhrase | 18.99 | (3.8%) | 19.08 | (2.7%) | 0.4% ( -5% - 7%) | 0.666 |
| LowSloppyPhrase | 24.79 | (4.5%) | 24.90 | (4.1%) | 0.5% ( -7% - 9%) | 0.731 |
| AndHighMed | 69.42 | (4.5%) | 69.90 | (5.1%) | 0.7% ( -8% - 10%) | 0.650 |
| HighTerm | 944.79 | (6.3%) | 951.53 | (6.9%) | 0.7% ( -11% - 14%) | 0.733 |
| PKLookup | 136.25 | (3.4%) | 137.24 | (3.6%) | 0.7% ( -6% - 8%) | 0.519 |
| BrowseRandomLabelSSDVFacets | 6.04 | (4.6%) | 6.10 | (10.0%) | 0.9% ( -13% - 16%) | 0.721 |
| LowPhrase | 87.91 | (3.2%) | 88.78 | (2.3%) | 1.0% ( -4% - 6%) | 0.265 |
| AndHighLow | 641.61 | (3.1%) | 647.92 | (2.6%) | 1.0% ( -4% - 6%) | 0.275 |
| OrNotHighHigh | 814.46 | (4.5%) | 825.51 | (4.9%) | 1.4% ( -7% - 11%) | 0.361 |
| BrowseRandomLabelTaxoFacets | 13.37 | (10.2%) | 13.55 | (5.6%) | 1.4% ( -13% - 19%) | 0.591 |
| Prefix3 | 107.36 | (12.5%) | 109.53 | (9.3%) | 2.0% ( -17% - 27%) | 0.561 |
| MedTermDayTaxoFacets | 18.30 | (4.1%) | 18.69 | (5.8%) | 2.1% ( -7% - 12%) | 0.177 |
| BrowseMonthTaxoFacets | 19.47 | (17.4%) | 19.92 | (15.2%) | 2.3% ( -25% - 42%) | 0.657 |
| BrowseMonthSSDVFacets | 10.83 | (18.9%) | 11.08 | (22.5%) | 2.3% ( -32% - 54%) | 0.722 |