Posted to dev@lucene.apache.org by "Nikolay Khitrin (JIRA)" <ji...@apache.org> on 2018/07/27 11:13:00 UTC
[jira] [Updated] (LUCENE-8432) Stop calling comparator even if
early termination is not possible
[ https://issues.apache.org/jira/browse/LUCENE-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nikolay Khitrin updated LUCENE-8432:
------------------------------------
Description:
TopFieldCollector keeps calling comparator.compareBottom when trackMaxScore or trackTotalHits is set, even when the result is already known in advance from the document order.
A comparator call is not cheap because it can involve a doc-values (DV) read from disk, and all such calls can be avoided once the first non-competitive document in a segment is reached.
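To make the idea concrete, here is a minimal, self-contained sketch of the technique described above. It is not the attached patch and does not use Lucene classes: SkipComparatorSketch, Stats, collect and skipAfterFirstMiss are hypothetical names chosen for illustration. It assumes each segment's values are already sorted ascending in the same order as the search sort, and it counts a "comparator call" each time a value is compared against the bottom of a full queue.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Lucene.
final class SkipComparatorSketch {

    static final class Stats {
        int totalHits;        // every hit is still counted (the trackTotalHits case)
        int comparatorCalls;  // comparisons against the bottom of a full queue
        long[] top;           // the topN smallest values, ascending
    }

    // Each inner array is one segment, assumed sorted ascending by the sort key.
    static Stats collect(long[][] segments, int topN, boolean skipAfterFirstMiss) {
        // Max-heap: the head is the "bottom" (worst) entry currently kept.
        PriorityQueue<Long> queue = new PriorityQueue<>(Collections.reverseOrder());
        Stats stats = new Stats();
        for (long[] segment : segments) {
            boolean restNonCompetitive = false; // reset for every segment
            for (long value : segment) {
                stats.totalHits++;
                if (queue.size() < topN) {      // queue not full yet: always keep
                    queue.add(value);
                    continue;
                }
                if (restNonCompetitive) {
                    continue;                   // result known in advance, no comparison
                }
                stats.comparatorCalls++;
                if (value >= queue.peek()) {
                    // Non-competitive. Since the segment is sorted, every later
                    // document in this segment is non-competitive as well.
                    restNonCompetitive = skipAfterFirstMiss;
                } else {
                    queue.poll();
                    queue.add(value);
                }
            }
        }
        stats.top = queue.stream().mapToLong(Long::longValue).sorted().toArray();
        return stats;
    }

    public static void main(String[] args) {
        long[][] segments = {{1, 2, 3, 4, 5, 6}, {0, 7, 8, 9}};
        Stats baseline = collect(segments, 3, false);
        Stats patched = collect(segments, 3, true);
        // prints "baseline calls=7, patched calls=3, totalHits=10"
        System.out.println("baseline calls=" + baseline.comparatorCalls
                + ", patched calls=" + patched.comparatorCalls
                + ", totalHits=" + patched.totalHits);
    }
}
```

Both runs return the same top hits and the same total hit count; only the number of comparisons differs. The flag is reset per segment because a later segment may still contain competitive documents.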
There is a patch and luceneutil report on wikimedium10m sorted by DayOfYear:
{noformat}
Task  QPS baseline  StdDev  QPS patch  StdDev  Pct diff
HighTermMonthSort 226.04 (6.3%) 215.33 (4.3%) -4.7% ( -14% - 6%)
LowTerm 933.27 (5.5%) 924.62 (4.2%) -0.9% ( -10% - 9%)
OrNotHighLow 945.68 (5.7%) 939.12 (4.5%) -0.7% ( -10% - 10%)
MedSpanNear 28.76 (1.4%) 28.61 (1.5%) -0.5% ( -3% - 2%)
BrowseDayOfYearSSDVFacets 16.36 (5.0%) 16.29 (4.5%) -0.4% ( -9% - 9%)
AndHighMed 112.30 (2.9%) 111.96 (1.6%) -0.3% ( -4% - 4%)
LowSpanNear 12.42 (1.5%) 12.38 (1.6%) -0.3% ( -3% - 2%)
HighSloppyPhrase 18.66 (3.9%) 18.62 (4.0%) -0.2% ( -7% - 7%)
MedPhrase 219.40 (2.7%) 219.06 (2.7%) -0.2% ( -5% - 5%)
OrNotHighMed 222.88 (3.2%) 222.63 (3.4%) -0.1% ( -6% - 6%)
AndHighLow 521.59 (3.5%) 521.02 (4.5%) -0.1% ( -7% - 8%)
MedSloppyPhrase 16.71 (4.7%) 16.70 (4.7%) -0.0% ( -8% - 9%)
LowPhrase 15.58 (2.5%) 15.59 (2.9%) 0.0% ( -5% - 5%)
Respell 92.05 (2.4%) 92.19 (3.0%) 0.2% ( -5% - 5%)
HighSpanNear 17.03 (2.2%) 17.06 (2.1%) 0.2% ( -4% - 4%)
HighPhrase 37.85 (5.8%) 37.92 (5.9%) 0.2% ( -10% - 12%)
OrHighNotLow 118.25 (2.9%) 118.47 (3.5%) 0.2% ( -6% - 6%)
BrowseMonthTaxoFacets 2.94 (0.4%) 2.94 (0.8%) 0.2% ( 0% - 1%)
BrowseDateTaxoFacets 2.75 (0.3%) 2.75 (1.6%) 0.3% ( -1% - 2%)
LowSloppyPhrase 105.28 (2.3%) 105.60 (2.5%) 0.3% ( -4% - 5%)
Prefix3 122.07 (6.8%) 122.55 (6.5%) 0.4% ( -12% - 14%)
OrNotHighHigh 55.07 (3.8%) 55.29 (4.5%) 0.4% ( -7% - 8%)
BrowseMonthSSDVFacets 20.88 (7.2%) 20.99 (7.5%) 0.5% ( -13% - 16%)
OrHighNotHigh 58.40 (4.2%) 58.72 (4.8%) 0.6% ( -8% - 9%)
Wildcard 79.87 (3.7%) 80.31 (4.0%) 0.6% ( -6% - 8%)
OrHighMed 13.25 (4.3%) 13.34 (4.9%) 0.6% ( -8% - 10%)
BrowseDayOfYearTaxoFacets 2.73 (0.6%) 2.75 (1.6%) 0.7% ( -1% - 2%)
OrHighHigh 22.03 (4.1%) 22.19 (4.9%) 0.7% ( -8% - 10%)
AndHighHigh 23.46 (2.1%) 23.63 (1.9%) 0.7% ( -3% - 4%)
PKLookup 145.59 (4.2%) 146.66 (4.3%) 0.7% ( -7% - 9%)
MedTerm 171.13 (5.0%) 172.43 (5.1%) 0.8% ( -8% - 11%)
OrHighLow 119.22 (2.8%) 120.23 (3.1%) 0.8% ( -4% - 6%)
OrHighNotMed 87.06 (3.7%) 87.80 (4.1%) 0.8% ( -6% - 8%)
IntNRQ 26.44 (12.8%) 26.68 (11.5%) 0.9% ( -20% - 28%)
HighTerm 107.64 (6.1%) 108.88 (5.6%) 1.2% ( -9% - 13%)
Fuzzy2 69.69 (10.7%) 71.64 (7.4%) 2.8% ( -13% - 23%)
Fuzzy1 53.95 (6.5%) 55.79 (6.2%) 3.4% ( -8% - 17%)
HighTermDayOfYearSort 19.71 (4.7%) 21.51 (7.1%) 9.1% ( -2% - 21%){noformat}
Unfortunately, luceneutil shows a regression when the search sort does not match the index sort (HighTermMonthSort). I can't reproduce the regression in any real case, but I'm afraid my benchmarks aren't quite accurate.
> Stop calling comparator even if early termination is not possible
> -----------------------------------------------------------------
>
> Key: LUCENE-8432
> URL: https://issues.apache.org/jira/browse/LUCENE-8432
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 7.3
> Reporter: Nikolay Khitrin
> Priority: Minor
> Attachments: LUCENE-8432.patch
>
>