Posted to dev@lucene.apache.org by "Jim Ferenczi (JIRA)" <ji...@apache.org> on 2018/07/27 13:02:00 UTC
[jira] [Commented] (LUCENE-8432) Stop calling comparator even if
early termination is not possible
[ https://issues.apache.org/jira/browse/LUCENE-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559721#comment-16559721 ]
Jim Ferenczi commented on LUCENE-8432:
--------------------------------------
Thanks [~khitrin], the change makes sense to me. The other way to achieve this optimization is to use a MultiCollector that wraps a TotalHitCountCollector and a TopFieldCollector, but I prefer the solution that you propose. It's much simpler if this can be done automatically by the TopFieldCollector. Any objections, [~jpountz]?
> Stop calling comparator even if early termination is not possible
> -----------------------------------------------------------------
>
> Key: LUCENE-8432
> URL: https://issues.apache.org/jira/browse/LUCENE-8432
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 7.3
> Reporter: Nikolay Khitrin
> Priority: Minor
> Attachments: LUCENE-8432.patch
>
>
> TopFieldCollector continues calling comparator.compareBottom when trackMaxScore or trackTotalHits is set, even if the result is known in advance due to document order.
> The comparator call is not cheap because it can involve a doc values (DV) read from disk, and all such calls can be avoided once the first non-competitive document in a segment is reached.
> There is a patch and luceneutil report on wikimedium10m sorted by DayOfYear:
> {noformat}
> Task QPS baseline StdDev QPS patch StdDev Pct diff
> HighTermMonthSort 226.04 (6.3%) 215.33 (4.3%) -4.7% ( -14% - 6%)
> LowTerm 933.27 (5.5%) 924.62 (4.2%) -0.9% ( -10% - 9%)
> OrNotHighLow 945.68 (5.7%) 939.12 (4.5%) -0.7% ( -10% - 10%)
> MedSpanNear 28.76 (1.4%) 28.61 (1.5%) -0.5% ( -3% - 2%)
> BrowseDayOfYearSSDVFacets 16.36 (5.0%) 16.29 (4.5%) -0.4% ( -9% - 9%)
> AndHighMed 112.30 (2.9%) 111.96 (1.6%) -0.3% ( -4% - 4%)
> LowSpanNear 12.42 (1.5%) 12.38 (1.6%) -0.3% ( -3% - 2%)
> HighSloppyPhrase 18.66 (3.9%) 18.62 (4.0%) -0.2% ( -7% - 7%)
> MedPhrase 219.40 (2.7%) 219.06 (2.7%) -0.2% ( -5% - 5%)
> OrNotHighMed 222.88 (3.2%) 222.63 (3.4%) -0.1% ( -6% - 6%)
> AndHighLow 521.59 (3.5%) 521.02 (4.5%) -0.1% ( -7% - 8%)
> MedSloppyPhrase 16.71 (4.7%) 16.70 (4.7%) -0.0% ( -8% - 9%)
> LowPhrase 15.58 (2.5%) 15.59 (2.9%) 0.0% ( -5% - 5%)
> Respell 92.05 (2.4%) 92.19 (3.0%) 0.2% ( -5% - 5%)
> HighSpanNear 17.03 (2.2%) 17.06 (2.1%) 0.2% ( -4% - 4%)
> HighPhrase 37.85 (5.8%) 37.92 (5.9%) 0.2% ( -10% - 12%)
> OrHighNotLow 118.25 (2.9%) 118.47 (3.5%) 0.2% ( -6% - 6%)
> BrowseMonthTaxoFacets 2.94 (0.4%) 2.94 (0.8%) 0.2% ( 0% - 1%)
> BrowseDateTaxoFacets 2.75 (0.3%) 2.75 (1.6%) 0.3% ( -1% - 2%)
> LowSloppyPhrase 105.28 (2.3%) 105.60 (2.5%) 0.3% ( -4% - 5%)
> Prefix3 122.07 (6.8%) 122.55 (6.5%) 0.4% ( -12% - 14%)
> OrNotHighHigh 55.07 (3.8%) 55.29 (4.5%) 0.4% ( -7% - 8%)
> BrowseMonthSSDVFacets 20.88 (7.2%) 20.99 (7.5%) 0.5% ( -13% - 16%)
> OrHighNotHigh 58.40 (4.2%) 58.72 (4.8%) 0.6% ( -8% - 9%)
> Wildcard 79.87 (3.7%) 80.31 (4.0%) 0.6% ( -6% - 8%)
> OrHighMed 13.25 (4.3%) 13.34 (4.9%) 0.6% ( -8% - 10%)
> BrowseDayOfYearTaxoFacets 2.73 (0.6%) 2.75 (1.6%) 0.7% ( -1% - 2%)
> OrHighHigh 22.03 (4.1%) 22.19 (4.9%) 0.7% ( -8% - 10%)
> AndHighHigh 23.46 (2.1%) 23.63 (1.9%) 0.7% ( -3% - 4%)
> PKLookup 145.59 (4.2%) 146.66 (4.3%) 0.7% ( -7% - 9%)
> MedTerm 171.13 (5.0%) 172.43 (5.1%) 0.8% ( -8% - 11%)
> OrHighLow 119.22 (2.8%) 120.23 (3.1%) 0.8% ( -4% - 6%)
> OrHighNotMed 87.06 (3.7%) 87.80 (4.1%) 0.8% ( -6% - 8%)
> IntNRQ 26.44 (12.8%) 26.68 (11.5%) 0.9% ( -20% - 28%)
> HighTerm 107.64 (6.1%) 108.88 (5.6%) 1.2% ( -9% - 13%)
> Fuzzy2 69.69 (10.7%) 71.64 (7.4%) 2.8% ( -13% - 23%)
> Fuzzy1 53.95 (6.5%) 55.79 (6.2%) 3.4% ( -8% - 17%)
> HighTermDayOfYearSort 19.71 (4.7%) 21.51 (7.1%) 9.1% ( -2% - 21%){noformat}
> Unfortunately, luceneutil shows a regression when the sort does not match the index sort (HighTermMonthSort). I can't reproduce the regression on any real case, but I'm afraid my benchmarks aren't quite accurate.
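
For readers skimming the thread, the idea in the issue description can be sketched in plain Java without any Lucene dependency. This is not the attached patch and does not use the real TopFieldCollector API. The class name SortedSegmentCollector and its methods are hypothetical. The sketch assumes an ascending integer sort key and segments whose documents arrive already ordered by that key. Once one document in a segment fails to beat the current bottom of the queue, the remaining documents of that segment are only counted and the comparison is skipped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of the optimization; not Lucene code.
class SortedSegmentCollector {
    private final int numHits;
    // Max-heap: the head is the current "bottom" (worst competitive) value.
    private final PriorityQueue<Integer> queue;
    private int totalHits = 0;
    private int comparatorCalls = 0;
    private boolean restOfSegmentNonCompetitive = false;

    SortedSegmentCollector(int numHits) {
        this.numHits = numHits;
        this.queue = new PriorityQueue<>(Collections.reverseOrder());
    }

    // Must be called before the first document of each segment.
    void startSegment() {
        restOfSegmentNonCompetitive = false;
    }

    // Documents of a segment are assumed to arrive in ascending sortValue order.
    void collect(int sortValue) {
        totalHits++; // the total hit count is still tracked for every document
        if (restOfSegmentNonCompetitive) {
            return; // result is known in advance, so skip the comparison
        }
        if (queue.size() < numHits) {
            queue.add(sortValue);
            return;
        }
        comparatorCalls++; // stands in for the comparator.compareBottom call
        if (sortValue >= queue.peek()) {
            // Later documents in this segment sort even later, so none can compete.
            restOfSegmentNonCompetitive = true;
            return;
        }
        queue.poll();
        queue.add(sortValue);
    }

    int getTotalHits() {
        return totalHits;
    }

    int getComparatorCalls() {
        return comparatorCalls;
    }

    // Competitive values in ascending order.
    List<Integer> topValues() {
        List<Integer> values = new ArrayList<>(queue);
        Collections.sort(values);
        return values;
    }
}
```

With numHits = 2 and two sorted segments holding the values 1, 3, 5, 7, 9 and 2, 4, 6, 8, this model counts all nine hits but performs only three comparisons. A version without the per-segment flag would perform seven.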