Posted to issues@lucene.apache.org by "Bruno Roustant (Jira)" <ji...@apache.org> on 2020/02/21 16:41:00 UTC

[jira] [Comment Edited] (LUCENE-9237) Faster TermsEnum intersect for UniformSplit

    [ https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041329#comment-17041329 ] 

Bruno Roustant edited comment on LUCENE-9237 at 2/21/20 4:40 PM:
-----------------------------------------------------------------

Luceneutil benchmark on wikimediumall, Lucene84 compared to UniformSplit:
(updated with the latest commit)

                     Task  QPS Lucene84   StdDev  QPS UniformSplit2   StdDev  Pct diff
                  Respell         29.99   (1.6%)              16.04   (1.4%)  -46.5% ( -48% - -44%)
                   Fuzzy1         49.08   (4.3%)              30.60   (3.0%)  -37.7% ( -43% - -31%)
                   Fuzzy2         40.74   (3.8%)              25.95   (3.2%)  -36.3% ( -41% - -30%)
                 Wildcard         78.62   (3.3%)              74.99   (3.6%)   -4.6% ( -11% - 2%)
     BrowseDateTaxoFacets          0.95   (1.7%)               0.95   (2.3%)   -0.8% ( -4% - 3%)
BrowseDayOfYearTaxoFacets          0.95   (1.8%)               0.95   (2.2%)   -0.6% ( -4% - 3%)
     HighIntervalsOrdered          7.59   (1.8%)               7.56   (2.0%)   -0.4% ( -4% - 3%)
    BrowseMonthSSDVFacets          4.20   (1.2%)               4.18   (1.8%)   -0.3% ( -3% - 2%)
             HighSpanNear          9.34   (2.9%)               9.32   (3.6%)   -0.2% ( -6% - 6%)
BrowseDayOfYearSSDVFacets          3.74   (0.9%)               3.74   (1.5%)    0.0% ( -2% - 2%)
    HighTermDayOfYearSort         27.83   (9.4%)              27.86   (8.2%)    0.1% ( -16% - 19%)
                OrHighLow        328.40   (3.2%)             329.18   (4.8%)    0.2% ( -7% - 8%)
               AndHighMed         40.65   (3.9%)              40.75   (4.3%)    0.3% ( -7% - 8%)
              LowSpanNear          7.63   (1.7%)               7.66   (2.4%)    0.4% ( -3% - 4%)
         HighSloppyPhrase          7.70   (5.1%)               7.74   (4.1%)    0.5% ( -8% - 10%)
                OrHighMed         33.27   (2.3%)              33.45   (3.0%)    0.5% ( -4% - 5%)
               OrHighHigh         25.55   (2.2%)              25.68   (2.9%)    0.5% ( -4% - 5%)
          LowSloppyPhrase          7.35   (3.8%)               7.40   (3.3%)    0.6% ( -6% - 8%)
          MedSloppyPhrase         17.69   (5.3%)              17.80   (4.9%)    0.6% ( -9% - 11%)
        HighTermMonthSort         88.04  (13.2%)              88.78  (16.0%)    0.8% ( -25% - 34%)
    BrowseMonthTaxoFacets          1.02   (1.5%)               1.02   (2.1%)    0.8% ( -2% - 4%)
             OrHighNotLow        722.23   (5.9%)             728.93   (8.5%)    0.9% ( -12% - 16%)
              MedSpanNear         19.15   (2.3%)              19.34   (3.1%)    1.0% ( -4% - 6%)
              AndHighHigh         27.62   (3.2%)              28.01   (4.1%)    1.4% ( -5% - 9%)
             OrNotHighMed        616.49   (4.6%)             626.50   (7.7%)    1.6% ( -10% - 14%)
            OrHighNotHigh        711.49   (3.6%)             725.99   (8.4%)    2.0% ( -9% - 14%)
                 PKLookup        170.84   (3.1%)             175.00   (3.2%)    2.4% ( -3% - 9%)
                   IntNRQ        101.61   (2.5%)             104.18   (4.4%)    2.5% ( -4% - 9%)
                LowPhrase         63.47   (2.9%)              65.36   (3.1%)    3.0% ( -2% - 9%)
                MedPhrase        177.20   (2.5%)             182.92   (3.8%)    3.2% ( -3% - 9%)
            OrNotHighHigh        614.15   (2.8%)             638.78   (7.2%)    4.0% ( -5% - 14%)
                  MedTerm       1515.73   (3.3%)            1586.40   (5.8%)    4.7% ( -4% - 14%)
                  Prefix3         94.63   (3.8%)              99.19   (4.5%)    4.8% ( -4% - 14%)
               HighPhrase        173.40   (4.6%)             182.24   (4.9%)    5.1% ( -4% - 15%)
             OrNotHighLow        609.69   (4.2%)             642.59   (6.5%)    5.4% ( -5% - 16%)
             OrHighNotMed        637.68   (4.6%)             675.75   (8.9%)    6.0% ( -7% - 20%)
                  LowTerm       1407.00   (3.1%)            1526.84   (5.3%)    8.5% ( 0% - 17%)
               AndHighLow        605.27   (2.1%)             657.24   (5.5%)    8.6% ( 1% - 16%)
                 HighTerm       1093.38   (3.0%)            1196.25   (7.4%)    9.4% ( 0% - 20%)
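
The point estimates in the Pct diff column are consistent with the relative change in mean QPS from the Lucene84 baseline to UniformSplit2, i.e. 100 * (UniformSplit2 - Lucene84) / Lucene84. The following Java sketch re-derives a few rows under that assumption; the exact luceneutil formula is not confirmed here, and the class PctDiff and its method are hypothetical names used only for illustration.

```java
import java.util.Locale;

// Hypothetical sketch: re-derives the "Pct diff" point estimate from the two mean QPS columns,
// assuming it is the relative change of UniformSplit2 (candidate) versus Lucene84 (baseline).
public class PctDiff {

    /** Percentage change of the candidate's mean QPS relative to the baseline's mean QPS. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // A few rows copied from the updated table: task, Lucene84 QPS, UniformSplit2 QPS.
        String[] tasks = {"Respell", "Fuzzy1", "Wildcard", "Prefix3", "HighTerm"};
        double[] lucene84 = {29.99, 49.08, 78.62, 94.63, 1093.38};
        double[] uniformSplit = {16.04, 30.60, 74.99, 99.19, 1196.25};

        for (int i = 0; i < tasks.length; i++) {
            double diff = pctDiff(lucene84[i], uniformSplit[i]);
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
            System.out.printf(Locale.ROOT, "%-10s %6.1f%%%n", tasks[i], diff);
        }
    }
}
```

Running it prints -46.5%, -37.7%, -4.6%, 4.8% and 9.4% for the five tasks, the same values as the Pct diff column. The parenthesized ranges are not reproduced by this sketch.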


was (Author: broustant):
Luceneutil benchmark on wikimediumall, Lucene84 compared to UniformSplit:

                     Task  QPS Lucene84   StdDev  QPS UniformSplit2   StdDev  Pct diff
                  Respell         42.88   (3.4%)              23.13   (1.4%)  -46.1% ( -49% - -42%)
                   Fuzzy2         50.71   (5.2%)              28.74   (2.1%)  -43.3% ( -48% - -37%)
                   Fuzzy1         51.41   (4.9%)              31.95   (2.3%)  -37.9% ( -42% - -32%)
                 Wildcard         48.02   (4.8%)              44.21   (3.2%)   -7.9% ( -15% - 0%)
        HighTermMonthSort         74.90  (11.5%)              72.16  (14.0%)   -3.7% ( -26% - 24%)
                   IntNRQ         25.68  (18.8%)              25.15  (20.0%)   -2.1% ( -34% - 45%)
             OrHighNotMed        632.99   (4.5%)             628.19   (4.4%)   -0.8% ( -9% - 8%)
    HighTermDayOfYearSort         35.52   (7.3%)              35.33   (8.4%)   -0.5% ( -15% - 16%)
BrowseDayOfYearTaxoFacets          1.00   (2.0%)               0.99   (1.3%)   -0.4% ( -3% - 2%)
             HighSpanNear          5.65   (2.5%)               5.63   (2.7%)   -0.3% ( -5% - 5%)
     BrowseDateTaxoFacets          1.00   (2.1%)               1.00   (1.5%)   -0.1% ( -3% - 3%)
          MedSloppyPhrase         29.04   (3.3%)              29.02   (3.0%)   -0.0% ( -6% - 6%)
             OrHighNotLow        646.22   (5.7%)             646.28   (5.4%)    0.0% ( -10% - 11%)
              LowSpanNear         12.58   (2.5%)              12.58   (2.5%)    0.0% ( -4% - 5%)
BrowseDayOfYearSSDVFacets          3.92   (1.9%)               3.93   (1.5%)    0.1% ( -3% - 3%)
     HighIntervalsOrdered          4.89   (2.5%)               4.89   (2.1%)    0.1% ( -4% - 4%)
    BrowseMonthSSDVFacets          4.39   (2.2%)               4.39   (1.4%)    0.2% ( -3% - 3%)
              MedSpanNear         10.31   (2.9%)              10.33   (3.2%)    0.2% ( -5% - 6%)
          LowSloppyPhrase          5.62   (3.1%)               5.63   (3.0%)    0.2% ( -5% - 6%)
         HighSloppyPhrase         30.20   (4.5%)              30.38   (4.4%)    0.6% ( -7% - 9%)
                LowPhrase         44.43   (3.1%)              44.79   (1.8%)    0.8% ( -3% - 5%)
               OrHighHigh         19.02   (3.2%)              19.19   (2.2%)    0.9% ( -4% - 6%)
                OrHighMed         40.58   (2.8%)              40.98   (2.5%)    1.0% ( -4% - 6%)
    BrowseMonthTaxoFacets          1.06   (2.1%)               1.08   (1.5%)    1.2% ( -2% - 4%)
            OrNotHighHigh        568.16   (4.0%)             575.63   (6.2%)    1.3% ( -8% - 11%)
               AndHighMed         77.35   (3.1%)              78.44   (3.5%)    1.4% ( -4% - 8%)
            OrHighNotHigh        559.98   (5.7%)             568.13   (5.8%)    1.5% ( -9% - 13%)
                 PKLookup        176.23   (4.4%)             180.43   (5.2%)    2.4% ( -6% - 12%)
             OrNotHighLow        407.60   (5.3%)             418.01   (3.3%)    2.6% ( -5% - 11%)
                 HighTerm       1407.02   (6.8%)            1447.11   (5.5%)    2.8% ( -8% - 16%)
              AndHighHigh        135.64   (3.7%)             139.81   (3.7%)    3.1% ( -4% - 10%)
               AndHighLow        418.33   (3.0%)             432.67   (4.9%)    3.4% ( -4% - 11%)
                MedPhrase        368.04   (4.5%)             382.66   (5.1%)    4.0% ( -5% - 14%)
             OrNotHighMed        567.06   (5.0%)             589.79   (5.6%)    4.0% ( -6% - 15%)
                  MedTerm       1106.66   (5.9%)            1155.97   (3.7%)    4.5% ( -4% - 14%)
                OrHighLow        453.06   (6.1%)             474.49   (5.2%)    4.7% ( -6% - 16%)
                  Prefix3         49.10  (17.7%)              51.44  (17.6%)    4.8% ( -20% - 44%)
               HighPhrase        296.12   (4.8%)             310.31   (4.3%)    4.8% ( -4% - 14%)
                  LowTerm       1465.12   (6.1%)            1544.04   (5.5%)    5.4% ( -5% - 18%)

> Faster TermsEnum intersect for UniformSplit
> -------------------------------------------
>
>                 Key: LUCENE-9237
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9237
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
>  - It is still slower for FuzzyQuery (-37%) but it is faster than the previous version (which was -65%).
>  - It is roughly the same speed for WildcardQuery (-5%).
>  - It is slightly faster for PrefixQuery (+5%). Some benchmark runs show a larger improvement (I've seen up to +17% in about a quarter of the runs).
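
Both FuzzyQuery figures above are stated relative to BlockTree. Assuming they were measured against the same baseline, the previous version ran at about 0.35x and the new one at about 0.63x BlockTree's QPS, which implies a direct speedup of 0.63 / 0.35 = 1.8x (about 80%), in the same range as the 75% quoted for FuzzyQuery; rounding and run-to-run variance between benchmarks may account for the gap. A minimal Java sketch of that arithmetic (the ImpliedSpeedup class is a hypothetical name):

```java
import java.util.Locale;

// Hypothetical sketch: turns two "percent change versus BlockTree" figures into a direct
// speedup ratio, assuming both were measured against the same BlockTree baseline.
public class ImpliedSpeedup {

    /** Ratio of new to previous QPS, given each version's percent change versus a shared baseline. */
    static double speedup(double previousPctVsBaseline, double newPctVsBaseline) {
        double previousRelative = 1.0 + previousPctVsBaseline / 100.0; // -65% -> 0.35x baseline QPS
        double newRelative = 1.0 + newPctVsBaseline / 100.0;           // -37% -> 0.63x baseline QPS
        return newRelative / previousRelative;
    }

    public static void main(String[] args) {
        double ratio = speedup(-65.0, -37.0);
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
        System.out.printf(Locale.ROOT, "FuzzyQuery new/previous QPS ratio: %.2f%n", ratio);
    }
}
```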



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
