Posted to dev@lucene.apache.org by "Luca Cavanna (JIRA)" <ji...@apache.org> on 2019/05/09 15:22:00 UTC
[jira] [Comment Edited] (LUCENE-8796) Use exponential search in
IntArrayDocIdSet advance method
[ https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836467#comment-16836467 ]
Luca Cavanna edited comment on LUCENE-8796 at 5/9/19 3:21 PM:
--------------------------------------------------------------
I have updated the PR after applying Yonik's suggestion and re-ran the benchmarks a few times. The run with the least noise produced these results (note that I disabled the bitset optimization on both sides):
{noformat}
Report after iter 19:
Task | QPS baseline (StdDev) | QPS my_modified_version (StdDev) | Pct diff (range)
HighTerm 1575.07 (5.9%) 1541.27 (6.9%) -2.1% ( -14% - 11%)
MedTerm 1363.22 (6.5%) 1337.03 (7.0%) -1.9% ( -14% - 12%)
LowTerm 1441.86 (4.2%) 1420.77 (5.2%) -1.5% ( -10% - 8%)
IntNRQConjMedTerm 280.55 (4.0%) 277.64 (4.1%) -1.0% ( -8% - 7%)
MedPhrase 153.84 (3.5%) 152.44 (3.3%) -0.9% ( -7% - 6%)
Prefix3 224.92 (4.0%) 223.13 (3.7%) -0.8% ( -8% - 7%)
HighSloppyPhrase 19.70 (3.7%) 19.56 (4.5%) -0.7% ( -8% - 7%)
MedSloppyPhrase 18.23 (4.3%) 18.11 (4.7%) -0.7% ( -9% - 8%)
OrNotHighMed 586.33 (3.4%) 582.47 (4.9%) -0.7% ( -8% - 7%)
LowSloppyPhrase 18.56 (3.6%) 18.46 (3.9%) -0.5% ( -7% - 7%)
HighPhrase 22.64 (2.7%) 22.54 (3.0%) -0.4% ( -6% - 5%)
LowPhrase 144.10 (3.8%) 143.55 (3.3%) -0.4% ( -7% - 6%)
AndHighLow 539.26 (3.7%) 537.25 (3.2%) -0.4% ( -7% - 6%)
PKLookup 132.96 (3.0%) 132.48 (4.6%) -0.4% ( -7% - 7%)
OrHighMed 115.79 (2.7%) 115.49 (3.5%) -0.3% ( -6% - 6%)
PrefixConjHighTerm 36.98 (2.8%) 36.93 (3.4%) -0.1% ( -6% - 6%)
WildcardConjHighTerm 45.79 (3.0%) 45.73 (3.1%) -0.1% ( -6% - 6%)
OrHighLow 448.91 (3.7%) 448.70 (6.3%) -0.0% ( -9% - 10%)
Wildcard 78.89 (3.2%) 78.95 (3.6%) 0.1% ( -6% - 7%)
IntNRQConjHighTerm 78.35 (2.3%) 78.48 (2.4%) 0.2% ( -4% - 4%)
IntNRQ 100.56 (2.7%) 100.84 (2.8%) 0.3% ( -5% - 5%)
OrHighNotLow 732.45 (2.8%) 734.56 (5.3%) 0.3% ( -7% - 8%)
OrHighNotHigh 544.87 (2.8%) 546.47 (4.6%) 0.3% ( -6% - 7%)
IntNRQConjLowTerm 249.20 (4.2%) 249.99 (3.8%) 0.3% ( -7% - 8%)
Respell 73.05 (3.1%) 73.28 (3.4%) 0.3% ( -6% - 7%)
OrHighHigh 35.56 (3.0%) 35.68 (4.2%) 0.3% ( -6% - 7%)
OrNotHighLow 695.41 (4.8%) 697.88 (6.5%) 0.4% ( -10% - 12%)
MedSpanNear 59.99 (3.8%) 60.30 (4.0%) 0.5% ( -7% - 8%)
AndHighMed 190.02 (3.1%) 191.04 (3.6%) 0.5% ( -5% - 7%)
LowSpanNear 12.73 (3.9%) 12.81 (4.2%) 0.6% ( -7% - 8%)
HighTermDayOfYearSort 88.42 (7.0%) 89.09 (7.1%) 0.8% ( -12% - 15%)
PrefixConjLowTerm 54.95 (3.7%) 55.43 (3.8%) 0.9% ( -6% - 8%)
OrHighNotMed 628.44 (3.4%) 634.02 (6.1%) 0.9% ( -8% - 10%)
HighSpanNear 28.86 (3.2%) 29.11 (3.5%) 0.9% ( -5% - 7%)
WildcardConjMedTerm 72.48 (3.4%) 73.19 (4.8%) 1.0% ( -7% - 9%)
Fuzzy2 49.17 (9.9%) 49.68 (11.7%) 1.0% ( -18% - 25%)
AndHighHigh 63.44 (3.8%) 64.11 (3.8%) 1.1% ( -6% - 9%)
Fuzzy1 79.43 (9.9%) 80.55 (9.7%) 1.4% ( -16% - 23%)
OrNotHighHigh 574.89 (3.6%) 584.43 (5.5%) 1.7% ( -7% - 11%)
PrefixConjMedTerm 79.00 (3.2%) 80.50 (3.6%) 1.9% ( -4% - 8%)
WildcardConjLowTerm 90.67 (2.9%) 92.49 (3.7%) 2.0% ( -4% - 8%)
HighTermMonthSort 86.13 (11.8%) 88.79 (12.4%) 3.1% ( -18% - 30%)
{noformat}
I also ran the benchmarks with the bitset optimization enabled on both sides:
{noformat}
Report after iter 19:
Task | QPS baseline (StdDev) | QPS my_modified_version (StdDev) | Pct diff (range)
IntNRQ 63.46 (24.6%) 62.28 (24.2%) -1.9% ( -40% - 62%)
OrNotHighMed 596.89 (3.5%) 589.18 (4.6%) -1.3% ( -9% - 7%)
OrNotHighHigh 769.65 (3.1%) 760.89 (2.9%) -1.1% ( -6% - 4%)
Fuzzy1 76.45 (6.8%) 75.62 (7.4%) -1.1% ( -14% - 14%)
OrHighNotHigh 626.48 (3.4%) 619.67 (2.6%) -1.1% ( -6% - 5%)
MedTerm 1345.24 (3.6%) 1332.07 (3.7%) -1.0% ( -7% - 6%)
PKLookup 136.70 (4.0%) 135.57 (3.9%) -0.8% ( -8% - 7%)
HighTerm 1103.39 (3.5%) 1095.23 (3.7%) -0.7% ( -7% - 6%)
OrNotHighLow 780.85 (3.8%) 775.40 (2.7%) -0.7% ( -6% - 5%)
OrHighNotLow 815.32 (4.7%) 810.72 (5.7%) -0.6% ( -10% - 10%)
Prefix3 310.00 (4.9%) 308.31 (3.7%) -0.5% ( -8% - 8%)
LowTerm 1462.30 (3.7%) 1455.27 (4.4%) -0.5% ( -8% - 7%)
OrHighLow 446.56 (4.6%) 445.11 (3.2%) -0.3% ( -7% - 7%)
AndHighLow 594.90 (2.9%) 593.39 (3.4%) -0.3% ( -6% - 6%)
Respell 64.46 (2.5%) 64.36 (2.6%) -0.2% ( -5% - 5%)
OrHighNotMed 685.98 (4.5%) 685.69 (3.7%) -0.0% ( -7% - 8%)
OrHighMed 67.90 (5.1%) 67.91 (3.4%) 0.0% ( -8% - 8%)
Fuzzy2 50.18 (4.5%) 50.21 (5.8%) 0.1% ( -9% - 10%)
LowSpanNear 59.27 (3.9%) 59.34 (4.0%) 0.1% ( -7% - 8%)
OrHighHigh 30.89 (5.2%) 30.94 (3.3%) 0.2% ( -7% - 9%)
LowPhrase 114.67 (3.1%) 114.87 (2.5%) 0.2% ( -5% - 5%)
HighPhrase 22.34 (2.7%) 22.42 (2.2%) 0.4% ( -4% - 5%)
AndHighHigh 59.53 (3.8%) 59.89 (4.4%) 0.6% ( -7% - 9%)
MedPhrase 29.99 (2.9%) 30.19 (2.3%) 0.7% ( -4% - 6%)
MedSloppyPhrase 71.57 (3.1%) 72.10 (3.0%) 0.7% ( -5% - 7%)
IntNRQConjHighTerm 113.74 (7.3%) 114.66 (7.1%) 0.8% ( -12% - 16%)
LowSloppyPhrase 14.18 (3.4%) 14.30 (2.6%) 0.8% ( -4% - 6%)
PrefixConjLowTerm 89.05 (4.6%) 89.80 (5.1%) 0.8% ( -8% - 11%)
AndHighMed 166.34 (3.1%) 167.76 (3.8%) 0.9% ( -5% - 7%)
WildcardConjMedTerm 51.44 (2.6%) 51.88 (3.0%) 0.9% ( -4% - 6%)
PrefixConjMedTerm 68.16 (4.8%) 68.80 (4.6%) 0.9% ( -8% - 10%)
PrefixConjHighTerm 42.34 (6.1%) 42.81 (5.0%) 1.1% ( -9% - 13%)
MedSpanNear 15.57 (5.5%) 15.74 (5.4%) 1.1% ( -9% - 12%)
WildcardConjLowTerm 51.56 (3.7%) 52.15 (4.2%) 1.1% ( -6% - 9%)
HighSpanNear 5.66 (5.8%) 5.73 (5.9%) 1.2% ( -9% - 13%)
IntNRQConjLowTerm 120.28 (8.5%) 121.67 (8.8%) 1.2% ( -14% - 20%)
WildcardConjHighTerm 55.43 (3.2%) 56.10 (3.4%) 1.2% ( -5% - 8%)
IntNRQConjMedTerm 97.79 (8.3%) 98.98 (8.6%) 1.2% ( -14% - 19%)
Wildcard 106.37 (2.9%) 107.75 (3.6%) 1.3% ( -5% - 7%)
HighSloppyPhrase 18.21 (4.9%) 18.48 (4.4%) 1.5% ( -7% - 11%)
HighTermMonthSort 146.10 (11.0%) 148.89 (10.5%) 1.9% ( -17% - 26%)
HighTermDayOfYearSort 68.62 (6.1%) 70.08 (3.9%) 2.1% ( -7% - 12%)
{noformat}
I will next have a look at what Atri is suggesting.
> Use exponential search in IntArrayDocIdSet advance method
> ---------------------------------------------------------
>
> Key: LUCENE-8796
> URL: https://issues.apache.org/jira/browse/LUCENE-8796
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Luca Cavanna
> Priority: Minor
>
> Chatting with [~jpountz], he suggested improving IntArrayDocIdSet by making its advance method use exponential search instead of binary search. This should help the performance of queries that include conjunctions: because ConjunctionDISI uses leapfrogging, it advances through doc IDs in small steps, so on average exponential search should be faster than binary search when advancing.
>
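
For readers unfamiliar with the technique named in the issue description, here is a minimal sketch of exponential (galloping) search over a sorted int array. It is not the code from the PR or from Lucene: the class name ExponentialAdvanceSketch, the method advance and its parameters are hypothetical and chosen only for illustration. The idea is to double the step from the current position until the target is passed, then binary search only inside the last window, so that a short jump costs roughly log(distance) comparisons rather than log(array length).

```java
import java.util.Arrays;

/** Illustrative only: hypothetical names, not Lucene's IntArrayDocIdSet code. */
public class ExponentialAdvanceSketch {

    /**
     * Returns the index of the first element >= target in docs[from, length),
     * or length if there is no such element. Assumes docs[0, length) is sorted
     * and has no duplicates, and that length - from stays well below 2^30 so
     * that doubling the step cannot overflow.
     */
    static int advance(int[] docs, int length, int from, int target) {
        // Gallop: probe from+1, from+2, from+4, ... until we reach an element
        // that is >= target or we would step past the end of the array.
        int bound = 1;
        while (bound < length - from && docs[from + bound] < target) {
            bound <<= 1;
        }

        // The answer is now in [from + bound/2, from + bound] (clamped to length):
        // the previous probe was < target, the current one is >= target or out of range.
        int lo = from + (bound >> 1);
        int hi = (int) Math.min((long) from + bound + 1, length);

        // Binary search only inside that window.
        int idx = Arrays.binarySearch(docs, lo, hi, target);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9, 14, 20, 33, 47};
        // Advance from position 0 to the first doc >= 9.
        System.out.println(advance(docs, docs.length, 0, 9));
        // Advance from position 2 to the first doc >= 34.
        System.out.println(advance(docs, docs.length, 2, 34));
    }
}
```

When the target is only a few slots ahead, as is expected with leapfrogging conjunctions, the galloping loop stops after a handful of probes. A plain binary search over the remaining array would always pay the full logarithmic cost.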