You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Luca Cavanna (JIRA)" <ji...@apache.org> on 2019/05/08 12:19:00 UTC
[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

    [ https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835542#comment-16835542 ] 

Luca Cavanna commented on LUCENE-8796:
--------------------------------------

I have made the change and played with luceneutil to run some benchmark. I opened a PR here: https://github.com/apache/lucene-solr/pull/667 .

Luceneutil does not currently benchmark the queries that should be affected by this change, hence I added benchmarks for numeric range queries, prefix queries and wildcard queries in conjunction with term queries (low, medium and high frequency). See the changes I made to my luceneutil fork: [https://github.com/mikemccand/luceneutil/compare/master...javanna:conjunctions] .  Also, for the benchmarks I temporarily modified DocIdSetBuilder#grow to never call upgradeToBitSet (on both baseline and modified version), so that the updated code is exercised as much as possible during the benchmarks run, otherwise in many cases we would use bitsets instead and the changed code would not be exercised at all.

I ran the wikimedium10m benchmarks a few times, here is probably the run with the least noise, results show a little improvement for some queries, and no regressions in general:
 
Report after iter 19:
 TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff
 WildcardConjMedTerm 75.49 (2.2%) 72.79 (2.0%) -3.6% ( -7% - 0%)
 OrHighNotMed 607.01 (5.7%) 593.10 (4.4%) -2.3% ( -11% - 8%)
 WildcardConjHighTerm 64.00 (1.7%) 62.55 (1.4%) -2.3% ( -5% - 0%)
 Fuzzy2 20.14 (3.4%) 19.72 (4.6%) -2.1% ( -9% - 6%)
 HighTerm 1174.41 (4.7%) 1150.11 (4.2%) -2.1% ( -10% - 7%)
 OrHighLow 483.40 (5.1%) 473.69 (6.9%) -2.0% ( -13% - 10%)
 OrNotHighLow 526.75 (3.6%) 516.47 (3.6%) -2.0% ( -8% - 5%)
 OrNotHighHigh 600.38 (4.9%) 590.21 (3.7%) -1.7% ( -9% - 7%)
 HighTermMonthSort 110.05 (11.7%) 108.58 (11.5%) -1.3% ( -21% - 24%)
 OrHighMed 107.83 (2.6%) 106.48 (4.7%) -1.3% ( -8% - 6%)
 PrefixConjMedTerm 56.98 (2.5%) 56.33 (1.7%) -1.1% ( -5% - 3%)
 AndHighLow 432.27 (3.6%) 427.46 (3.2%) -1.1% ( -7% - 5%)
 PrefixConjLowTerm 44.43 (2.8%) 43.98 (1.8%) -1.0% ( -5% - 3%)
 MedTerm 1409.97 (5.5%) 1396.33 (4.9%) -1.0% ( -10% - 9%)
 HighSloppyPhrase 11.98 (4.3%) 11.87 (5.1%) -0.9% ( -9% - 8%)
 OrNotHighMed 614.19 (4.6%) 608.74 (3.8%) -0.9% ( -8% - 7%)
 Respell 58.11 (2.4%) 57.61 (2.4%) -0.9% ( -5% - 3%)
 LowTerm 1342.33 (4.8%) 1330.86 (4.0%) -0.9% ( -9% - 8%)
 PrefixConjHighTerm 68.50 (2.9%) 67.93 (1.8%) -0.8% ( -5% - 3%)
 OrHighNotHigh 566.30 (5.2%) 561.88 (4.5%) -0.8% ( -9% - 9%)
 WildcardConjLowTerm 32.75 (2.5%) 32.56 (2.1%) -0.6% ( -5% - 4%)
 PKLookup 131.80 (2.4%) 131.28 (2.3%) -0.4% ( -5% - 4%)
 OrHighHigh 29.90 (3.4%) 29.79 (5.3%) -0.4% ( -8% - 8%)
 OrHighNotLow 497.65 (6.6%) 495.84 (5.2%) -0.4% ( -11% - 12%)
 AndHighMed 175.08 (3.5%) 174.58 (3.0%) -0.3% ( -6% - 6%)
 LowSpanNear 15.17 (1.8%) 15.13 (2.5%) -0.2% ( -4% - 4%)
 Fuzzy1 71.14 (5.9%) 70.97 (6.3%) -0.2% ( -11% - 12%)
 LowSloppyPhrase 35.23 (2.0%) 35.16 (2.6%) -0.2% ( -4% - 4%)
 LowPhrase 74.10 (1.7%) 73.98 (1.8%) -0.2% ( -3% - 3%)
 HighPhrase 34.18 (2.1%) 34.13 (2.0%) -0.1% ( -4% - 3%)
 Prefix3 45.33 (2.3%) 45.28 (2.1%) -0.1% ( -4% - 4%)
 MedPhrase 28.30 (2.1%) 28.27 (1.7%) -0.1% ( -3% - 3%)
 MedSloppyPhrase 6.80 (3.6%) 6.80 (3.2%) -0.0% ( -6% - 6%)
 AndHighHigh 53.79 (3.9%) 53.79 (4.0%) -0.0% ( -7% - 8%)
 MedSpanNear 61.78 (2.2%) 61.83 (1.7%) 0.1% ( -3% - 4%)
 Wildcard 37.83 (2.5%) 37.91 (1.7%) 0.2% ( -3% - 4%)
 IntNRQConjHighTerm 20.17 (3.8%) 20.24 (4.9%) 0.3% ( -8% - 9%)
 HighTermDayOfYearSort 53.55 (7.8%) 53.76 (7.3%) 0.4% ( -13% - 16%)
 HighSpanNear 5.39 (2.6%) 5.42 (2.6%) 0.5% ( -4% - 5%)
 IntNRQConjLowTerm 19.69 (4.3%) 19.86 (4.3%) 0.9% ( -7% - 9%)
 IntNRQConjMedTerm 15.93 (4.5%) 16.12 (5.4%) 1.2% ( -8% - 11%)
 IntNRQ 114.28 (10.3%) 116.41 (14.0%) 1.9% ( -20% - 29%)

 

 

> Use exponential search in IntArrayDocIdSet advance method
> ---------------------------------------------------------
>
>                 Key: LUCENE-8796
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8796
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Luca Cavanna
>            Priority: Minor
>
> Chatting with [~jpountz] , he suggested to improve IntArrayDocIdSet by making its advance method use exponential search instead of binary search. This should help performance of queries including conjunctions: given that ConjunctionDISI uses leap frog, it advances through doc ids in small steps, hence exponential search should be faster when advancing on average compared to binary search.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org