You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Bruno Roustant (Jira)" <ji...@apache.org> on 2021/11/13 22:26:00 UTC

[jira] [Comment Edited] (LUCENE-10225) Improve IntroSelector with 3-way partitioning

    [ https://issues.apache.org/jira/browse/LUCENE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443195#comment-17443195 ] 

Bruno Roustant edited comment on LUCENE-10225 at 11/13/21, 10:25 PM:
---------------------------------------------------------------------

With the addition of the adaptive top-k algorithm triggered when k is close to from or last, we get significant perf gain (x3-x4) when (k - from <= 20) or (last - k <= 20):
{noformat}
RANDOM
  IntroSelector  ...  485  317  371  449  299  454  333  345  190  311
  IntroSelector2 ...   85   86   86   97   90   87   87   86   87   91
RANDOM_LOW_CARDINALITY
  IntroSelector  ...  475  483  239  445  234  236  455  460  365  508
  IntroSelector2 ...  113  114  115  114  112  113  114  114  113  114
RANDOM_MEDIUM_CARDINALITY
  IntroSelector  ...  529  287  448  374  347  392  356  412  182  408
  IntroSelector2 ...   88   85   86   87   87   87   86   86   88   85
ASCENDING
  IntroSelector  ...  157  150  148  146  146  147  146  147  146  146
  IntroSelector2 ...   80   80   82   82   82   83   81   82   81   82
DESCENDING
  IntroSelector  ...  250  250  246  246  254  251  255  247  247  249
  IntroSelector2 ...   84   84   84   85   83   82   80   81   81   82
STRICTLY_DESCENDING
  IntroSelector  ...  239  237  239  238  238  250  240  239  240  240
  IntroSelector2 ...   82   83   82   81   82   80   82   85   84   83
ASCENDING_SEQUENCES
  IntroSelector  ...  209  217  145  125  185  142  131  172  240  146
  IntroSelector2 ...   85   86   86   84   84   82   83   83   83   81
MOSTLY_ASCENDING
  IntroSelector  ...  151  155  150  150  154  147  150  154  154  154
  IntroSelector2 ...   82   82   81   81   81   81   82   82  104   85
{noformat}


was (Author: broustant):
With the addition of the adaptive top-k algorithm triggered when k is close to from or last, we get significant perf gain when (k - from <= 20) or (last - k <= 20):
{noformat}
RANDOM
  IntroSelector  ...  485  317  371  449  299  454  333  345  190  311
  IntroSelector2 ...   85   86   86   97   90   87   87   86   87   91
RANDOM_LOW_CARDINALITY
  IntroSelector  ...  475  483  239  445  234  236  455  460  365  508
  IntroSelector2 ...  113  114  115  114  112  113  114  114  113  114
RANDOM_MEDIUM_CARDINALITY
  IntroSelector  ...  529  287  448  374  347  392  356  412  182  408
  IntroSelector2 ...   88   85   86   87   87   87   86   86   88   85
ASCENDING
  IntroSelector  ...  157  150  148  146  146  147  146  147  146  146
  IntroSelector2 ...   80   80   82   82   82   83   81   82   81   82
DESCENDING
  IntroSelector  ...  250  250  246  246  254  251  255  247  247  249
  IntroSelector2 ...   84   84   84   85   83   82   80   81   81   82
STRICTLY_DESCENDING
  IntroSelector  ...  239  237  239  238  238  250  240  239  240  240
  IntroSelector2 ...   82   83   82   81   82   80   82   85   84   83
ASCENDING_SEQUENCES
  IntroSelector  ...  209  217  145  125  185  142  131  172  240  146
  IntroSelector2 ...   85   86   86   84   84   82   83   83   83   81
MOSTLY_ASCENDING
  IntroSelector  ...  151  155  150  150  154  147  150  154  154  154
  IntroSelector2 ...   82   82   81   81   81   81   82   82  104   85
{noformat}

> Improve IntroSelector with 3-way partitioning
> ---------------------------------------------
>
>                 Key: LUCENE-10225
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10225
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The same way we improved IntroSorter, we can improve IntroSelector with Bentley-McIlroy 3-way partitioning. The gain in performance is roughly the same.
> With this new approach, we always use medians-of-medians (Tukey's Ninther), so there is no real gain to fallback to a slower medians-of-medians technique as an introspective protection (like the existing implementation does). Instead we can simply shuffle the sub-range if we exceed the recursive max depth (this does not change the speed as this intro-protective mechanism almost never happens - maybe only in adversarial cases).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org