You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Bruno Roustant (Jira)" <ji...@apache.org> on 2021/11/13 22:26:00 UTC
[jira] [Comment Edited] (LUCENE-10225) Improve IntroSelector with 3-way partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443195#comment-17443195 ]
Bruno Roustant edited comment on LUCENE-10225 at 11/13/21, 10:25 PM:
---------------------------------------------------------------------
With the addition of the adaptive top-k algorithm triggered when k is close to from or last, we get significant perf gain (x3-x4) when (k - from <= 20) or (last - k <= 20):
{noformat}
RANDOM
IntroSelector ... 485 317 371 449 299 454 333 345 190 311
IntroSelector2 ... 85 86 86 97 90 87 87 86 87 91
RANDOM_LOW_CARDINALITY
IntroSelector ... 475 483 239 445 234 236 455 460 365 508
IntroSelector2 ... 113 114 115 114 112 113 114 114 113 114
RANDOM_MEDIUM_CARDINALITY
IntroSelector ... 529 287 448 374 347 392 356 412 182 408
IntroSelector2 ... 88 85 86 87 87 87 86 86 88 85
ASCENDING
IntroSelector ... 157 150 148 146 146 147 146 147 146 146
IntroSelector2 ... 80 80 82 82 82 83 81 82 81 82
DESCENDING
IntroSelector ... 250 250 246 246 254 251 255 247 247 249
IntroSelector2 ... 84 84 84 85 83 82 80 81 81 82
STRICTLY_DESCENDING
IntroSelector ... 239 237 239 238 238 250 240 239 240 240
IntroSelector2 ... 82 83 82 81 82 80 82 85 84 83
ASCENDING_SEQUENCES
IntroSelector ... 209 217 145 125 185 142 131 172 240 146
IntroSelector2 ... 85 86 86 84 84 82 83 83 83 81
MOSTLY_ASCENDING
IntroSelector ... 151 155 150 150 154 147 150 154 154 154
IntroSelector2 ... 82 82 81 81 81 81 82 82 104 85
{noformat}
was (Author: broustant):
With the addition of the adaptive top-k algorithm triggered when k is close to from or last, we get significant perf gain when (k - from <= 20) or (last - k <= 20):
{noformat}
RANDOM
IntroSelector ... 485 317 371 449 299 454 333 345 190 311
IntroSelector2 ... 85 86 86 97 90 87 87 86 87 91
RANDOM_LOW_CARDINALITY
IntroSelector ... 475 483 239 445 234 236 455 460 365 508
IntroSelector2 ... 113 114 115 114 112 113 114 114 113 114
RANDOM_MEDIUM_CARDINALITY
IntroSelector ... 529 287 448 374 347 392 356 412 182 408
IntroSelector2 ... 88 85 86 87 87 87 86 86 88 85
ASCENDING
IntroSelector ... 157 150 148 146 146 147 146 147 146 146
IntroSelector2 ... 80 80 82 82 82 83 81 82 81 82
DESCENDING
IntroSelector ... 250 250 246 246 254 251 255 247 247 249
IntroSelector2 ... 84 84 84 85 83 82 80 81 81 82
STRICTLY_DESCENDING
IntroSelector ... 239 237 239 238 238 250 240 239 240 240
IntroSelector2 ... 82 83 82 81 82 80 82 85 84 83
ASCENDING_SEQUENCES
IntroSelector ... 209 217 145 125 185 142 131 172 240 146
IntroSelector2 ... 85 86 86 84 84 82 83 83 83 81
MOSTLY_ASCENDING
IntroSelector ... 151 155 150 150 154 147 150 154 154 154
IntroSelector2 ... 82 82 81 81 81 81 82 82 104 85
{noformat}
> Improve IntroSelector with 3-way partitioning
> ---------------------------------------------
>
> Key: LUCENE-10225
> URL: https://issues.apache.org/jira/browse/LUCENE-10225
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The same way we improved IntroSorter, we can improve IntroSelector with Bentley-McIlroy 3-way partitioning. The gain in performance is roughly the same.
> With this new approach, we always use medians-of-medians (Tukey's Ninther), so there is no real gain to fallback to a slower medians-of-medians technique as an introspective protection (like the existing implementation does). Instead we can simply shuffle the sub-range if we exceed the recursive max depth (this does not change the speed as this intro-protective mechanism almost never happens - maybe only in adversarial cases).
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org