You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/27 08:43:31 UTC
[GitHub] [druid] clintropolis opened a new pull request #11039: improve bitmap vector offset to report contiguous groups
clintropolis opened a new pull request #11039:
URL: https://github.com/apache/druid/pull/11039
### Description
This PR is a follow-up to #11004, which improves `BitmapVectorOffset` to report the current set of offsets as "contiguous" when they are, in order to take advantage of any optimizations available when processing contiguous vectorized gets.
In #11004, I noticed the improvements to vectorized decoding of long values in my column scan benchmarks only seemed to impact a full scan. Looking closer, it was due to how the benchmark uses a `BitmapVectorOffset` to simulate filtering rows, which prior to this pr would _never_ report the current set off offsets as contiguous, even if it was. Since #11004 primarily improved performance of contiguous vectorized gets, when using a `BitmapVectorOffset` we were missing out on these improvements entirely.
To help measure the improvements of allowing contiguous blocks to be processed with the contiguous get methods, I improved the benchmarks added in #11004 to now have several different filter "distributions" to simulate different filtering patterns, instead of previously only using a random distribution. These include:
- random: `BitmapVectorOffset`, bitmap is set randomly for desired number of rows
- contiguous-start: `NoFilterVectorOffset` from starting offset 0 to desired number of rows
- contiguous-end: opposite end of column as 'contiguous-start'
- contiguous-bitmap-start: `BitmapVectorOffset`, bitmap set from 0 to desired number of rows
- contiguous-bitmap-end: opposite end of column as `contiguous-bitmap-start`
- chunky-1000: `BitmapVectorOffset`, position chosen randomly and up to 1000 contiguous values set until chosen number of rows is set
- chunky-10000: 'chunky-1000' but with chunks of 10k contiguous values
The results of this change is a pretty decent improvement to column read speed:
![before-after-bitmaps](https://user-images.githubusercontent.com/1577461/112715353-edc3b300-8e9c-11eb-9f09-6f179d8177ff.gif)
before:
```
Benchmark (distribution) (encoding) (filterDistribution) (filteredRowCountPercentage) (rows) (zeroProbability) Mode Cnt Score Error Units
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.1 5000000 0.0 avgt 10 3096.136 ± 20.732 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.25 5000000 0.0 avgt 10 8142.492 ± 124.247 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.5 5000000 0.0 avgt 10 16204.635 ± 101.374 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.6 5000000 0.0 avgt 10 19377.004 ± 109.489 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.75 5000000 0.0 avgt 10 25621.704 ± 1544.863 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.8 5000000 0.0 avgt 10 26393.053 ± 190.878 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.9 5000000 0.0 avgt 10 29200.991 ± 493.996 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.95 5000000 0.0 avgt 10 30702.206 ± 198.864 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 1.0 5000000 0.0 avgt 10 17997.131 ± 127.793 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.1 5000000 0.0 avgt 10 3881.268 ± 19.018 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.25 5000000 0.0 avgt 10 8097.172 ± 284.560 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.5 5000000 0.0 avgt 10 14498.809 ± 77.822 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.6 5000000 0.0 avgt 10 18387.397 ± 96.425 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.75 5000000 0.0 avgt 10 21986.742 ± 89.077 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.8 5000000 0.0 avgt 10 23239.459 ± 290.921 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.9 5000000 0.0 avgt 10 25199.260 ± 162.413 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.95 5000000 0.0 avgt 10 27449.210 ± 1688.145 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 1.0 5000000 0.0 avgt 10 17981.037 ± 63.957 us/op
```
after:
```
Benchmark (distribution) (encoding) (filterDistribution) (filteredRowCountPercentage) (rows) (zeroProbability) Mode Cnt Score Error Units
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.1 5000000 0.0 avgt 10 2396.812 ± 59.423 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.25 5000000 0.0 avgt 10 5941.108 ± 206.432 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.5 5000000 0.0 avgt 10 10904.502 ± 248.979 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.6 5000000 0.0 avgt 10 13237.317 ± 175.817 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.75 5000000 0.0 avgt 10 17216.386 ± 934.780 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.8 5000000 0.0 avgt 10 17088.263 ± 116.899 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.9 5000000 0.0 avgt 10 20996.326 ± 81.832 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 0.95 5000000 0.0 avgt 10 20735.792 ± 71.165 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto contiguous-bitmap-start 1.0 5000000 0.0 avgt 10 17992.744 ± 87.609 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.1 5000000 0.0 avgt 10 3226.460 ± 16.228 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.25 5000000 0.0 avgt 10 6771.234 ± 127.713 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.5 5000000 0.0 avgt 10 12587.962 ± 50.479 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.6 5000000 0.0 avgt 10 15874.411 ± 72.437 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.75 5000000 0.0 avgt 10 18694.264 ± 34.897 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.8 5000000 0.0 avgt 10 20658.982 ± 214.946 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.9 5000000 0.0 avgt 10 21208.258 ± 73.285 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 0.95 5000000 0.0 avgt 10 23644.473 ± 968.966 us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized uniform-12 lz4-auto chunky-1000 1.0 5000000 0.0 avgt 10 17980.515 ± 95.417 us/op
```
This PR has:
- [x] been self-reviewed.
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups
Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610300699
##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -114,6 +124,9 @@ public int getCurrentVectorSize()
@Override
public int getStartOffset()
{
+ if (isContiguous) {
Review comment:
added, and tests
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups
Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610288652
##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -85,6 +87,14 @@ public void advance()
currentVectorSize = to;
}
+
+ if (currentVectorSize > 1) {
Review comment:
Why not do this branch if `currentVectorSize == 1`?
##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -114,6 +124,9 @@ public int getCurrentVectorSize()
@Override
public int getStartOffset()
{
+ if (isContiguous) {
Review comment:
According to the interface, `getOffsets` is supposed to throw an exception if isContiguous is true, so we should check it there too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups
Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610329098
##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -85,6 +87,14 @@ public void advance()
currentVectorSize = to;
}
+
+ if (currentVectorSize > 1) {
Review comment:
I guess it doesn't really matter. I thought I might have been missing some reason why it wasn't ok to do it for the `currentVectorSize == 1` case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] clintropolis commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups
Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610297246
##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -85,6 +87,14 @@ public void advance()
currentVectorSize = to;
}
+
+ if (currentVectorSize > 1) {
Review comment:
we could, and I considered it, but ultimately I decided that 1 didn't really "feel" like it met the threshold for me to consider it really contiguous. I also figured it didn't really make a lot of difference between which read method it would end up in because there is only 1 offset either way. But I don't have strong opinions on it though since either seems valid, do you think it logically should be considered contiguous?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] gianm merged pull request #11039: improve bitmap vector offset to report contiguous groups
Posted by GitBox <gi...@apache.org>.
gianm merged pull request #11039:
URL: https://github.com/apache/druid/pull/11039
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org