You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/27 08:43:31 UTC

[GitHub] [druid] clintropolis opened a new pull request #11039: improve bitmap vector offset to report contiguous groups

clintropolis opened a new pull request #11039:
URL: https://github.com/apache/druid/pull/11039


   ### Description
   This PR is a follow-up to #11004, which improves `BitmapVectorOffset` to report the current set of offsets as "contiguous" when they are, in order to take advantage of any optimizations available when processing contiguous vectorized gets. 
   
   In #11004, I noticed the improvements to vectorized decoding of long values in my column scan benchmarks only seemed to impact a full scan. Looking closer, it was due to how the benchmark uses a `BitmapVectorOffset` to simulate filtering rows, which prior to this pr would _never_ report the current set off offsets as contiguous, even if it was. Since #11004 primarily improved performance of contiguous vectorized gets, when using a `BitmapVectorOffset` we were missing out on these improvements entirely.
   
   To help measure the improvements of allowing contiguous blocks to be processed with the contiguous get methods, I improved the benchmarks added in #11004 to now have several different filter "distributions" to simulate different filtering patterns, instead of previously only using a random distribution. These include:
   - random: `BitmapVectorOffset`, bitmap is set randomly for desired number of rows
   - contiguous-start: `NoFilterVectorOffset` from starting offset 0 to desired number of rows
   - contiguous-end: opposite end of column as 'contiguous-start'
   - contiguous-bitmap-start: `BitmapVectorOffset`, bitmap set from 0 to desired number of rows
   - contiguous-bitmap-end: opposite end of column as `contiguous-bitmap-start`
   - chunky-1000: `BitmapVectorOffset`, position chosen randomly and up to 1000 contiguous values set until chosen number of rows is set
   - chunky-10000: 'chunky-1000' but with chunks of 10k contiguous values
   
   The results of this change is a pretty decent improvement to column read speed:
   
   ![before-after-bitmaps](https://user-images.githubusercontent.com/1577461/112715353-edc3b300-8e9c-11eb-9f09-6f179d8177ff.gif)
   
   
   before:
   ```
   Benchmark                                                                        (distribution)  (encoding)     (filterDistribution)  (filteredRowCountPercentage)   (rows)  (zeroProbability)  Mode  Cnt        Score      Error  Units
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.1  5000000                0.0  avgt   10     3096.136 ±   20.732  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                          0.25  5000000                0.0  avgt   10     8142.492 ±  124.247  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.5  5000000                0.0  avgt   10    16204.635 ±  101.374  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.6  5000000                0.0  avgt   10    19377.004 ±  109.489  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                          0.75  5000000                0.0  avgt   10    25621.704 ± 1544.863  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.8  5000000                0.0  avgt   10    26393.053 ±  190.878  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.9  5000000                0.0  avgt   10    29200.991 ±  493.996  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                          0.95  5000000                0.0  avgt   10    30702.206 ±  198.864  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           1.0  5000000                0.0  avgt   10    17997.131 ±  127.793  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.1  5000000                0.0  avgt   10     3881.268 ±   19.018  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                          0.25  5000000                0.0  avgt   10     8097.172 ±  284.560  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.5  5000000                0.0  avgt   10    14498.809 ±   77.822  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.6  5000000                0.0  avgt   10    18387.397 ±   96.425  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                          0.75  5000000                0.0  avgt   10    21986.742 ±   89.077  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.8  5000000                0.0  avgt   10    23239.459 ±  290.921  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.9  5000000                0.0  avgt   10    25199.260 ±  162.413  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                          0.95  5000000                0.0  avgt   10    27449.210 ± 1688.145  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           1.0  5000000                0.0  avgt   10    17981.037 ±   63.957  us/op
   ```
   
   
   after:
   ```
   Benchmark                                                                        (distribution)  (encoding)     (filterDistribution)  (filteredRowCountPercentage)   (rows)  (zeroProbability)  Mode  Cnt        Score     Error  Units
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.1  5000000                0.0  avgt   10     2396.812 ±  59.423  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                          0.25  5000000                0.0  avgt   10     5941.108 ± 206.432  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.5  5000000                0.0  avgt   10    10904.502 ± 248.979  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.6  5000000                0.0  avgt   10    13237.317 ± 175.817  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                          0.75  5000000                0.0  avgt   10    17216.386 ± 934.780  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.8  5000000                0.0  avgt   10    17088.263 ± 116.899  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           0.9  5000000                0.0  avgt   10    20996.326 ±  81.832  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                          0.95  5000000                0.0  avgt   10    20735.792 ±  71.165  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto  contiguous-bitmap-start                           1.0  5000000                0.0  avgt   10    17992.744 ±  87.609  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.1  5000000                0.0  avgt   10     3226.460 ±  16.228  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                          0.25  5000000                0.0  avgt   10     6771.234 ± 127.713  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.5  5000000                0.0  avgt   10    12587.962 ±  50.479  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.6  5000000                0.0  avgt   10    15874.411 ±  72.437  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                          0.75  5000000                0.0  avgt   10    18694.264 ±  34.897  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.8  5000000                0.0  avgt   10    20658.982 ± 214.946  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           0.9  5000000                0.0  avgt   10    21208.258 ±  73.285  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                          0.95  5000000                0.0  avgt   10    23644.473 ± 968.966  us/op
   ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized                   uniform-12    lz4-auto              chunky-1000                           1.0  5000000                0.0  avgt   10    17980.515 ±  95.417  us/op
   ```
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610300699



##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -114,6 +124,9 @@ public int getCurrentVectorSize()
   @Override
   public int getStartOffset()
   {
+    if (isContiguous) {

Review comment:
       added, and tests




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610288652



##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -85,6 +87,14 @@ public void advance()
 
       currentVectorSize = to;
     }
+
+    if (currentVectorSize > 1) {

Review comment:
       Why not do this branch if `currentVectorSize == 1`?

##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -114,6 +124,9 @@ public int getCurrentVectorSize()
   @Override
   public int getStartOffset()
   {
+    if (isContiguous) {

Review comment:
       According to the interface, `getOffsets` is supposed to throw an exception if isContiguous is true, so we should check it there too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610329098



##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -85,6 +87,14 @@ public void advance()
 
       currentVectorSize = to;
     }
+
+    if (currentVectorSize > 1) {

Review comment:
       I guess it doesn't really matter. I thought I might have been missing some reason why it wasn't ok to do it for the `currentVectorSize == 1` case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11039: improve bitmap vector offset to report contiguous groups

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11039:
URL: https://github.com/apache/druid/pull/11039#discussion_r610297246



##########
File path: processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java
##########
@@ -85,6 +87,14 @@ public void advance()
 
       currentVectorSize = to;
     }
+
+    if (currentVectorSize > 1) {

Review comment:
       we could, and I considered it, but ultimately I decided that 1 didn't really "feel" like it met the threshold for me to consider it really contiguous. I also figured it didn't really make a lot of difference between which read method it would end up in because there is only 1 offset either way. But I don't have strong opinions on it though since either seems valid, do you think it logically should be considered contiguous? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm merged pull request #11039: improve bitmap vector offset to report contiguous groups

Posted by GitBox <gi...@apache.org>.
gianm merged pull request #11039:
URL: https://github.com/apache/druid/pull/11039


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org