You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/12/21 22:41:58 UTC

[GitHub] richardstartin opened a new pull request #6770: Consider using Roaring batch iterators

richardstartin opened a new pull request #6770: Consider using Roaring batch iterators
URL: https://github.com/apache/incubator-druid/pull/6770
 
 
   Here is a patch which replaces the `IntIterator` implementation with one that wraps a `BatchIterator` - an iterator which extracts values from the bitmap into a buffer in batches, and can be a lot faster. This relies on a space time tradeoff, so probably requires heuristics or a configurable cardinality threshold to choose when the new kind of iterator is used. These iterators are also more expensive to clone, which seems to be done extensively in druid.
   
   At b5df73ebf4598d8d2eff3daf7102a37adaa3abd4 (with batch iterators) I get
   
   ```
   Benchmark                      (bitmapAlgo)  (prob)   (size)  Mode  Cnt        Score        Error  Units
   BitmapIterationBenchmark.iter        bitset     0.0  1000000  avgt    5        2.785 ±      0.122  ns/op
   BitmapIterationBenchmark.iter        bitset   0.001  1000000  avgt    5     8496.144 ±    887.048  ns/op
   BitmapIterationBenchmark.iter        bitset     0.1  1000000  avgt    5   481792.250 ±  10840.026  ns/op
   BitmapIterationBenchmark.iter        bitset     0.5  1000000  avgt    5  1910323.042 ±  49924.335  ns/op
   BitmapIterationBenchmark.iter        bitset    0.99  1000000  avgt    5  3680187.725 ±  61100.955  ns/op
   BitmapIterationBenchmark.iter        bitset     1.0  1000000  avgt    5  3732561.471 ±  54140.556  ns/op
   BitmapIterationBenchmark.iter       concise     0.0  1000000  avgt    5        2.781 ±      0.060  ns/op
   BitmapIterationBenchmark.iter       concise   0.001  1000000  avgt    5     5615.190 ±     75.432  ns/op
   BitmapIterationBenchmark.iter       concise     0.1  1000000  avgt    5  2361254.431 ± 294409.483  ns/op
   BitmapIterationBenchmark.iter       concise     0.5  1000000  avgt    5  5561685.586 ±  95190.501  ns/op
   BitmapIterationBenchmark.iter       concise    0.99  1000000  avgt    5  2097075.870 ±  63050.075  ns/op
   BitmapIterationBenchmark.iter       concise     1.0  1000000  avgt    5  2288373.696 ±  90706.616  ns/op
   BitmapIterationBenchmark.iter       roaring     0.0  1000000  avgt    5       78.431 ±      1.114  ns/op
   BitmapIterationBenchmark.iter       roaring   0.001  1000000  avgt    5     3355.265 ±     50.972  ns/op
   BitmapIterationBenchmark.iter       roaring     0.1  1000000  avgt    5   369259.712 ±   9960.458  ns/op
   BitmapIterationBenchmark.iter       roaring     0.5  1000000  avgt    5  1098874.194 ± 702851.369  ns/op
   BitmapIterationBenchmark.iter       roaring    0.99  1000000  avgt    5  2298007.861 ±  33151.084  ns/op
   BitmapIterationBenchmark.iter       roaring     1.0  1000000  avgt    5  2291253.405 ±  31165.949  ns/op
   ```
   
   At 7a09cde4de1953eee75c5033e863cfde8f94d6c1 (without batch iterators) I get:
   ```
   Benchmark                      (bitmapAlgo)  (prob)   (size)  Mode  Cnt        Score        Error  Units
   BitmapIterationBenchmark.iter        bitset     0.0  1000000  avgt    5        2.804 ±      0.054  ns/op
   BitmapIterationBenchmark.iter        bitset   0.001  1000000  avgt    5     8584.675 ±    149.012  ns/op
   BitmapIterationBenchmark.iter        bitset     0.1  1000000  avgt    5   477834.206 ±  16173.440  ns/op
   BitmapIterationBenchmark.iter        bitset     0.5  1000000  avgt    5  1916687.564 ±  66723.789  ns/op
   BitmapIterationBenchmark.iter        bitset    0.99  1000000  avgt    5  3661377.107 ± 112077.640  ns/op
   BitmapIterationBenchmark.iter        bitset     1.0  1000000  avgt    5  3997551.794 ±  88884.369  ns/op
   BitmapIterationBenchmark.iter       concise     0.0  1000000  avgt    5        3.074 ±      0.818  ns/op
   BitmapIterationBenchmark.iter       concise   0.001  1000000  avgt    5     6249.486 ±    521.458  ns/op
   BitmapIterationBenchmark.iter       concise     0.1  1000000  avgt    5  2331668.373 ±  62614.776  ns/op
   BitmapIterationBenchmark.iter       concise     0.5  1000000  avgt    5  5621965.934 ± 105912.792  ns/op
   BitmapIterationBenchmark.iter       concise    0.99  1000000  avgt    5  2078971.999 ±  61276.180  ns/op
   BitmapIterationBenchmark.iter       concise     1.0  1000000  avgt    5  2310745.781 ± 108732.741  ns/op
   BitmapIterationBenchmark.iter       roaring     0.0  1000000  avgt    5        5.654 ±      0.318  ns/op
   BitmapIterationBenchmark.iter       roaring   0.001  1000000  avgt    5     4211.167 ±    105.185  ns/op
   BitmapIterationBenchmark.iter       roaring     0.1  1000000  avgt    5   431989.663 ±  18202.174  ns/op
   BitmapIterationBenchmark.iter       roaring     0.5  1000000  avgt    5  1403242.519 ±  29345.563  ns/op
   BitmapIterationBenchmark.iter       roaring    0.99  1000000  avgt    5  2569253.542 ±  29920.673  ns/op
   BitmapIterationBenchmark.iter       roaring     1.0  1000000  avgt    5  2639932.161 ±  61587.416  ns/op
   ```
   
   Note that using batch iterators via the `IntIterator` interface costs at least 2x throughput compared to using them via the `BatchIterator` interface.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org