You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/12/21 22:41:58 UTC
[GitHub] richardstartin opened a new pull request #6770: Consider using
Roaring batch iterators
richardstartin opened a new pull request #6770: Consider using Roaring batch iterators
URL: https://github.com/apache/incubator-druid/pull/6770
Here is a patch which replaces the `IntIterator` implementation with one that wraps a `BatchIterator` - an iterator which extracts values from the bitmap into a buffer in batches, and can be a lot faster. This relies on a space time tradeoff, so probably requires heuristics or a configurable cardinality threshold to choose when the new kind of iterator is used. These iterators are also more expensive to clone, which seems to be done extensively in druid.
At b5df73ebf4598d8d2eff3daf7102a37adaa3abd4 (with batch iterators) I get
```
Benchmark (bitmapAlgo) (prob) (size) Mode Cnt Score Error Units
BitmapIterationBenchmark.iter bitset 0.0 1000000 avgt 5 2.785 ± 0.122 ns/op
BitmapIterationBenchmark.iter bitset 0.001 1000000 avgt 5 8496.144 ± 887.048 ns/op
BitmapIterationBenchmark.iter bitset 0.1 1000000 avgt 5 481792.250 ± 10840.026 ns/op
BitmapIterationBenchmark.iter bitset 0.5 1000000 avgt 5 1910323.042 ± 49924.335 ns/op
BitmapIterationBenchmark.iter bitset 0.99 1000000 avgt 5 3680187.725 ± 61100.955 ns/op
BitmapIterationBenchmark.iter bitset 1.0 1000000 avgt 5 3732561.471 ± 54140.556 ns/op
BitmapIterationBenchmark.iter concise 0.0 1000000 avgt 5 2.781 ± 0.060 ns/op
BitmapIterationBenchmark.iter concise 0.001 1000000 avgt 5 5615.190 ± 75.432 ns/op
BitmapIterationBenchmark.iter concise 0.1 1000000 avgt 5 2361254.431 ± 294409.483 ns/op
BitmapIterationBenchmark.iter concise 0.5 1000000 avgt 5 5561685.586 ± 95190.501 ns/op
BitmapIterationBenchmark.iter concise 0.99 1000000 avgt 5 2097075.870 ± 63050.075 ns/op
BitmapIterationBenchmark.iter concise 1.0 1000000 avgt 5 2288373.696 ± 90706.616 ns/op
BitmapIterationBenchmark.iter roaring 0.0 1000000 avgt 5 78.431 ± 1.114 ns/op
BitmapIterationBenchmark.iter roaring 0.001 1000000 avgt 5 3355.265 ± 50.972 ns/op
BitmapIterationBenchmark.iter roaring 0.1 1000000 avgt 5 369259.712 ± 9960.458 ns/op
BitmapIterationBenchmark.iter roaring 0.5 1000000 avgt 5 1098874.194 ± 702851.369 ns/op
BitmapIterationBenchmark.iter roaring 0.99 1000000 avgt 5 2298007.861 ± 33151.084 ns/op
BitmapIterationBenchmark.iter roaring 1.0 1000000 avgt 5 2291253.405 ± 31165.949 ns/op
```
At 7a09cde4de1953eee75c5033e863cfde8f94d6c1 (without batch iterators) I get:
```
Benchmark (bitmapAlgo) (prob) (size) Mode Cnt Score Error Units
BitmapIterationBenchmark.iter bitset 0.0 1000000 avgt 5 2.804 ± 0.054 ns/op
BitmapIterationBenchmark.iter bitset 0.001 1000000 avgt 5 8584.675 ± 149.012 ns/op
BitmapIterationBenchmark.iter bitset 0.1 1000000 avgt 5 477834.206 ± 16173.440 ns/op
BitmapIterationBenchmark.iter bitset 0.5 1000000 avgt 5 1916687.564 ± 66723.789 ns/op
BitmapIterationBenchmark.iter bitset 0.99 1000000 avgt 5 3661377.107 ± 112077.640 ns/op
BitmapIterationBenchmark.iter bitset 1.0 1000000 avgt 5 3997551.794 ± 88884.369 ns/op
BitmapIterationBenchmark.iter concise 0.0 1000000 avgt 5 3.074 ± 0.818 ns/op
BitmapIterationBenchmark.iter concise 0.001 1000000 avgt 5 6249.486 ± 521.458 ns/op
BitmapIterationBenchmark.iter concise 0.1 1000000 avgt 5 2331668.373 ± 62614.776 ns/op
BitmapIterationBenchmark.iter concise 0.5 1000000 avgt 5 5621965.934 ± 105912.792 ns/op
BitmapIterationBenchmark.iter concise 0.99 1000000 avgt 5 2078971.999 ± 61276.180 ns/op
BitmapIterationBenchmark.iter concise 1.0 1000000 avgt 5 2310745.781 ± 108732.741 ns/op
BitmapIterationBenchmark.iter roaring 0.0 1000000 avgt 5 5.654 ± 0.318 ns/op
BitmapIterationBenchmark.iter roaring 0.001 1000000 avgt 5 4211.167 ± 105.185 ns/op
BitmapIterationBenchmark.iter roaring 0.1 1000000 avgt 5 431989.663 ± 18202.174 ns/op
BitmapIterationBenchmark.iter roaring 0.5 1000000 avgt 5 1403242.519 ± 29345.563 ns/op
BitmapIterationBenchmark.iter roaring 0.99 1000000 avgt 5 2569253.542 ± 29920.673 ns/op
BitmapIterationBenchmark.iter roaring 1.0 1000000 avgt 5 2639932.161 ± 61587.416 ns/op
```
Note that using batch iterators via the `IntIterator` interface costs at least 2x throughput compared to using them via the `BatchIterator` interface.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org