Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/30 01:16:47 UTC
[GitHub] [incubator-druid] gianm opened a new pull request #8194:
HllSketchMergeBufferAggregator: Speed up init by copying prebuilt sketch.
URL: https://github.com/apache/incubator-druid/pull/8194
Inspired by the following flame graph, this patch attacks `HllSketchMergeBufferAggregator.init` by precomputing a single initialized sketch and then copying that precomputed sketch into aggregation buffers. Previously, initialization went through `Unsafe.setMemory` (by way of `Memory.clear`).
<img width="1655" alt="image" src="https://user-images.githubusercontent.com/1214075/62092710-0e042000-b22b-11e9-9385-54b6cd7d1de2.png">
One interesting thing about this flame graph is that `Memory.clear` accounts for nearly 60% (!) of it. Of that, half is in the HllSketch constructor, which this patch attacks.
The rest is mostly `Union.update`, in particular two methods it calls: `DirectCouponHashSet.growHashSet`, and `DirectCouponList.promoteListToSet`. @leerho or @AlexanderSaydakov - for these two methods - any idea what size of memory would typically be cleared at once? We could potentially speed these up by using copyMemory instead of setMemory with some pre-zeroed chunks.
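As a rough illustration of the "copy a prebuilt image" idea, here is a minimal sketch that uses only `java.nio.ByteBuffer`. It is not the code in this patch and does not use the DataSketches or Memory APIs; the class name, `SLOT_SIZE`, and the header byte values are all hypothetical.

```java
import java.nio.ByteBuffer;

public class PrebuiltInitSketch {
    // Hypothetical size of one aggregation slot, in bytes.
    static final int SLOT_SIZE = 64;

    // The "empty" image is built once and reused for every init call.
    static final byte[] EMPTY_IMAGE = buildEmptyImage();

    static byte[] buildEmptyImage() {
        byte[] image = new byte[SLOT_SIZE]; // zero-filled by the JVM
        // Made-up header bytes standing in for a real sketch preamble.
        image[0] = 2;
        image[1] = 1;
        image[2] = 7;
        return image;
    }

    // Initialize the slot starting at `position` with one bulk copy,
    // instead of clearing the region and then writing the header.
    static void init(ByteBuffer buf, int position) {
        ByteBuffer dup = buf.duplicate(); // leave the caller's position and limit untouched
        dup.position(position);
        dup.put(EMPTY_IMAGE, 0, SLOT_SIZE);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4 * SLOT_SIZE);
        // Fill the buffer with junk to mimic a reused aggregation buffer.
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) 0x5A);
        }
        init(buf, 2 * SLOT_SIZE);
        System.out.println("first header byte: " + buf.get(2 * SLOT_SIZE));
        System.out.println("byte after header: " + buf.get(2 * SLOT_SIZE + 3));
    }
}
```

The same pattern would apply to the pre-zeroed chunk idea: keep one zero-filled array around and bulk-copy from it, assuming the region to clear is never larger than the chunk.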
JMH benchmarks:
```
master

Benchmark                            Mode  Cnt         Score        Error  Units
DataSketchesBenchmark.init          thrpt   15   4788022.652 ± 109374.009  ops/s
DataSketchesBenchmark.initAndGet    thrpt   15   4085628.980 ±  86160.850  ops/s
DataSketchesBenchmark.initAndSerde  thrpt   15   2598419.066 ± 126836.466  ops/s

master + init change

Benchmark                            Mode  Cnt         Score        Error  Units
DataSketchesBenchmark.init          thrpt   15  24943146.877 ± 140698.560  ops/s
DataSketchesBenchmark.initAndGet    thrpt   15  12141905.292 ±  99566.648  ops/s
DataSketchesBenchmark.initAndSerde  thrpt   15   4920001.015 ±  62927.010  ops/s
```
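To read the two runs side by side, the throughput ratios can be computed directly from the scores above. The helper below is only a convenience sketch (the class and method names are made up, and the error margins are ignored); it puts the speedups at roughly 5.2x for `init`, 3.0x for `initAndGet`, and 1.9x for `initAndSerde`.

```java
public class Speedup {
    // Ratio of throughput after the change to throughput before it.
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Scores (ops/s) copied from the JMH output above.
        System.out.printf("init:         %.2fx%n", speedup(4788022.652, 24943146.877));
        System.out.printf("initAndGet:   %.2fx%n", speedup(4085628.980, 12141905.292));
        System.out.printf("initAndSerde: %.2fx%n", speedup(2598419.066, 4920001.015));
    }
}
```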