Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/30 01:16:47 UTC

[GitHub] [incubator-druid] gianm opened a new pull request #8194: HllSketchMergeBufferAggregator: Speed up init by copying prebuilt sketch.

URL: https://github.com/apache/incubator-druid/pull/8194
 
 
   Inspired by the following flame graph, this patch attacks `HllSketchMergeBufferAggregator.init` by precomputing a single initialized sketch once and then copying that precomputed sketch into each aggregation buffer. Previously, `init` constructed a new sketch in each buffer, which cleared the memory by calling `Unsafe.setMemory` (by way of `Memory.clear`).
   
   ![Flame graph](https://user-images.githubusercontent.com/1214075/62092710-0e042000-b22b-11e9-9385-54b6cd7d1de2.png)
   
   One interesting thing about this flame graph is that `Memory.clear` accounts for nearly 60% (!) of it. About half of that is in the `HllSketch` constructor, which is the part this patch attacks.
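   To make the init change concrete, here is a minimal Java sketch of the "build once, copy many" idea. It is not the patch's code. It uses a plain `ByteBuffer` slot, and the class name `PrebuiltSketchInit`, the slot size, and the 4-byte header are all hypothetical stand-ins for a real empty HLL sketch image, which would come from the DataSketches library.

   ```java
   import java.nio.ByteBuffer;
   import java.util.Arrays;

   // Hypothetical illustration of "precompute once, copy into each buffer".
   // The layout (a made-up 4-byte header followed by zeroed registers) is a
   // stand-in so the example runs on its own; it is not a real HLL format.
   final class PrebuiltSketchInit {
     static final int SKETCH_SIZE = 4096;                           // hypothetical slot size
     private static final byte[] HEADER = {0x0A, 0x01, 0x07, 0x0C}; // hypothetical header bytes

     // Built exactly once, when the class is initialized.
     private static final byte[] EMPTY_SKETCH = buildEmptySketch();

     private static byte[] buildEmptySketch() {
       byte[] image = new byte[SKETCH_SIZE]; // new Java arrays are already zero-filled
       System.arraycopy(HEADER, 0, image, 0, HEADER.length);
       return image;
     }

     /** Returns a copy of the prebuilt image, e.g. for inspection in tests. */
     static byte[] emptySketchImage() {
       return Arrays.copyOf(EMPTY_SKETCH, EMPTY_SKETCH.length);
     }

     /**
      * Initializes the slot at {@code position} with one bulk copy of the
      * prebuilt image instead of clearing the slot and building a sketch in
      * place. Works on a duplicate so the caller's buffer position is unchanged.
      */
     static void init(ByteBuffer buf, int position) {
       ByteBuffer dup = buf.duplicate();
       dup.position(position);
       dup.put(EMPTY_SKETCH);
     }
   }
   ```

   The expensive part (zeroing the image and writing the header) happens once; each `init` call is then a single bulk copy of a fixed-size image, regardless of what garbage the slot held before.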
   
   The rest of the `Memory.clear` time is mostly under `Union.update`, in particular in two methods it calls: `DirectCouponHashSet.growHashSet` and `DirectCouponList.promoteListToSet`. @leerho or @AlexanderSaydakov: for these two methods, do you have any idea what size of memory would typically be cleared at once? We could potentially speed them up by using `copyMemory` instead of `setMemory`, copying from some pre-zeroed chunks.
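   The `copyMemory` idea could look roughly like the following hypothetical helper, which zeroes a region by repeatedly bulk-copying from a pre-zeroed chunk. It uses `ByteBuffer` as a stand-in for the `Memory` that these direct sketch classes operate on, and the class name `ZeroByCopy` and `CHUNK_SIZE` are assumptions. Whether it actually beats `setMemory` would depend on the typical region sizes, which is exactly the open question above.

   ```java
   import java.nio.ByteBuffer;

   // Hypothetical helper: clear a region by bulk-copying from a pre-zeroed
   // chunk rather than calling a fill/setMemory-style routine.
   final class ZeroByCopy {
     // Assumed chunk size; the best value would depend on typical region sizes.
     static final int CHUNK_SIZE = 4096;
     private static final byte[] ZEROS = new byte[CHUNK_SIZE]; // zero-filled by the JVM

     /** Zeroes bytes [position, position + length) of buf; caller's position is unchanged. */
     static void clear(ByteBuffer buf, int position, int length) {
       ByteBuffer dup = buf.duplicate();
       dup.position(position);
       int remaining = length;
       while (remaining > 0) {
         int n = Math.min(remaining, CHUNK_SIZE);
         dup.put(ZEROS, 0, n); // one bulk copy per chunk
         remaining -= n;
       }
     }
   }
   ```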
   
   JMH benchmarks:
   
   ```
   master
   
   Benchmark                            Mode  Cnt         Score        Error  Units
   DataSketchesBenchmark.init          thrpt   15   4788022.652 ± 109374.009  ops/s
   DataSketchesBenchmark.initAndGet    thrpt   15   4085628.980 ±  86160.850  ops/s
   DataSketchesBenchmark.initAndSerde  thrpt   15   2598419.066 ± 126836.466  ops/s
   
   master + init change
   
   Benchmark                            Mode  Cnt         Score        Error  Units
   DataSketchesBenchmark.init          thrpt   15  24943146.877 ± 140698.560  ops/s
   DataSketchesBenchmark.initAndGet    thrpt   15  12141905.292 ±  99566.648  ops/s
   DataSketchesBenchmark.initAndSerde  thrpt   15   4920001.015 ±  62927.010  ops/s
   ```
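   For reference, dividing the "master + init change" scores by the "master" scores gives the approximate speedup for each benchmark:

   ```math
   \begin{aligned}
   \text{init:} &\quad \frac{24943146.877}{4788022.652} \approx 5.21\times \\
   \text{initAndGet:} &\quad \frac{12141905.292}{4085628.980} \approx 2.97\times \\
   \text{initAndSerde:} &\quad \frac{4920001.015}{2598419.066} \approx 1.89\times
   \end{aligned}
   ```

   The gain is largest for `init` alone and shrinks as each benchmarked operation includes more work beyond `init`.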

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
