Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/08/19 13:16:07 UTC

[GitHub] [incubator-druid] leventov opened a new issue #8335: Skip `positions` indirection in `PooledTopNAlgorithm` when the aggregation size is small

URL: https://github.com/apache/incubator-druid/issues/8335
 
 
   `int[] positions` indirection in [`PooledTopNAlgorithm`](https://github.com/apache/incubator-druid/blob/566dc8c719489283f9190cefd6346bbb3f12955f/processing/src/main/java/org/apache/druid/query/topn/PooledTopNAlgorithm.java) seems wasteful, especially when the aggregation itself is just 4 or 8 bytes, as is the case for float/double/long Min/Max/Sum aggregations: a 4-byte `positions` entry per dimension value then accounts for 50% or 33% of the memory used for processing, respectively. Its role, initializing the aggregation at the right moment, 1) could be served by a `BitSet dimIndexInitialized` instead; 2) may itself be unnecessary and wasteful for aggregators whose initialization step just zeroes their memory: it may be faster to zero the whole buffer in a single sequential pass at the beginning of processing.
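
To make the options above concrete, here is a minimal, simplified Java sketch; it is not Druid's actual code. It assumes a single 8-byte long-sum aggregation per dimension value, a buffer large enough to hold one slot per value, and dimension values already encoded as `int` indexes. The class and method names (`PositionsSketch`, `withPositions`, `withBitSet`, `withUpfrontZeroing`, `dirtyBuffer`) are hypothetical. Variant A assigns slots on first use through `int[] positions`, variant B derives the offset directly from the dimension index and tracks initialization with a `BitSet dimIndexInitialized`, and variant C relies on the aggregator's initialization being "write zero" and zeroes the whole region once up front.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.BitSet;

/** Simplified, hypothetical sketch (not Druid's code): one 8-byte long sum per dimension value. */
public class PositionsSketch {
    static final int AGG_SIZE = Long.BYTES;

    /** A: int[] positions maps dimIndex to a slot offset, assigned on first use. */
    static long[] withPositions(ByteBuffer buf, int cardinality, int[] dimIndexes, long[] values) {
        int[] positions = new int[cardinality];  // 4 extra bytes per dimension value
        Arrays.fill(positions, -1);              // -1 = aggregation not initialized yet
        int nextOffset = 0;
        for (int row = 0; row < dimIndexes.length; row++) {
            int d = dimIndexes[row];
            int offset = positions[d];           // random access into positions
            if (offset < 0) {                    // first occurrence: allocate and init a slot
                offset = nextOffset;
                nextOffset += AGG_SIZE;
                buf.putLong(offset, 0L);
                positions[d] = offset;
            }
            buf.putLong(offset, buf.getLong(offset) + values[row]);
        }
        long[] sums = new long[cardinality];
        for (int d = 0; d < cardinality; d++) {
            sums[d] = positions[d] < 0 ? 0L : buf.getLong(positions[d]);
        }
        return sums;
    }

    /** B: offset derived from dimIndex; one bit per value tracks initialization. */
    static long[] withBitSet(ByteBuffer buf, int cardinality, int[] dimIndexes, long[] values) {
        BitSet dimIndexInitialized = new BitSet(cardinality);
        for (int row = 0; row < dimIndexes.length; row++) {
            int d = dimIndexes[row];
            int offset = d * AGG_SIZE;
            if (!dimIndexInitialized.get(d)) {
                buf.putLong(offset, 0L);
                dimIndexInitialized.set(d);
            }
            buf.putLong(offset, buf.getLong(offset) + values[row]);
        }
        long[] sums = new long[cardinality];
        for (int d = 0; d < cardinality; d++) {
            sums[d] = dimIndexInitialized.get(d) ? buf.getLong(d * AGG_SIZE) : 0L;
        }
        return sums;
    }

    /** C: init is "write zero", so zero the whole region sequentially, then aggregate. */
    static long[] withUpfrontZeroing(ByteBuffer buf, int cardinality, int[] dimIndexes, long[] values) {
        for (int offset = 0; offset < cardinality * AGG_SIZE; offset += AGG_SIZE) {
            buf.putLong(offset, 0L);
        }
        for (int row = 0; row < dimIndexes.length; row++) {
            int offset = dimIndexes[row] * AGG_SIZE;
            buf.putLong(offset, buf.getLong(offset) + values[row]);
        }
        long[] sums = new long[cardinality];
        for (int d = 0; d < cardinality; d++) {
            sums[d] = buf.getLong(d * AGG_SIZE);
        }
        return sums;
    }

    /** Simulates a reused pooled buffer still holding data from an earlier query. */
    static ByteBuffer dirtyBuffer(int cardinality) {
        ByteBuffer buf = ByteBuffer.allocate(cardinality * AGG_SIZE);
        for (int offset = 0; offset < buf.capacity(); offset += AGG_SIZE) {
            buf.putLong(offset, -999L);
        }
        return buf;
    }

    public static void main(String[] args) {
        int cardinality = 4;
        int[] dimIndexes = {2, 0, 2, 3, 2};
        long[] values = {5, 1, 7, 10, 3};
        System.out.println(Arrays.toString(withPositions(dirtyBuffer(cardinality), cardinality, dimIndexes, values)));
        System.out.println(Arrays.toString(withBitSet(dirtyBuffer(cardinality), cardinality, dimIndexes, values)));
        System.out.println(Arrays.toString(withUpfrontZeroing(dirtyBuffer(cardinality), cardinality, dimIndexes, values)));
    }
}
```

All three variants print `[1, 0, 15, 10]` for the sample input. B needs one bit of tracking state per dimension value instead of four bytes and C needs none, but both give up A's first-use ordering of slots.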
   
   There is a locality consideration for larger aggregations: `positions` makes it possible to place the hottest aggregations together at the beginning of the buffer, improving cache and TLB utilization. For 4-byte aggregations this benefit is completely canceled by the accesses to `positions` itself (which are still random), and almost certainly so for 8-byte aggregations. For larger sizes, experiments should show at which aggregation size the benefit of `positions` outweighs its cost (which also diminishes as the aggregation size grows): the threshold is likely somewhere between 12 and 32 bytes, but benchmarking is required to determine it more precisely.
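
The break-even point has to come from benchmarks, but the size-based selection this paragraph implies could look like the following Java sketch. `LayoutChoiceSketch`, `Layout`, and especially the 16-byte `HYPOTHETICAL_THRESHOLD_BYTES` are placeholders chosen only for illustration, somewhere inside the 12 to 32 byte range estimated above. The helper also computes the share of per-value memory taken by a 4-byte `positions` entry, which shrinks as the aggregation grows.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: choose between the direct (BitSet-tracked) layout and the
 * positions indirection based on the total aggregation size per dimension value.
 */
public class LayoutChoiceSketch {
    // PLACEHOLDER, not a measured value: the issue only estimates that the real
    // break-even point lies somewhere between 12 and 32 bytes.
    static final int HYPOTHETICAL_THRESHOLD_BYTES = 16;

    enum Layout { DIRECT_WITH_BITSET, POSITIONS_INDIRECTION }

    static Layout chooseLayout(int aggregationSizeBytes) {
        // Below the threshold, assume the random access to positions costs more than
        // the locality it buys, so index the buffer directly by dimension index.
        return aggregationSizeBytes < HYPOTHETICAL_THRESHOLD_BYTES
            ? Layout.DIRECT_WITH_BITSET
            : Layout.POSITIONS_INDIRECTION;
    }

    /** Share of per-value memory taken by one 4-byte positions entry. */
    static double positionsShare(int aggregationSizeBytes) {
        return (double) Integer.BYTES / (Integer.BYTES + aggregationSizeBytes);
    }

    public static void main(String[] args) {
        for (int size : new int[] {4, 8, 12, 16, 32}) {
            System.out.println(String.format(Locale.ROOT, "%2d bytes: positions share %4.1f%%, layout %s",
                size, 100 * positionsShare(size), chooseLayout(size)));
        }
    }
}
```

With the placeholder threshold, 4-, 8- and 12-byte aggregations select the direct layout (where `positions` would take 50%, 33.3% and 25% of the per-value memory), while 16- and 32-byte aggregations keep the indirection.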

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
