You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/08 22:28:29 UTC

[GitHub] [incubator-druid] himanshug commented on issue #6743: IncrementalIndex generally overestimates theta sketch size

himanshug commented on issue #6743: IncrementalIndex generally overestimates theta sketch size
URL: https://github.com/apache/incubator-druid/issues/6743#issuecomment-509415145
 
 
   one , I think more flexible for druid core code, alternative to https://github.com/apache/incubator-druid/issues/2963 is that 
   
   add following method to `AggregatorFactory` 
   ```
     /* Smallest size ByteBuffer to use with BufferAggregator. This can be optionally overridden to support
     incremental size offheap buffers.
      */
     public int getMinimumOffheapSize()
     {
       return getMaxIntermediateSize();
     }
   ```
   
   and  add `throws MemoryExceededException` to `BufferAggregator.aggregate(ByteBuffer buffer,int position)` method with contract that `MemoryExceededException` can only be thrown if `buffer` provided is of a smaller size than `AggregatorFactory.getMaxIntermediateSize()` and underlying sketch does need more space really.
   
   With that druid code using `BufferAggregator` can work as follow...
   
   start with a on/off heap `ByteBuffer` or size `AggregatorFactory.getMinimumOffheapSize()` , during aggregation if `MemoryExceededException` is raised then allocate another on/off heap `ByteBuffer` of double the size (optionally `MemoryExceededException` could have a variable telling what is minimum required increased size that `BufferAggregator.aggregate(..)` sets when throwing ex) and call `relocate(int oldPosition, int newPosition, ByteBuffer oldBuffer, ByteBuffer newBuffer)` and then carry aggregation forward.
   
   now the question of how sophisticated the MemoryAllocator can or can not be is outside of aggregator implementation. I think it will be "enough" to let OS deal with it and use calls to `ByteBuffer.allocate(size)` (in OnHeapIncrementalIndex , use `BufferAggregator` instead of `Aggregator`) and `ByteBuffer.allocateDirect(size)` (in query engines)
   
   We track the amount of memory used for these `dynamically sized aggregators` and have configurations to limit amount of memory usage.
   For now, `dynamically sized aggregators` are the ones for which `AggregatorFactory.getMinimumOffheapSize() < AggregatorFactory.getMaxIntermediateSize()` .
   
   When all druid code is updated to handle `dynamically sized aggregators` , we can actually remove `AggregatorFactory.getMaxIntermediateSize()` method altogether and add a `boolean AggregatorFactory.supportsDynamicallySizedAggregation()` and allow `BufferAggregator.aggregate(..)` to throw `MemoryExceededException` at anytime to support cases like `doublesSketch` where there is no known max size really.
   We can probably also remove `Aggregator` interface at this point as well.
   
   so overall we solve two problems...
   1) not over-allocate/over-estimate memory
   2) support full on off-heap aggregation for sketches where there is no maximum size e.g. `doubleSketch`
   
   @gianm does that make sense,  if yes, I will probably extract this into  a separate proposal.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org