You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/22 20:43:31 UTC

[GitHub] [incubator-druid] himanshug opened a new issue #8126: [Proposal] BufferAggregator support for growable sketches.

himanshug opened a new issue #8126: [Proposal] BufferAggregator support for growable sketches.
URL: https://github.com/apache/incubator-druid/issues/8126
 
 
   ### Motivation
   
   1. From #6743
   "Theta sketches have a very large max size by default, relative to typical row sizes (about 250KB with "size" set to the default of 16384). The ingestion-time row size estimator (getMaxBytesPerRowForAggregators in OnheapIncrementalIndex) uses this figure to estimate row sizes when theta sketches are used at ingestion time, leading to way more spills than is reasonable. It would be better to use an estimate based more on actual current size. I'm not sure how to get this, though."
   
   2. From #2963
   "In case of complex aggregators like thetaSketch, more than 80% of the time sketches don't grow to full capacity but query processing still reserves the full max size. The idea is to support use of a resizable aggregator by BufferAggregator so that maximum space is not reserved but BufferAggregator should be able to re-allocate the buffer on demand."
   
   3. Some sketches e.g. `doublesSketch` do not have an upper bound on size and can't provide a correct number for ` AggregatorFactory.getMaxIntermediateSize()` . Current workaround, they use, is to use some number and then fall back to on-heap objects if sketch grows bigger than that.
   
   ### Proposed changes
   
   Add following methods to `AggregatorFactory`
   
   ```
     /**
      * Does BufferAggregator support handling of varying ByteBuffer sizes by overriding
      * {@link BufferAggregator#aggregate(ByteBuffer, int, int)}
      * @return
      */
     public boolean isDynamicallyResizable()
     {
       return getMinIntermediateSize() < getMaxIntermediateSize();
     }
     
     /**
      * Start size of ByteBuffer to be used with BufferAggregator.
      * @return
      */
     public int getMinIntermediateSize()
     {
       return getMaxIntermediateSize();
     }
   ```
   
   Add following methods to `BufferAggregator`
   
   ```
     /**
      * Returns false if aggregation requires a bigger buffer than capacity arg or true.
      * Return status must be used exclusively to signal "low memory in buffer" condition and
      * nothing else.
      */
     default boolean aggregate(ByteBuffer buff, int position, int capacity)
     {
       aggregate(buff, position);
       return true;
     }
   
     // Following methods are equivalent of old methods with same name except they provide access to capacity
     // of ByteBuffer which is assumed to be AggregatorFactory.getMaxIntermediateSize() by older methods.
   
     default void init(ByteBuffer buff, int position, int capacity)
     {
       init(buff, position);
     }
   
     default void relocate(ByteBuffer oldBuff, int oldPosition, int oldCapacity, ByteBuffer newwBuff, int newwPosition, int newwCapacity)
     {
       relocate(oldPosition, newwPosition, oldBuff, newwBuff);
     }
   
     default Object get(ByteBuffer buff, int position, int capacity)
     {
       return get(buff, position);
     }
   
     default float getFloat(ByteBuffer buff, int position, int capacity)
     {
       return getFloat(buff, position);
     }
   
     default double getDouble(ByteBuffer buff, int position, int capacity)
     {
       return getDouble(buff, position);
     }
   
     default long getLong(ByteBuffer buff, int position, int capacity)
     {
       return getLong(buff, position);
     }
   
     default boolean isNull(ByteBuffer buff, int position, int capacity)
     {
       return isNull(buff, position);
     }
   ```
   
   Update all of Druid core code to remove usage of `Aggregator` and use `BufferAggregator` in all places using the newly introduced methods. Once that is done, `Aggregator` interface and `AggregatorFactory.factorize(ColumnSelectorFactory)` and `AggregatorFactory.getMaxIntermediateSize()` can be removed.
   
   At least `BufferAggregator` that work on top of sketches of variable sizes should be updated to implement newly introduced methods. In those extensions, `AggregatorFactory.getMinIntermediateSize()` should return a value that can be overridden by user per query/indexing to allow fine tuning.
   
   Following interfaces and classes are introduced to aid usage of new methods in aggregator extension.
   
   ```
   public interface MemoryAllocator
   {
     BufferHolder allocate(int capacity);
     void free(BufferHolder bh);
   }
   
   public interface BufferHolder
   {
     int position();
     int capacity();
     ByteBuffer get();
   }
   
   public class SimpleOnHeapMemoryAllocator implements MemoryAllocator
   {
     @Override
     public BufferHolder allocate(int capacity)
     {
       return new SimplerBufferHolder(ByteBuffer.allocate(capacity));
     }
   
     @Override
     public void free(BufferHolder ignored)
     {
   
     }
     
     // This would be used in OnHeapIncrementalIndex
     private static class SimplerBufferHolder implements BufferHolder
     {
       private final ByteBuffer bb;
   
       public SimplerBufferHolder(ByteBuffer bb)
       {
         this.bb = bb;
       }
   
       @Override
       public int position()
       {
         return 0;
       }
   
       @Override
       public int capacity()
       {
         return bb.capacity();
       }
   
       @Override
       public ByteBuffer get()
       {
         return bb;
       }
     }
   }
   ```
   
   
   ### Rationale
   
   https://github.com/apache/incubator-druid/issues/2963
   
   ### Operational impact
   
   None
   
   ### Test plan (optional)
   
   Unit Tests should cover the changes.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org