You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/04 07:39:52 UTC

[GitHub] [druid] cheddar edited a comment on issue #12022: [Proposal] Improve memory estimations in `OnHeapIncrementalIndex`

cheddar edited a comment on issue #12022:
URL: https://github.com/apache/druid/issues/12022#issuecomment-985985486


   > > Workaround (Rejected):
   > > To retain compatibilty, we could add a new default long aggregateAndEstimateMemory() method and leave the existing aggregate() method as is. The default implementation would return a negative value (say -1) in which case the OnHeapIncrementalIndex or any other estimator would use the max estimated size for that invocation.
   > 
   > This approach actually seems like a good idea to me for a few reasons:
   > 
   >     * Aggregation is hot code, and there will be some overhead to size estimation, and query-time callers don't necessarily need it. So having two methods allows the caller to decide if it actually needs size estimates or not.
   > 
   >     * We want to avoid incompatible changes to aggregation interfaces whenever possible.
   
   While this proposal does introduce a pretty big backwards incompatibility and I hate introducing such compatibilities, doing this in the proposed backwards compatible way adds a branch after every single call to the aggregator which, given that this is a hot method, could have significant inadvertent impacts to query performance.  Having the method be guaranteed to return a value that is always added minimizes that impact.
   
   That said, with just the proposed interface, the conditional still moves into the implementation of the Aggregator itself for simple aggregators like a sum aggregator where it's a fixed memory size from initialization as the first call to `aggregate()` doubles as the "initialization" call.
   
   Thinking about this gave me another idea for how to minimize the incompatibility:
   
   We could add a method `getInitializedMemorySize()` which returns how much memory is consumed by just being initialized.  For static-sized aggregators, this would essentially equate to returning the current size estimation from this one method and then always returning 0 from the `aggregate()` call.  
   
   Alternatively, instead of adding a method to the Aggregator interface, we could give `AggregatorFactory` an `AggregatorFactory.factorizeWithSize()` which returns a "holder" object that has both a reference to the `Aggregator` and its size.  Given that `AggregatorFactory` is the thing that knows the intermediate size right now, this approach would actually allow for a default implementation that pushes the current size into the return object.  If we then create a `Aggregator.aggregateWithSize` method, it could default to an implementation that calls `aggregate` and returns 0.  Which now actually does make this a completely backwards compatible change.  My only worry with that is that the most common implementation of `Aggregator.aggregate` would likely be via that default implementation and I'm not sure if this would be adding yet another virtual function call to the hot path.  
   
   The least-risk of impact approach is probably to add the `factorizeWithSize()` and still give `aggregate` a return type.  This is backwards incompatible, but the vast majority of implementations just need to add a `return 0;` so, at least the change required is minimized.
   
   Perhaps we try the fully backwards compatible approach, benchmark it.  Adjust to the incompatible, non-default method, benchmark that and see if there's a marked difference between them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org