Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/12 21:44:13 UTC

[GitHub] [druid] clintropolis commented on issue #10644: Vector Filtered Aggregates return the default buffer init values when no rows match the filter predicate

clintropolis commented on issue #10644:
URL: https://github.com/apache/druid/issues/10644#issuecomment-840116328


   Hi @damnMeddlingKid, @ericxiao251 (and sorry I missed the mailing list thread)
   
   I did document this behavior recently in #11188, which adds a column listing the initial aggregator values in both modes to https://github.com/apache/druid/blob/master/docs/querying/sql.md#aggregation-functions.
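   To make the no-match behavior concrete, here is a minimal, self-contained sketch (hypothetical class and method names, not Druid's actual aggregator code) of why a filtered min aggregate can surface `Long.MAX_VALUE` in default mode: the accumulator starts at the identity value for min, and if the filter never matches a row, that init value is what the query reports.
   
   ```java
   // Hypothetical simplified sketch, not Druid's real aggregator classes.
   class LongMinAggregatorSketch {
       // In "default" null handling mode the accumulator starts at the identity
       // value for min, so any aggregated value will replace it.
       private long min = Long.MAX_VALUE;
   
       void aggregate(long value) {
           min = Math.min(min, value);
       }
   
       long get() {
           // If the filter matched no rows, aggregate() was never called, so the
           // initial Long.MAX_VALUE is what surfaces in the query result.
           return min;
       }
   }
   ```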
   
   Whether this is the most correct thing for min/max to be doing is, I think, fair game for debate. As mentioned earlier in this thread, in SQL compatible null handling mode these aggregators are initialized to the null value, so aggregating no rows produces the expected `null` result, which in my view is the only correct thing to do when the filter matches nothing; that doesn't work for default mode, of course.
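   As a rough sketch of that SQL compatible behavior (again with made-up names, not Druid's real nullable aggregator wrappers), the aggregator effectively stays null until at least one row is aggregated, so a filter that matches nothing yields `null`:
   
   ```java
   // Hypothetical sketch of SQL compatible null handling, not actual Druid code.
   import java.util.OptionalLong;
   
   class NullableLongMinSketch {
       private long min = Long.MAX_VALUE;
       private boolean hasValue = false; // tracks whether any row was aggregated
   
       void aggregate(long value) {
           min = Math.min(min, value);
           hasValue = true;
       }
   
       // An empty result stands in for SQL null: no rows matched, so the answer is null.
       OptionalLong get() {
           return hasValue ? OptionalLong.of(min) : OptionalLong.empty();
       }
   }
   ```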
   
   As alluded to by @abhishekagarwal87, I don't think we could make the 'default' mode min/max aggregators return `0` either, without one of two changes: either store some additional information to distinguish the case where no values were aggregated (which probably means no longer using the primitive numeric base aggregators, or always using the nullable version and coercing `null` to 0 later), or treat the extreme init values as sentinels, assume Long.MIN_VALUE and Long.MAX_VALUE never legitimately occur, and translate them to 0 in a finalizer, which would let the aggregators keep using the numeric primitive base classes.
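   For the second option, here is a sketch of what such a finalizer might look like (hypothetical code, not anything that exists in Druid today): it coerces the sentinel init values to 0, accepting that a legitimate Long.MAX_VALUE or Long.MIN_VALUE in the data would be misreported.
   
   ```java
   // Hypothetical "coerce sentinel to 0" finalizer sketch; not actual Druid code.
   final class DefaultModeMinMaxFinalizerSketch {
       // If the min accumulator still holds its init value, assume no rows matched
       // and report 0 (this silently breaks if Long.MAX_VALUE is real data).
       static long finalizeMin(long accumulated) {
           return accumulated == Long.MAX_VALUE ? 0L : accumulated;
       }
   
       // Mirror image for max: the init value Long.MIN_VALUE is treated as "no rows".
       static long finalizeMax(long accumulated) {
           return accumulated == Long.MIN_VALUE ? 0L : accumulated;
       }
   }
   ```
   
   The trade-off is the one described above: it avoids extra state, but it cannot distinguish "no rows matched" from a genuine extreme value in the data.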
   
   "default" null handling mode is often unintuitive I think, especially in SQL queries, and for the most part SQL compatible null handling mode should have behavior that is more consistent with other databases and with what you would expect in SQL. I would much prefer long term to deprecate and eventually remove default value mode, so that this case of no matches always returns null, but it seems reasonable to discuss adjusting the default values for default value mode until then.
   
   I don't have a strong opinion on the default value for min/max... what is the motivation to return 0 instead of an unlikely value when nothing matches? Or is this just confusion because none of this was explicitly documented before, so the only reference was the documentation on the two null handling modes?
   
   

