You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/11/06 19:02:39 UTC

[GitHub] [incubator-druid] gianm opened a new pull request #8834: Fixes, adjustments to numeric null handling and string first/last aggregators.

gianm opened a new pull request #8834: Fixes, adjustments to numeric null handling and string first/last aggregators.
URL: https://github.com/apache/incubator-druid/pull/8834
 
 
   There is a class of bugs due to the fact that BaseObjectColumnValueSelector
   has both "getObject" and "isNull" methods, but in most selector implementations
   and most call sites, it is clear that the intent of "isNull" is only to apply
   to the primitive getters, not the object getter. This makes sense, because the
   purpose of isNull is to enable detection of nulls in otherwise-primitive columns.
   Imagine a string column with a numeric selector built on top of it. You would
   want it to return isNull = true, so numeric aggregators don't treat it as
   all zeroes.
   
   Sometimes this design leads people to accidentally guard non-primitive get
   methods with "selector.isNull" checks, which is improper.
   
   This patch has three goals:
   
   1) Fix null-handling bugs that already exist in this class.
   2) Make interface and doc changes that reduce the probability of future bugs.
   3) Fix other, unrelated bugs I noticed in the stringFirst and stringLast
      aggregators while fixing null-handling bugs. I thought about splitting this
      into its own patch, but it ended up being tough to split from the
      null-handling fixes.
   
   For (1) the fixes are,
   
   - Fix StringFirst and StringLastAggregatorFactory to stop guarding getObject
     calls on isNull, by no longer extending NullableAggregatorFactory. Now uses
     -1 as a sigil value for null, to differentiate nulls and empty strings.
   - Fix ExpressionFilter to stop guarding getObject calls on isNull. Also, use
     eval.asBoolean() to avoid calling getLong on the selector after already
     calling getObject.
   - Fix ObjectBloomFilterAggregator to stop guarding DimensionSelector calls
     on isNull. Also, refactored slightly to avoid the overhead of calling
     getObject followed by another getter (see BloomFilterAggregatorFactory for
     part of this).
   
   For (2) the main changes are,
   
   - Remove the "isNull" method from BaseObjectColumnValueSelector.
   - Clarify "isNull" doc on BaseNullableColumnValueSelector.
   - Rename NullableAggregatorFactory -> NullbleNumericAggregatorFactory to emphasize
     that it only works on aggregators that take numbers as input.
   - Similar naming changes to the Aggregator, BufferAggregator, and AggregateCombiner.
   - Similar naming changes to helper methods for groupBy, ValueMatchers, etc.
   
   For (3) the other fixes for StringFirst and StringLastAggregatorFactory are,
   
   - Fixed buffer overrun in the buffer aggregators when some characters in the string
     code into more than one byte (the old code used "substring" to apply a byte limit,
     which is bad). I did this by introducing a new StringUtils.toUtf8WithLimit method.
   - Adjusted logic that improperly read from time selectors when folding.
   - Adjusted logic to avoid inefficiently read from value selectors before knowing if
     the time selector had an earlier or later value.
   - Removed stringFirstFold, stringLastFold (undocumented and seemingly unnecessary);
     replace them with a "folding" parameter to the aggregators. I think it makes the
     code easier to follow.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org