You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/02/15 10:20:39 UTC

[GitHub] [druid] clintropolis opened a new pull request, #13809: move numeric null value coercion out of expression processing engine

clintropolis opened a new pull request, #13809:
URL: https://github.com/apache/druid/pull/13809

   ### Description
   This PR removes numeric null value coercion from the expression processing engine in favor of doing it on the consumer side. This appears to have minimal impact, since `isNumericNull` will still return false in default value mode, and numeric operators check this and use the numeric primitive methods of `ExprEval` instead of `value()`. This walks back some of the changes in #13781.
   
   Backstory, it would be nice to always write null values into segments, shifting `druid.generic.useDefaultValueForNull` to a query time only setting on whether or not to use the null value bitmap index when scanning/aggregating columns. However, I have not made this change yet in this PR.
   
   #### Release note
   Null values input to and created by the Druid native expression processing engine no longer coerce `null` to the type appropriate 'default' value if `druid.generic.useDefaultValueForNull=true`. This should not impact existing behavior since this has been shifted onto the consumer and internally operators will still use default values in this mode, but calling this change to attention in case any subtle behavior changes around the handling of `null` values.
   
   <hr>
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119936829


##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -743,25 +749,37 @@ private NumericExprEval(@Nullable Number value)
     @Override
     public final int asInt()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();

Review Comment:
   no they can be probably removed, i added during testing and for consistency with some of the other `ExprEval`, but i think i will just remove them all instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1121024997


##########
processing/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java:
##########
@@ -778,6 +842,15 @@ public ExprEvalVector<long[]> asEval()
     );
   }
 
+  /**
+   * Creates an {@link ExprVectorProcessor} for the logical 'and' operator, which produces a long typed vector output
+   * with values set by the following rules:
+   *    true/true -> true (1)
+   *    true/null, null/true, null/null -> null
+   *    false/null, null/false -> false (0)

Review Comment:
   Ahhh, I can see the logic there...  it's providing the best answer based on what it can know from the boolean operator.  It would be a lot easier to reason about if it consistently nulled things out, but if this is how SQL is expected to work, so be it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis merged PR #13809:
URL: https://github.com/apache/druid/pull/13809


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119961085


##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -771,7 +789,7 @@ private static class DoubleExprEval extends NumericExprEval
 
     private DoubleExprEval(@Nullable Number value)
     {
-      super(value == null ? NullHandling.defaultDoubleValue() : (Double) value.doubleValue());
+      super(value == null ? null : value.doubleValue());

Review Comment:
   not really, can be simplified



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119934374


##########
processing/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java:
##########
@@ -331,7 +331,9 @@ public ExprEval eval(ObjectBinding bindings)
       return ExprEval.ofLongBoolean(false);
     }
     ExprEval rightVal;
-    if (NullHandling.sqlCompatible() || Types.is(leftVal.type(), ExprType.STRING)) {
+    // null values can (but not always) appear as string typed
+    // so type isn't necessarily string unless value is non-null
+    if (NullHandling.sqlCompatible() || (Types.is(leftVal.type(), ExprType.STRING) && leftVal.value() != null)) {

Review Comment:
   I initially made this change before i added `valueOrDefault`, because in a situation where expressions are evaluated without the type information to tell us that a null is a long (such as ingestion time transforms), an expression like `null && 1`  with the left side is a null value it will be evaluated as a `STRING` typed expression, and follow the string/sql compatible rules, in default value mode meaning `value()` spits out `null`. if the expression was `1 && null` however, it would be evaluated with long rules, and spit out `0` in default value mode.
   
   None of this really matters now though, since callers should be using `valueOrDefault()` after the changes in this patch, instead of `value()`, which produces equivalent results, so I will switch it back and switch the tests around null constants to use that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119965302


##########
processing/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java:
##########
@@ -242,6 +270,12 @@ public void processIndex(Object[] strings, long[] longs, boolean[] outputNulls,
     return (ExprVectorProcessor<T>) processor;
   }
 
+  /**
+   * Creates an {@link ExprVectorProcessor} for the 'isnull' function, that produces a "boolean" (long) typed output
+   * vector with values set to 1 if the input value was null or 0 if it was not null.

Review Comment:
   yes, will clarify javadocs



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119968736


##########
processing/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java:
##########
@@ -778,6 +842,15 @@ public ExprEvalVector<long[]> asEval()
     );
   }
 
+  /**
+   * Creates an {@link ExprVectorProcessor} for the logical 'and' operator, which produces a long typed vector output
+   * with values set by the following rules:
+   *    true/true -> true (1)
+   *    true/null, null/true, null/null -> null
+   *    false/null, null/false -> false (0)

Review Comment:
   yeah, SQL compat stuff. `null` is basically a stand-in for "unknown" for logical operations,```true && ???``` could be `true` or `false` depending on the value of `???` so the result is `???`. But ```false && ???``` is definitely false no matter the value of `???`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119974592


##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -771,7 +789,7 @@ private static class DoubleExprEval extends NumericExprEval
 
     private DoubleExprEval(@Nullable Number value)
     {
-      super(value == null ? NullHandling.defaultDoubleValue() : (Double) value.doubleValue());
+      super(value == null ? null : value.doubleValue());

Review Comment:
   ah i was wrong, at least a lot of tests were counting on forced casts, i see lots of 
   ```
   Expected :java.lang.Long<1>
   Actual   :java.lang.Integer<1>
   ```
   after removing it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119974592


##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -771,7 +789,7 @@ private static class DoubleExprEval extends NumericExprEval
 
     private DoubleExprEval(@Nullable Number value)
     {
-      super(value == null ? NullHandling.defaultDoubleValue() : (Double) value.doubleValue());
+      super(value == null ? null : value.doubleValue());

Review Comment:
   ah i was wrong, at least a lot of tests were counting on forced casts, i see lots of 
   ```
   Expected :java.lang.Long<1>
   Actual   :java.lang.Integer<1>
   ```
   after removing it, will leave for now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119647915


##########
processing/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java:
##########
@@ -331,7 +331,9 @@ public ExprEval eval(ObjectBinding bindings)
       return ExprEval.ofLongBoolean(false);
     }
     ExprEval rightVal;
-    if (NullHandling.sqlCompatible() || Types.is(leftVal.type(), ExprType.STRING)) {
+    // null values can (but not always) appear as string typed
+    // so type isn't necessarily string unless value is non-null
+    if (NullHandling.sqlCompatible() || (Types.is(leftVal.type(), ExprType.STRING) && leftVal.value() != null)) {

Review Comment:
   I feel like I'm not understanding this logic, but it seems like previously you were doing null checks if it was a String.  Now,  you will only do null checks if it is a String *and* the left-hand side is not null?  This seems to be trying to coerce for boolean operations, if the left side is numerical but also not null, it seems like we should have semantics that are like `if == 0 then false, else true`.  But, I think that this will just ignore it if it's not String?



##########
processing/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java:
##########
@@ -242,6 +270,12 @@ public void processIndex(Object[] strings, long[] longs, boolean[] outputNulls,
     return (ExprVectorProcessor<T>) processor;
   }
 
+  /**
+   * Creates an {@link ExprVectorProcessor} for the 'isnull' function, that produces a "boolean" (long) typed output
+   * vector with values set to 1 if the input value was null or 0 if it was not null.

Review Comment:
   am I correct in understanding that this creates a `long[]`?



##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -743,25 +749,37 @@ private NumericExprEval(@Nullable Number value)
     @Override
     public final int asInt()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();

Review Comment:
   Do we really need the assertions?



##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -743,25 +749,37 @@ private NumericExprEval(@Nullable Number value)
     @Override
     public final int asInt()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();
+        return 0;
+      }
       return value.intValue();
     }
 
     @Override
     public final long asLong()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();
+        return 0L;
+      }
       return value.longValue();
     }
 
     @Override
     public final double asDouble()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();
+        return 0.0;
+      }
       return value.doubleValue();
     }
 
     @Override
     public boolean isNumericNull()

Review Comment:
   I know this is a comment on unchanged code, but wondering how you feel about renaming this to just `isNull`?  I've never quite understand why the `Numeric` part was important.



##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -1042,11 +1078,12 @@ public final ExprEval castTo(ExpressionType castTo)
         case STRING:
           return this;
         case ARRAY:
+          final Number number = computeNumber();
           switch (castTo.getElementType().getType()) {
             case DOUBLE:
-              return ExprEval.ofDoubleArray(value == null ? null : new Object[] {computeDouble()});
+              return ExprEval.ofDoubleArray(value == null ? null : new Object[] {number != null ? number.doubleValue() : null});

Review Comment:
   nit: invert the ternary to be `== null`?



##########
processing/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java:
##########
@@ -778,6 +842,15 @@ public ExprEvalVector<long[]> asEval()
     );
   }
 
+  /**
+   * Creates an {@link ExprVectorProcessor} for the logical 'and' operator, which produces a long typed vector output
+   * with values set by the following rules:
+   *    true/true -> true (1)
+   *    true/null, null/true, null/null -> null
+   *    false/null, null/false -> false (0)

Review Comment:
   Why does `null` null out a `true`, but not `null` out a `false`!?!?!?!?!?  Is that some SQL-ism to handling nulls?



##########
processing/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java:
##########
@@ -350,6 +384,12 @@ public ExpressionType getOutputType()
     return (ExprVectorProcessor<T>) processor;
   }
 
+  /**
+   * Creates an {@link ExprVectorProcessor} for the 'isnotnull' function, that produces a "boolean" (long) typed output

Review Comment:
   Same here, a `long[]`?



##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -771,7 +789,7 @@ private static class DoubleExprEval extends NumericExprEval
 
     private DoubleExprEval(@Nullable Number value)
     {
-      super(value == null ? NullHandling.defaultDoubleValue() : (Double) value.doubleValue());
+      super(value == null ? null : value.doubleValue());

Review Comment:
   Does the `.doubleValue()` add anything here?  It looks like it's storing a reference to a Number anyway?  I guess it's a forced cast?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13809: move numeric null value coercion out of expression processing engine

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13809:
URL: https://github.com/apache/druid/pull/13809#discussion_r1119958152


##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -743,25 +749,37 @@ private NumericExprEval(@Nullable Number value)
     @Override
     public final int asInt()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();
+        return 0;
+      }
       return value.intValue();
     }
 
     @Override
     public final long asLong()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();
+        return 0L;
+      }
       return value.longValue();
     }
 
     @Override
     public final double asDouble()
     {
+      if (value == null) {
+        assert NullHandling.replaceWithDefault();
+        return 0.0;
+      }
       return value.doubleValue();
     }
 
     @Override
     public boolean isNumericNull()

Review Comment:
   heh, so here it sort of is important for the way things work (right now) with non-vectorized expressions because the contract of the method is sort of like that of the `ColumnValueSelector` `isNull` method - that if it returns false then the `asLong` and such primitive returning methods will return a valid value.
   
   `StringExprEval` for example implements these methods as best effort conversions, so `StringExprEval("100")` returns false for `isNumericNull` and `asLong` returns the value 100L in both default value and sql compat modes. `StringExprEval("x")` returns true for `isNumericNull` if in sql compatible mode, but the `value()` is not in fact `null`, so callers should know that the value of this is `null` in sql compatible mode instead of calling `asLong` to get 0. In default value mode, `isNumericNull` returns false and so no one cares if the string was a real number or not, if it wasn't it was zero.
   
   Things that do not want to use these number methods can just check the result of `value()` directly.
   
   At least for the way things currently work, this is actually probably a more accurate name than `isNull`, and arguably the `ColumnValueSelector` method should probably be renamed to match this one 😛 . The only things that check `isNull` on `ColumnValueSelector` are things that want to use the primitive numeric getters, object selectors often do not implement it since callers are currently expected to call `getObject` and check that it is `null` per the current javadoc contract in `BaseNullableColumnValueSelector`.
   
   I would be in favor of changing this contract into making a correct implementation of `isNull` a requirement for all `ColumnValueSelectors`, but I totally don't want to do that in this PR, and it would require I think splitting up anything that does type coercion which might produce nulls into separate selectors (and expressions to more liberally explicitly cast values to different types instead of use these implicit conversions which some of them have)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org