You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/11 21:12:48 UTC

[GitHub] [druid] clintropolis opened a new pull request #11184: vectorize logical operators and boolean functions

clintropolis opened a new pull request #11184:
URL: https://github.com/apache/druid/pull/11184


   ### Description
   This PR adds vectorization support for the Druid native expression logical operators, `!`, `&&`, `||`, as well as boolean functions `isnull`, `notnull`, and `nvl`.
   
   The `&&`, `||`, and `nvl` implementations are not quite as optimal as they could be, since they will evaluate all arguments up front, but I will fix this up in another branch I have as a follow-up (split from this one because it was starting to get big). The follow-up will add vectorized conditional expressions, and introduces a filtered vector binding that uses vector matches to allow processing only a subset of input rows. This filtered binding will allow slightly modifying these implementations so that `||` only needs to evaluate the rhs rows for which the lhs is false or null, `&&` where lhs is true or null, and `nvl` only when lhs is null.
   
   In the process of doing this PR and writing tests, I came to the conclusion that our logical operators behave very strangely all the time, and I think quite wrong in SQL compatible null handling mode since null was always treated as "false" instead of "unknown", so I've proposed changing the behavior in this PR.
   
   The first change is around output. Previously the logical operators would pass values through. For Druid numeric values, any value greater than 0 is `true`, so passing through values would result in some strange but technically correct outputs. The new behavior homogenizes output to always be `LONG` boolean values, e.g. `1` or `0` for boolean operations involving any types. 
   
   Previous behavior:
   * `100 && 11` -> `11` 
   * `0.7 || 0.3` -> `0.7`
   * `100 && 0` -> `0`
   * `'troo' && 'true'` -> `'troo'`
   * `'troo' || 'true'` -> `'true'`
   
   New behavior:
   * `100 && 11` -> `1`
   * `0.7 || 0.3` -> `1`
   * `100 && 0` -> `0`
   * `'troo' && 'true'` -> `0`
   * `'troo' || 'true'` -> `1`
   
   etc.
   
   The implicit conversion of `STRING`, `DOUBLE`, and `LONG` values to booleans remains in effect:
   * `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false`
   * `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`. 
   
   and has been documented.
   
   The second change is that now the logical operators in SQL compatible mode will treat `null` as "unknown" when `druid.generic.useDefaultValueForNull` is set as `false` (SQL compatible null handling mode).
   
   For the "or" operator:
   * `true || null`, `null || true` -> `1`
   * `false || null`, `null || false`, `null || null`-> `null`
   
   For the "and" operator:
   * `true && null`, `null && true`, `null && null` -> `null`
   * `false && null`, `null && false` -> `0`
   
   
   Since this new behavior changes query results, subtly in the case of default mode since the results will be different (but equivalent in terms of true or false result), and fairly significantly in SQL compatible mode since it will now respect `null` values and treat them differently than always false, this PR also adds a new configuration, `druid.expressions.useLegacyLogicalOperators`, which defaults to false to use the new behavior in this PR, but allows reverting to the existing behavior if desired. I don't really love this flag existing, but don't see a way around it unless we are ok with changing query results to use the new behavior only. `&&` and `||` will only be vectorized when `druid.expressions.useLegacyLogicalOperators=false`, but if there is demand for it, implementations could probably be added to provide the legacy behavior as well.
   
   benchmarks:
   ```
         // 30: logical and operator
         "SELECT CAST(long1 as BOOLEAN) AND CAST (long2 as BOOLEAN), COUNT(*) FROM foo GROUP BY 1 ORDER BY 2",
         // 31: isnull, notnull
         "SELECT long5 IS NULL, long3 IS NOT NULL, count(*) FROM foo GROUP BY 1,2 ORDER BY 3"
   ```
   ```
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
   SqlExpressionBenchmark.querySql       30           5000000        false  avgt    5  761.667 ± 35.745  ms/op
   SqlExpressionBenchmark.querySql       30           5000000        force  avgt    5  152.102 ±  9.390  ms/op
   SqlExpressionBenchmark.querySql       31           5000000        false  avgt    5  431.104 ± 32.951  ms/op
   SqlExpressionBenchmark.querySql       31           5000000        force  avgt    5  100.884 ±  8.919  ms/op
   ```
   <hr>
   
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-983224594


   >Does the change also cover the negation operator? NOT a where a is NULL results in NULL. (Note that a IS NOT NULL is entirely different!)
   
   This was already behaving correctly in SQL compatible null handling mode, so no changes were needed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11184:
URL: https://github.com/apache/druid/pull/11184#discussion_r759805910



##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -385,7 +322,59 @@ protected BinaryOpExprBase copy(Expr left, Expr right)
   public ExprEval eval(ObjectBinding bindings)
   {
     ExprEval leftVal = left.eval(bindings);
-    return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    if (!ExpressionProcessing.useStrictBooleans()) {
+      return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    }
+
+    // if left is false, always false
+    if (leftVal.value() != null && !leftVal.asBoolean()) {
+      return ExprEval.ofLongBoolean(false);

Review comment:
       yeah, it seems reasonable to make true and false constants, I may make this change in this PR or in a follow-up later if nothing else comes up that needs immediate change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s closed pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
suneet-s closed pull request #11184:
URL: https://github.com/apache/druid/pull/11184


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11184:
URL: https://github.com/apache/druid/pull/11184#discussion_r759804640



##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryOperatorExpr.java
##########
@@ -157,3 +158,67 @@ protected ExprEval evalString(@Nullable String left, @Nullable String right)
 
   protected abstract double evalDouble(double left, double right);
 }
+
+@SuppressWarnings("ClassName")
+abstract class BinaryBooleanOpExprBase extends BinaryOpExprBase
+{
+  BinaryBooleanOpExprBase(String op, Expr left, Expr right)
+  {
+    super(op, left, right);
+  }
+
+  @Override
+  public ExprEval eval(ObjectBinding bindings)
+  {
+    ExprEval leftVal = left.eval(bindings);
+    ExprEval rightVal = right.eval(bindings);
+
+    // Result of any Binary expressions is null if any of the argument is null.
+    // e.g "select null * 2 as c;" or "select null + 1 as c;" will return null as per Standard SQL spec.
+    if (NullHandling.sqlCompatible() && (leftVal.value() == null || rightVal.value() == null)) {
+      return ExprEval.of(null);
+    }
+
+    ExpressionType type = ExpressionTypeConversion.autoDetect(leftVal, rightVal);
+    boolean result;
+    switch (type.getType()) {

Review comment:
       What you describe is the way vector expression processing works, but the non-vectorized implementations of eval are filled with all sorts of branches (in part due to not always being able to know the input types at setup time), so this implementation is just being consistent with other non-vectorized eval implementations. Using vector processors with a vector size of 1 for the non-vectorized engine does seem to offer a light performance increase, since the majority of branches can be eliminated at setup time as well as being stronger typed so numeric primitives can avoid boxing/unboxing, (but there is still a lot of overhead wrapper objects that show their weight when there is one per row instead of one per batch).
   
   This whole area is ripe for code generation of some sort, i've been trying to get the base vectorized implementation in place so we have a baseline to compare different strategies against, so maybe in that world we can have better implementations for non-vectorized expression processing as well, but we're not quite there yet I think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis edited a comment on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis edited a comment on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-983224594


   >Does the change also cover the negation operator? NOT a where a is NULL results in NULL. (Note that a IS NOT NULL is entirely different!)
   
   This was already behaving correctly in SQL compatible null handling mode, so no changes were needed
   
   > Did a prior (or will a future) change handle nulls in math operators? 10 + NULL -> NULL?
   
   As far as I know, these should also already behave correctly prior to this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-978188290


   I like the new "strict booleans" property name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #11184:
URL: https://github.com/apache/druid/pull/11184


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-880327567


   I tested what calcite does if you just issue a select statement in the web-console, and it's behavior is similar to what is spelt out in the PR description. I'm in agreement with the behavior change in the description. The one thing I did notice is that calcite treats `'true'` and `true` differently. NOTE to self: check for this subtle difference 
   `SELECT 'true' || null` returns `'true'`
   `SELECT true || null` returns `1`
   
   The feature flag does seem like the only option since this change does affect query results. I am reading through the code now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11184:
URL: https://github.com/apache/druid/pull/11184#discussion_r754029048



##########
File path: core/src/main/java/org/apache/druid/math/expr/ExpressionProcessing.java
##########
@@ -48,11 +48,17 @@
   @VisibleForTesting
   public static void initializeForTests(@Nullable Boolean allowNestedArrays)
   {
-    INSTANCE = new ExpressionProcessingConfig(allowNestedArrays);
+    INSTANCE = new ExpressionProcessingConfig(allowNestedArrays, null);
+  }
+
+  @VisibleForTesting
+  public static void initializeForLegacyLogicalOperationsTests(boolean useLegacy)
+  {
+    INSTANCE = new ExpressionProcessingConfig(null, useLegacy);
   }
 
   /**
-   * whether nulls should be replaced with default value.
+   * [['is expression support for'],['nested arrays'],['enabled?']]

Review comment:
       🥸

##########
File path: core/src/main/java/org/apache/druid/math/expr/vector/BivariateFunctionVectorProcessor.java
##########
@@ -20,68 +20,63 @@
 package org.apache.druid.math.expr.vector;
 
 import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExpressionType;
+
+import javax.annotation.Nullable;
 
 /**
- * common machinery for processing two input operators and functions, which should always treat null inputs as null
- * output, and are backed by a primitive values instead of an object values (and need to use the null vectors instead of
- * checking the vector themselves for nulls)
+ * Basic vector processor that processes 2 inputs and works for both primtive value vectors and object vectors.

Review comment:
       primitive (spelling)

##########
File path: docs/misc/math-expr.md
##########
@@ -256,16 +266,42 @@ supported features:
 * constants and identifiers are supported for any column type
 * `cast` is supported for numeric and string types
 * math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types
-* comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for numeric types
+* logical operators: `!`, `&&`, `||`, are supported for string and numeric types
+* comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for string and numeric types
 * math functions: `abs`, `acos`, `asin`, `atan`, `cbrt`, `ceil`, `cos`, `cosh`, `cot`, `exp`, `expm1`, `floor`, `getExponent`, `log`, `log10`, `log1p`, `nextUp`, `rint`, `signum`, `sin`, `sinh`, `sqrt`, `tan`, `tanh`, `toDegrees`, `toRadians`, `ulp`, `atan2`, `copySign`, `div`, `hypot`, `max`, `min`, `nextAfter`,  `pow`, `remainder`, `scalb` are supported for numeric types
 * time functions: `timestamp_floor` (with constant granularity argument) is supported for numeric types
+* boolean functions: `isnull`, `notnull` are supported for string and numeric types
+* conditional functions: `nvl` is supported for string and numeric types
+* string functions: the concatenation operator (`+`) and `concat` function are supported for string and numeric types
 * other: `parse_long` is supported for numeric and string types
 
+## Legacy logical operator mode
+Prior to the 0.23 release of Apache Druid, the logical 'and' and 'or' operators behaved in a non-standard manner, but this behavior has been changed so that these operations output 'homogeneous' boolean values - `LONG` values of `0` for `true` and `1` for `false`. Druid currently still retains implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values: 
+* `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false`
+* `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`. 
 
-## Other functions
+Old behavior:
+* `100 && 11` -> `11`
+* `0.7 || 0.3` -> `0.3`
+* `100 && 0` -> `0`
+* `'troo' && 'true'` -> `'troo'`
+* `'troo' || 'true'` -> `'true'`
 
-| function | description |
-| --- | --- |
-| human_readable_binary_byte_format(value[, precision]) | Format a number in human-readable [IEC](https://en.wikipedia.org/wiki/Binary_prefix) format. `precision` must be in the range of [0,3] (default: 2). For example:<li> human_readable_binary_byte_format(1048576) returns `1.00 MiB`</li><li>human_readable_binary_byte_format(1048576, 3) returns `1.000 MiB`</li> |
-| human_readable_decimal_byte_format(value[, precision]) | Format a number in human-readable [SI](https://en.wikipedia.org/wiki/Binary_prefix) format. `precision` must be in the range of [0,3] (default: 2). For example:<li> human_readable_decimal_byte_format(1000000) returns `1.00 MB`</li><li>human_readable_decimal_byte_format(1000000, 3) returns `1.000 MB`</li> |
-| human_readable_decimal_format(value[, precision]) | Format a number in human-readable SI format. `precision` must be in the range of [0,3] (default: 2). For example:<li>human_readable_decimal_format(1000000) returns `1.00 M`</li><li>human_readable_decimal_format(1000000, 3) returns `1.000 M`</li>  |
+Current behavior:
+* `100 && 11` -> `1`
+* `0.7 || 0.3` -> `1`
+* `100 && 0` -> `0`
+* `'troo' && 'true'` -> `0`
+* `'troo' || 'true'` -> `1`
+
+Additionally, the logical operators in these older versions did not honor SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`). The current version treats `null` values as "unknown".
+
+For the "or" operator:
+* `true || null`, `null || true`, -> `1`
+* `false || null`, `null || false`, `null || null`-> `null`
+
+For the "and" operator:
+* `true && null`, `null && true`, `null && null` -> `null`
+* `false && null`, `null && false` -> `0`
+
+To revert to the behavior of previous Druid versions, `druid.expressions.useLegacyLogicalOperators` can be set to `true` in your Druid configuration.

Review comment:
       This should also go in `configuration/index.md`, so all the configuration things are in one place.

##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -222,6 +222,25 @@ default boolean areNumeric(Expr... args)
       return areNumeric(Arrays.asList(args));
     }
 
+    default boolean areSameTypes(List<Expr> args)
+    {
+      ExprType currentType = null;
+      boolean allSame = true;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (currentType == null) {
+          currentType = argType;
+        }
+        allSame &= argType == currentType;
+      }
+      return allSame;

Review comment:
       As I read it, the intent is more like `areDefinitelyTheSameType` (i.e. it should return false if it cannot prove that the exprs are all the same type).
   
   @clintropolis if that's correct, clarifying javadoc would be useful.

##########
File path: core/src/main/java/org/apache/druid/math/expr/ExpressionProcessingConfig.java
##########
@@ -27,20 +27,39 @@
 public class ExpressionProcessingConfig
 {
   public static final String NESTED_ARRAYS_CONFIG_STRING = "druid.expressions.allowNestedArrays";
+  public static final String NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING = "druid.expressions.useLegacyLogicalOperators";
 
   @JsonProperty("allowNestedArrays")
   private final boolean allowNestedArrays;
 
+  @JsonProperty("useLegacyLogicalOperators")
+  private final boolean useLegacyLogicalOperators;
+
   @JsonCreator
-  public ExpressionProcessingConfig(@JsonProperty("allowNestedArrays") @Nullable Boolean allowNestedArrays)
+  public ExpressionProcessingConfig(
+      @JsonProperty("allowNestedArrays") @Nullable Boolean allowNestedArrays,
+      @JsonProperty("useLegacyLogicalOperators") @Nullable Boolean useLegacyLogicalOperators
+  )
   {
     this.allowNestedArrays = allowNestedArrays == null
                              ? Boolean.valueOf(System.getProperty(NESTED_ARRAYS_CONFIG_STRING, "false"))
                              : allowNestedArrays;
+    if (useLegacyLogicalOperators == null) {
+      this.useLegacyLogicalOperators = Boolean.parseBoolean(
+          System.getProperty(NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING, "false")

Review comment:
       Can we set the default to "true" for some time? (to minimize sudden disruption)

##########
File path: core/src/main/java/org/apache/druid/math/expr/ExpressionProcessing.java
##########
@@ -65,4 +71,14 @@ public static boolean allowNestedArrays()
     }
     return INSTANCE.allowNestedArrays();
   }
+
+
+  public static boolean useLegacyLogicalOperators()
+  {
+    // this should only be null in a unit test context, in production this will be injected by the null handling module
+    if (INSTANCE == null) {
+      throw new IllegalStateException("NullHandling module not initialized, call NullHandling.initializeForTests()");

Review comment:
       Module name should be ExpressionProcessing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-976149393


   > The existing behavior, while not very SQLy and something that I agree we should move away from, may have people that depend on it.
   
   heh, the current behavior reminds me a lot of https://www.destroyallsoftware.com/talks/wat 😅 
   
   > How do you feel about defaulting to legacy behavior, but updating the bundled common.runtime.properties files to set legacy = false? That way, most new users would get the new behavior, but people upgrading will retain existing behavior. In a future release, we could then change the default to legacy = false. Maybe at the same time as we swap the null handling default?
   
   I don't love it, but I guess it would be ok to swap the default whenever we swap to SQL compatible null handling (which I also hope isn't so far from now). Vectorization for virtual columns is also not currently on by default, so unless that is also explicitly set people wouldn't get the benefit from the new behavior... other than saner results. The performance increase for these expressions being vectorized would maybe change my stance to be a bit more in favor of turning it on by default, and I do think the current behavior is ... not good for SQL, but I guess not having disruptions of running clusters in an upgrade is nice.
   
   I guess I should write some more docs to try to encourage people to enable this new mode and we should shout it out in the release notes so that operators who do want SQL compatible behavior know to turn on this setting, and the vectorization is a bit of a motivator to make the switch (I don't think the current behavior should be vectorized or maybe even could be vectorized because the output type is potentially varying row to row depending on the truthy/falsy values of inputs)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis edited a comment on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis edited a comment on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-983224594


   >Does the change also cover the negation operator? NOT a where a is NULL results in NULL. (Note that a IS NOT NULL is entirely different!)
   
   This was already behaving correctly in SQL compatible null handling mode, so no changes were needed
   
   > Did a prior (or will a future) change handle nulls in math operators? 10 + NULL -> NULL?
   
   As far as I know, these should also already behave correctly prior to this PR
   
   > SQL also applies NULL handling to all functions and operators: in general, f(NULL) -> NULL for every function f, unless special cased, as in the logical operators.
   
   I didn't do a complete survey, but i believe many functions are doing the correct thing with regards to null handling, (but it would probably be worth validating and correcting any behavior that isn't consistent)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11184:
URL: https://github.com/apache/druid/pull/11184#discussion_r759805795



##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -385,7 +322,59 @@ protected BinaryOpExprBase copy(Expr left, Expr right)
   public ExprEval eval(ObjectBinding bindings)
   {
     ExprEval leftVal = left.eval(bindings);
-    return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    if (!ExpressionProcessing.useStrictBooleans()) {

Review comment:
       yeah, this is how most of the non-vectorized processing is currently, it would be possible with some refactoring I think to make specialized implementations for various cases, but we've been mainly focusing on doing that where possible in the vectorized expression processing, since operating on batches is significantly faster and has less overhead




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #11184:
URL: https://github.com/apache/druid/pull/11184#discussion_r670128506



##########
File path: core/src/test/java/org/apache/druid/math/expr/EvalTest.java
##########
@@ -372,4 +372,111 @@ public void testBooleanReturn()
     Assert.assertFalse(eval.asBoolean());
     Assert.assertEquals(ExprType.DOUBLE, eval.type());
   }
+
+  @Test
+  public void testLogicalOperators()
+  {
+    Expr.ObjectBinding bindings = InputBindings.withMap(
+        ImmutableMap.of()
+    );
+
+    Assert.assertEquals("true", eval("'true' && 'true'", bindings).value());
+    Assert.assertEquals("false", eval("'true' && 'false'", bindings).value());
+    Assert.assertEquals("false", eval("'false' && 'true'", bindings).value());
+    Assert.assertEquals("false", eval("'troo' && 'true'", bindings).value());
+    Assert.assertEquals("false", eval("'false' && 'false'", bindings).value());
+
+    Assert.assertEquals("true", eval("'true' || 'true'", bindings).value());
+    Assert.assertEquals("true", eval("'true' || 'false'", bindings).value());
+    Assert.assertEquals("true", eval("'false' || 'true'", bindings).value());
+    Assert.assertEquals("true", eval("'troo' || 'true'", bindings).value());
+    Assert.assertEquals("false", eval("'false' || 'false'", bindings).value());
+
+    Assert.assertEquals(1L, eval("1 && 1", bindings).value());
+    Assert.assertEquals(1L, eval("100 && 11", bindings).value());
+    Assert.assertEquals(0L, eval("1 && 0", bindings).value());
+    Assert.assertEquals(0L, eval("0 && 1", bindings).value());
+    Assert.assertEquals(0L, eval("0 && 0", bindings).value());
+
+    Assert.assertEquals(1L, eval("1 || 1", bindings).value());
+    Assert.assertEquals(1L, eval("100 || 11", bindings).value());
+    Assert.assertEquals(1L, eval("1 || 0", bindings).value());
+    Assert.assertEquals(1L, eval("0 || 1", bindings).value());
+    Assert.assertEquals(1L, eval("111 || 0", bindings).value());
+    Assert.assertEquals(1L, eval("0 || 111", bindings).value());
+    Assert.assertEquals(0L, eval("0 || 0", bindings).value());
+
+    Assert.assertEquals(1.0, eval("1.0 && 1.0", bindings).value());
+    Assert.assertEquals(1.0, eval("0.100 && 1.1", bindings).value());
+    Assert.assertEquals(0.0, eval("1.0 && 0.0", bindings).value());
+    Assert.assertEquals(0.0, eval("0.0 && 1.0", bindings).value());
+    Assert.assertEquals(0.0, eval("0.0 && 0.0", bindings).value());
+
+    Assert.assertEquals(1.0, eval("1.0 || 1.0", bindings).value());
+    Assert.assertEquals(1.0, eval("0.2 || 0.3", bindings).value());
+    Assert.assertEquals(1.0, eval("1.0 || 0.0", bindings).value());
+    Assert.assertEquals(1.0, eval("0.0 || 1.0", bindings).value());
+    Assert.assertEquals(1.0, eval("1.11 || 0.0", bindings).value());
+    Assert.assertEquals(1.0, eval("0.0 || 0.111", bindings).value());
+    Assert.assertEquals(0.0, eval("0.0 || 0.0", bindings).value());
+
+    Assert.assertEquals(1L, eval("null || 1", bindings).value());
+    Assert.assertEquals(1L, eval("1 || null", bindings).value());
+    Assert.assertEquals(NullHandling.defaultLongValue(), eval("null || 0", bindings).value());
+    Assert.assertEquals(NullHandling.defaultLongValue(), eval("0 || null", bindings).value());
+    // null/null is evaluated as string typed
+    Assert.assertEquals(null, eval("null || null", bindings).value());
+
+    Assert.assertEquals(NullHandling.defaultLongValue(), eval("null && 1", bindings).value());
+    Assert.assertEquals(NullHandling.defaultLongValue(), eval("1 && null", bindings).value());
+    Assert.assertEquals(0L, eval("null && 0", bindings).value());
+    Assert.assertEquals(0L, eval("0 && null", bindings).value());
+    // null/null is evaluated as string typed
+    Assert.assertEquals(null, eval("null && null", bindings).value());
+
+
+    // turn on legacy insanity mode
+    NullHandling.initializeForLegacyLogicalOperationsTests(true);
+    Assert.assertEquals(1L, eval("1 && 1", bindings).value());
+    Assert.assertEquals(11L, eval("100 && 11", bindings).value());
+    Assert.assertEquals(0L, eval("1 && 0", bindings).value());
+    Assert.assertEquals(0L, eval("0 && 1", bindings).value());
+    Assert.assertEquals(0L, eval("0 && 0", bindings).value());
+
+    Assert.assertEquals(1L, eval("1 || 1", bindings).value());
+    Assert.assertEquals(100L, eval("100 || 11", bindings).value());
+    Assert.assertEquals(1L, eval("1 || 0", bindings).value());
+    Assert.assertEquals(1L, eval("0 || 1", bindings).value());
+    Assert.assertEquals(111L, eval("111 || 0", bindings).value());
+    Assert.assertEquals(111L, eval("0 || 111", bindings).value());
+    Assert.assertEquals(0L, eval("0 || 0", bindings).value());
+
+    Assert.assertEquals(1.0, eval("1.0 && 1.0", bindings).value());
+    Assert.assertEquals(1.1, eval("0.100 && 1.1", bindings).value());
+    Assert.assertEquals(0.0, eval("1.0 && 0.0", bindings).value());
+    Assert.assertEquals(0.0, eval("0.0 && 1.0", bindings).value());
+    Assert.assertEquals(0.0, eval("0.0 && 0.0", bindings).value());
+
+    Assert.assertEquals(1.0, eval("1.0 || 1.0", bindings).value());
+    Assert.assertEquals(0.2, eval("0.2 || 0.3", bindings).value());
+    Assert.assertEquals(1.0, eval("1.0 || 0.0", bindings).value());
+    Assert.assertEquals(1.0, eval("0.0 || 1.0", bindings).value());
+    Assert.assertEquals(1.11, eval("1.11 || 0.0", bindings).value());
+    Assert.assertEquals(0.111, eval("0.0 || 0.111", bindings).value());
+    Assert.assertEquals(0.0, eval("0.0 || 0.0", bindings).value());
+
+    Assert.assertEquals(1L, eval("null || 1", bindings).value());
+    Assert.assertEquals(1L, eval("1 || null", bindings).value());
+    Assert.assertEquals(0L, eval("null || 0", bindings).value());
+    Assert.assertEquals(null, eval("0 || null", bindings).value());
+    Assert.assertEquals(null, eval("null || null", bindings).value());
+
+    Assert.assertEquals(null, eval("null && 1", bindings).value());
+    Assert.assertEquals(null, eval("1 && null", bindings).value());
+    Assert.assertEquals(null, eval("null && 0", bindings).value());
+    Assert.assertEquals(0L, eval("0 && null", bindings).value());
+    Assert.assertEquals(null, eval("null && null", bindings).value());
+    // reset
+    NullHandling.initializeForTests();

Review comment:
       I think you will want this in a try - finally block so that if 1 test fails here, it doesn't throw off any other tests that run in the JVM

##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -222,6 +222,25 @@ default boolean areNumeric(Expr... args)
       return areNumeric(Arrays.asList(args));
     }
 
+    default boolean areSameTypes(List<Expr> args)
+    {
+      ExprType currentType = null;
+      boolean allSame = true;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (currentType == null) {
+          currentType = argType;
+        }
+        allSame &= argType == currentType;
+      }
+      return allSame;

Review comment:
       if I'm understanding this correctly - a constant expression of a string and a constant expression of null will not be considered the same type. Is this the behavior we want?
   
   I was thinking that a null could in theory be any type so `null && String` coulbe the same type

##########
File path: core/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java
##########
@@ -135,6 +178,660 @@ public void processIndex(String[] strings, long[] longs, boolean[] outputNulls,
     return (ExprVectorProcessor<T>) processor;
   }
 
+  public static <T> ExprVectorProcessor<T> isNull(Expr.VectorInputBindingInspector inspector, Expr expr)
+  {
+
+    final ExprType type = expr.getOutputType(inspector);
+
+    if (type == null) {
+      return constant(1L, inspector.getMaxVectorSize());
+    }
+    final long[] outputValues = new long[inspector.getMaxVectorSize()];
+
+    ExprVectorProcessor<?> processor = null;
+    if (ExprType.STRING == type) {
+      final ExprVectorProcessor<String[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<String[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final String[] values = inputEval.values();
+          for (int i = 0; i < currentSize; i++) {
+            if (values[i] == null) {
+              outputValues[i] = 1L;
+            } else {
+              outputValues[i] = 0L;
+            }
+          }
+          return new ExprEvalLongVector(outputValues, null);
+        }
+
+        @Override
+        public ExprType getOutputType()
+        {
+          return ExprType.LONG;
+        }
+      };
+    } else if (ExprType.LONG == type) {
+      final ExprVectorProcessor<long[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<long[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final boolean[] nulls = inputEval.getNullVector();
+          for (int i = 0; i < currentSize; i++) {
+            if (nulls != null && nulls[i]) {
+              outputValues[i] = 1L;
+            } else {
+              outputValues[i] = 0L;
+            }
+          }
+          return new ExprEvalLongVector(outputValues, null);
+        }
+
+        @Override
+        public ExprType getOutputType()
+        {
+          return ExprType.LONG;
+        }
+      };
+    } else if (ExprType.DOUBLE == type) {
+      final ExprVectorProcessor<double[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<double[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final boolean[] nulls = inputEval.getNullVector();
+          for (int i = 0; i < currentSize; i++) {
+            if (nulls != null && nulls[i]) {
+              outputValues[i] = 1L;
+            } else {
+              outputValues[i] = 0L;
+            }
+          }
+          return new ExprEvalLongVector(outputValues, null);
+        }
+
+        @Override
+        public ExprType getOutputType()
+        {
+          return ExprType.LONG;
+        }
+      };
+    }
+
+    if (processor == null) {
+      throw Exprs.cannotVectorize();
+    }
+    return (ExprVectorProcessor<T>) processor;
+  }
+
+  public static <T> ExprVectorProcessor<T> isNotNull(Expr.VectorInputBindingInspector inspector, Expr expr)
+  {
+
+    final ExprType type = expr.getOutputType(inspector);
+    if (type == null) {
+      return constant(0L, inspector.getMaxVectorSize());
+    }
+
+    final long[] outputValues = new long[inspector.getMaxVectorSize()];
+
+    ExprVectorProcessor<?> processor = null;
+    if (ExprType.STRING == type) {
+      final ExprVectorProcessor<String[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<String[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final String[] values = inputEval.values();
+          for (int i = 0; i < currentSize; i++) {
+            if (values[i] == null) {
+              outputValues[i] = 0L;
+            } else {
+              outputValues[i] = 1L;
+            }
+          }
+          return new ExprEvalLongVector(outputValues, null);
+        }
+
+        @Override
+        public ExprType getOutputType()
+        {
+          return ExprType.LONG;
+        }
+      };
+    } else if (ExprType.LONG == type) {
+      final ExprVectorProcessor<long[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<long[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final boolean[] nulls = inputEval.getNullVector();
+          for (int i = 0; i < currentSize; i++) {
+            if (nulls != null && nulls[i]) {
+              outputValues[i] = 0L;
+            } else {
+              outputValues[i] = 1L;
+            }
+          }
+          return new ExprEvalLongVector(outputValues, null);
+        }
+
+        @Override
+        public ExprType getOutputType()
+        {
+          return ExprType.LONG;
+        }
+      };
+    } else if (ExprType.DOUBLE == type) {
+      final ExprVectorProcessor<double[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<double[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final boolean[] nulls = inputEval.getNullVector();
+          for (int i = 0; i < currentSize; i++) {
+            if (nulls != null && nulls[i]) {
+              outputValues[i] = 0L;
+            } else {
+              outputValues[i] = 1L;
+            }
+          }
+          return new ExprEvalLongVector(outputValues, null);
+        }
+
+        @Override
+        public ExprType getOutputType()
+        {
+          return ExprType.LONG;
+        }
+      };
+    }
+
+    if (processor == null) {
+      throw Exprs.cannotVectorize();
+    }
+    return (ExprVectorProcessor<T>) processor;
+  }
+
+  public static <T> ExprVectorProcessor<T> nvl(Expr.VectorInputBindingInspector inspector, Expr left, Expr right)
+  {
+    final int maxVectorSize = inspector.getMaxVectorSize();
+
+    return makeSymmetricalProcessor(
+        inspector,
+        left,
+        right,
+        () -> new SymmetricalBivariateFunctionVectorProcessor<long[]>(
+            ExprType.LONG,
+            left.buildVectorized(inspector),
+            right.buildVectorized(inspector)
+        )
+        {
+          final long[] output = new long[maxVectorSize];
+          final boolean[] outputNulls = new boolean[maxVectorSize];
+
+          @Override
+          public void processIndex(
+              long[] leftInput,
+              @Nullable boolean[] leftNulls,
+              long[] rightInput,
+              @Nullable boolean[] rightNulls,
+              int i
+          )
+          {
+            if (leftNulls != null && leftNulls[i]) {
+              if (rightNulls != null) {
+                output[i] = rightNulls[i] ? 0L : rightInput[i];
+                outputNulls[i] = rightNulls[i];
+              } else {
+                output[i] = rightInput[i];
+              }
+            } else {
+              output[i] = leftInput[i];
+            }
+          }
+
+          @Override
+          public ExprEvalVector<long[]> asEval()
+          {
+            return new ExprEvalLongVector(output, outputNulls);
+          }
+        },
+        () -> new SymmetricalBivariateFunctionVectorProcessor<double[]>(
+            ExprType.DOUBLE,
+            left.buildVectorized(inspector),
+            right.buildVectorized(inspector)
+        )
+        {
+          final double[] output = new double[maxVectorSize];
+          final boolean[] outputNulls = new boolean[maxVectorSize];
+
+          @Override
+          public void processIndex(
+              double[] leftInput,
+              @Nullable boolean[] leftNulls,
+              double[] rightInput,
+              @Nullable boolean[] rightNulls,
+              int i
+          )
+          {
+            if (leftNulls != null && leftNulls[i]) {
+              if (rightNulls != null) {
+                output[i] = rightNulls[i] ? 0.0 : rightInput[i];
+                outputNulls[i] = rightNulls[i];
+              } else {
+                output[i] = rightInput[i];
+              }
+            } else {
+              output[i] = leftInput[i];
+            }
+          }
+
+          @Override
+          public ExprEvalVector<double[]> asEval()
+          {
+            return new ExprEvalDoubleVector(output, outputNulls);
+          }
+        },
+        () -> new SymmetricalBivariateFunctionVectorProcessor<String[]>(
+            ExprType.STRING,
+            left.buildVectorized(inspector),
+            right.buildVectorized(inspector)
+        )
+        {
+          final String[] output = new String[maxVectorSize];
+
+          @Override
+          public void processIndex(
+              String[] leftInput,
+              @Nullable boolean[] leftNulls,
+              String[] rightInput,
+              @Nullable boolean[] rightNulls,
+              int i
+          )
+          {
+            output[i] = leftInput[i] != null ? leftInput[i] : rightInput[i];
+          }
+
+          @Override
+          public ExprEvalVector<String[]> asEval()
+          {
+            return new ExprEvalStringVector(output);
+          }
+        }
+    );
+  }
+
+  public static <T> ExprVectorProcessor<T> not(Expr.VectorInputBindingInspector inspector, Expr expr)
+  {
+    final ExprType inputType = expr.getOutputType(inspector);
+    final int maxVectorSize = inspector.getMaxVectorSize();
+    ExprVectorProcessor<?> processor = null;
+    if (ExprType.STRING.equals(inputType)) {
+      processor = new LongOutStringInFunctionVectorProcessor(expr.buildVectorized(inspector), maxVectorSize)
+      {
+        @Override
+        public void processIndex(String[] strings, long[] longs, boolean[] outputNulls, int i)
+        {
+          outputNulls[i] = strings[i] == null;
+          if (!outputNulls[i]) {
+            longs[i] = Evals.asLong(!Evals.asBoolean(strings[i]));
+          }
+        }
+      };
+    } else if (ExprType.LONG.equals(inputType)) {
+      processor = new LongOutLongInFunctionVectorValueProcessor(expr.buildVectorized(inspector), maxVectorSize)
+      {
+        @Override
+        public long apply(long input)
+        {
+          return Evals.asLong(!Evals.asBoolean(input));
+        }
+      };
+    } else if (ExprType.DOUBLE.equals(inputType)) {
+      processor = new DoubleOutDoubleInFunctionVectorValueProcessor(expr.buildVectorized(inspector), maxVectorSize)
+      {
+        @Override
+        public double apply(double input)
+        {
+          return Evals.asDouble(!Evals.asBoolean(input));
+        }
+      };
+    }
+    if (processor == null) {
+      throw Exprs.cannotVectorize();
+    }
+    return (ExprVectorProcessor<T>) processor;
+  }
+
+  public static <T> ExprVectorProcessor<T> or(Expr.VectorInputBindingInspector inspector, Expr left, Expr right)
+  {
+    final int maxVectorSize = inspector.getMaxVectorSize();
+    return makeSymmetricalProcessor(
+        inspector,
+        left,
+        right,
+        () -> new SymmetricalBivariateFunctionVectorProcessor<long[]>(
+            ExprType.LONG,
+            left.buildVectorized(inspector),
+            right.buildVectorized(inspector)
+        )
+        {
+          final long[] output = new long[maxVectorSize];
+          final boolean[] outputNulls = new boolean[maxVectorSize];
+
+          @Override
+          public void processIndex(
+              long[] leftInput,
+              @Nullable boolean[] leftNulls,
+              long[] rightInput,
+              @Nullable boolean[] rightNulls,
+              int i
+          )
+          {
+            if (NullHandling.sqlCompatible()) {
+              // true/null, null/true, null/null -> true
+              // false/null, null/false -> null

Review comment:
       It seems like this comment doesn't match the implementation
   
   `null /null -> null` which is what's described in `docs/misc/math-expr.md` but this comment says it should be true.

##########
File path: docs/misc/math-expr.md
##########
@@ -254,7 +256,34 @@ supported features:
 * constants and identifiers are supported for any column type
 * `cast` is supported for numeric and string types
 * math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types
-* comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for numeric types
+* logical operators: `!`, `&&`, `||`, are supported for string and numeric types
+* comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for string and numeric types
 * math functions: `abs`, `acos`, `asin`, `atan`, `cbrt`, `ceil`, `cos`, `cosh`, `cot`, `exp`, `expm1`, `floor`, `getExponent`, `log`, `log10`, `log1p`, `nextUp`, `rint`, `signum`, `sin`, `sinh`, `sqrt`, `tan`, `tanh`, `toDegrees`, `toRadians`, `ulp`, `atan2`, `copySign`, `div`, `hypot`, `max`, `min`, `nextAfter`,  `pow`, `remainder`, `scalb` are supported for numeric types
 * time functions: `timestamp_floor` (with constant granularity argument) is supported for numeric types
+* boolean functions: `isnull`, `notnull` are supported for string and numeric types
+* conditional functions: `nvl` is supported for string and numeric types
+* string functions: the concatenation operator (`+`) and `concat` function are supported for string and numeric types
 * other: `parse_long` is supported for numeric and string types
+
+## Legacy logical operator mode
+In earlier releases of Druid, the logical 'and' and 'or' operators behaved in a non-standard manner, but this behavior has been changed so that these operations output 'homogeneous' boolean values.

Review comment:
       nit: Perhaps we can be more specific about which version.
   ```suggestion
   Prior to the 0.22 release of Apache Druid, the logical 'and' and 'or' operators behaved in a non-standard manner, but this behavior has been changed so that these operations output 'homogeneous' boolean values.
   ```

##########
File path: core/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java
##########
@@ -135,6 +178,660 @@ public void processIndex(String[] strings, long[] longs, boolean[] outputNulls,
     return (ExprVectorProcessor<T>) processor;
   }
 
+  public static <T> ExprVectorProcessor<T> isNull(Expr.VectorInputBindingInspector inspector, Expr expr)
+  {
+
+    final ExprType type = expr.getOutputType(inspector);
+
+    if (type == null) {
+      return constant(1L, inspector.getMaxVectorSize());
+    }
+    final long[] outputValues = new long[inspector.getMaxVectorSize()];
+
+    ExprVectorProcessor<?> processor = null;
+    if (ExprType.STRING == type) {
+      final ExprVectorProcessor<String[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<String[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final String[] values = inputEval.values();
+          for (int i = 0; i < currentSize; i++) {
+            if (values[i] == null) {
+              outputValues[i] = 1L;
+            } else {
+              outputValues[i] = 0L;
+            }
+          }
+          return new ExprEvalLongVector(outputValues, null);
+        }
+
+        @Override
+        public ExprType getOutputType()
+        {
+          return ExprType.LONG;
+        }
+      };
+    } else if (ExprType.LONG == type) {
+      final ExprVectorProcessor<long[]> input = expr.buildVectorized(inspector);
+      processor = new ExprVectorProcessor<long[]>()
+      {
+        @Override
+        public ExprEvalVector<long[]> evalVector(Expr.VectorInputBinding bindings)
+        {
+          final ExprEvalVector<long[]> inputEval = input.evalVector(bindings);
+
+          final int currentSize = bindings.getCurrentVectorSize();
+          final boolean[] nulls = inputEval.getNullVector();
+          for (int i = 0; i < currentSize; i++) {
+            if (nulls != null && nulls[i]) {

Review comment:
       nit: is the optimizer smart enough to optimize this to
   ```suggestion
             for (int i = 0; nulls !=null && i < currentSize; i++) {
               if (nulls[i]) {
   ```
   
   since if `nulls == null` we don't have to iterate over the list because long arrays default to 0L iirc.
   Similar comment elsewhere this pattern is used




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-977205745


   Thanks for considering the suggestion to be more compatible. I think it's good that the bundled configs set it to false, and I agree that the release notes and docs should push people towards considering setting it to false for existing deployments.
   
   IMO, it'd make sense to switch this behavior and a few others (like SQL compatible null handling) at the same time in a future release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #11184:
URL: https://github.com/apache/druid/pull/11184#issuecomment-982059760


   @clintropolis, the `NULL` handling proposed is standard SQL trinary logic, so +1 on that change.
   
   SQL also applies NULL handling to all functions and operators: in general, `f(NULL) -> NULL` for every function `f`, unless special cased, as in the logical operators.
   
   Does the change also cover the negation operator? `NOT a` where `a` is `NULL` results in `NULL`. (Note that `a IS NOT NULL` is entirely different!)
   
   Did a prior (or will a future) change handle nulls in math operators? `10 + NULL -> NULL`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a change in pull request #11184: vectorize logical operators and boolean functions

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #11184:
URL: https://github.com/apache/druid/pull/11184#discussion_r758771121



##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -385,7 +322,59 @@ protected BinaryOpExprBase copy(Expr left, Expr right)
   public ExprEval eval(ObjectBinding bindings)
   {
     ExprEval leftVal = left.eval(bindings);
-    return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    if (!ExpressionProcessing.useStrictBooleans()) {
+      return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    }
+
+    // if left is false, always false
+    if (leftVal.value() != null && !leftVal.asBoolean()) {
+      return ExprEval.ofLongBoolean(false);

Review comment:
       This looks like a constant. Can it be defined as such? `static final ExprEval.FALSE = ExprEval.ofLongBoolean(false)` to avoid recomputing the value in an inner loop? Here and below.

##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -385,7 +322,59 @@ protected BinaryOpExprBase copy(Expr left, Expr right)
   public ExprEval eval(ObjectBinding bindings)
   {
     ExprEval leftVal = left.eval(bindings);
-    return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    if (!ExpressionProcessing.useStrictBooleans()) {
+      return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    }
+
+    // if left is false, always false
+    if (leftVal.value() != null && !leftVal.asBoolean()) {
+      return ExprEval.ofLongBoolean(false);
+    }
+    ExprEval rightVal;
+    if (NullHandling.sqlCompatible() || Types.is(leftVal.type(), ExprType.STRING)) {

Review comment:
       Same comment, can the `NullHandling.sqlCompatible()` be cached?
   
   Better, since the value never changes, can there be separate implementations for the two cases so we pick the one we want up front and never have to check again? ("Up-front" would be the start of this part of a query, if the value can change per query.)
   
   Picking the exact right code is not quite code gen, but is pretty close in terms of performance.

##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -385,7 +322,59 @@ protected BinaryOpExprBase copy(Expr left, Expr right)
   public ExprEval eval(ObjectBinding bindings)
   {
     ExprEval leftVal = left.eval(bindings);
-    return leftVal.asBoolean() ? right.eval(bindings) : leftVal;
+    if (!ExpressionProcessing.useStrictBooleans()) {

Review comment:
       This is an inner loop. Can the `!ExpressionProcessing.useStrictBooleans()` be evaluated on setup and reused here rather than making this call every time? The value can't change. Maybe the JVM will inline the call, but better to just eliminate it in the inner-loop code path.

##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -213,20 +220,69 @@ default boolean areNumeric(List<Expr> args)
     }
 
     /**
-     * Check if all provided {@link Expr} can infer the output type as {@link ExpressionType#isNumeric} with a value of true.
+     * Check if all provided {@link Expr} can infer the output type as {@link ExpressionType#isNumeric} with a value
+     * of true (or null, which is not a type)
      *
      * There must be at least one expression with a computable numeric output type for this method to return true.
+     *
+     * This method should only be used if {@link #getType} produces accurate information for all bindings (no null
+     * value for type unless the input binding does not exist and so the input is always null)
+     *
+     * @see #getOutputType(InputBindingInspector)
      */
     default boolean areNumeric(Expr... args)
     {
       return areNumeric(Arrays.asList(args));
     }
 
     /**
-     * Check if all provided {@link Expr} can infer the output type as {@link ExpressionType#isPrimitive()} (non-array) with a
-     * value of true.
+     * Check if all arguments are the same type (or null, which is not a type)
+     *
+     * This method should only be used if {@link #getType} produces accurate information for all bindings (no null
+     * value for type unless the input binding does not exist and so the input is always null)
+     *
+     * @see #getOutputType(InputBindingInspector)
+     */
+    default boolean areSameTypes(List<Expr> args)
+    {
+      ExpressionType currentType = null;
+      boolean allSame = true;
+      for (Expr arg : args) {
+        ExpressionType argType = arg.getOutputType(this);
+        if (argType == null) {
+          continue;
+        }
+        if (currentType == null) {
+          currentType = argType;
+        }
+        allSame &= Objects.equals(argType, currentType);

Review comment:
       Simpler to do a short-circuit AND:
   
   ```java
       for ... {
         if (!Objects.equals(argType, currentType)) {
           return false;
         }
       }
       return true;
   ```

##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryOperatorExpr.java
##########
@@ -157,3 +158,67 @@ protected ExprEval evalString(@Nullable String left, @Nullable String right)
 
   protected abstract double evalDouble(double left, double right);
 }
+
+@SuppressWarnings("ClassName")
+abstract class BinaryBooleanOpExprBase extends BinaryOpExprBase
+{
+  BinaryBooleanOpExprBase(String op, Expr left, Expr right)
+  {
+    super(op, left, right);
+  }
+
+  @Override
+  public ExprEval eval(ObjectBinding bindings)
+  {
+    ExprEval leftVal = left.eval(bindings);
+    ExprEval rightVal = right.eval(bindings);
+
+    // Result of any Binary expressions is null if any of the argument is null.
+    // e.g "select null * 2 as c;" or "select null + 1 as c;" will return null as per Standard SQL spec.
+    if (NullHandling.sqlCompatible() && (leftVal.value() == null || rightVal.value() == null)) {
+      return ExprEval.of(null);
+    }
+
+    ExpressionType type = ExpressionTypeConversion.autoDetect(leftVal, rightVal);
+    boolean result;
+    switch (type.getType()) {

Review comment:
       Even better is to have different implementations for each type so the switch is done at setup time, not in the inner loop.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org