You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/22 19:59:22 UTC

[GitHub] [druid] gianm commented on a change in pull request #11184: vectorize logical operators and boolean functions

gianm commented on a change in pull request #11184:
URL: https://github.com/apache/druid/pull/11184#discussion_r754029048



##########
File path: core/src/main/java/org/apache/druid/math/expr/ExpressionProcessing.java
##########
@@ -48,11 +48,17 @@
   @VisibleForTesting
   public static void initializeForTests(@Nullable Boolean allowNestedArrays)
   {
-    INSTANCE = new ExpressionProcessingConfig(allowNestedArrays);
+    INSTANCE = new ExpressionProcessingConfig(allowNestedArrays, null);
+  }
+
+  @VisibleForTesting
+  public static void initializeForLegacyLogicalOperationsTests(boolean useLegacy)
+  {
+    INSTANCE = new ExpressionProcessingConfig(null, useLegacy);
   }
 
   /**
-   * whether nulls should be replaced with default value.
+   * [['is expression support for'],['nested arrays'],['enabled?']]

Review comment:
       🥸

##########
File path: core/src/main/java/org/apache/druid/math/expr/vector/BivariateFunctionVectorProcessor.java
##########
@@ -20,68 +20,63 @@
 package org.apache.druid.math.expr.vector;
 
 import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExpressionType;
+
+import javax.annotation.Nullable;
 
 /**
- * common machinery for processing two input operators and functions, which should always treat null inputs as null
- * output, and are backed by a primitive values instead of an object values (and need to use the null vectors instead of
- * checking the vector themselves for nulls)
+ * Basic vector processor that processes 2 inputs and works for both primtive value vectors and object vectors.

Review comment:
       primitive (spelling)

##########
File path: docs/misc/math-expr.md
##########
@@ -256,16 +266,42 @@ supported features:
 * constants and identifiers are supported for any column type
 * `cast` is supported for numeric and string types
 * math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types
-* comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for numeric types
+* logical operators: `!`, `&&`, `||`, are supported for string and numeric types
+* comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for string and numeric types
 * math functions: `abs`, `acos`, `asin`, `atan`, `cbrt`, `ceil`, `cos`, `cosh`, `cot`, `exp`, `expm1`, `floor`, `getExponent`, `log`, `log10`, `log1p`, `nextUp`, `rint`, `signum`, `sin`, `sinh`, `sqrt`, `tan`, `tanh`, `toDegrees`, `toRadians`, `ulp`, `atan2`, `copySign`, `div`, `hypot`, `max`, `min`, `nextAfter`,  `pow`, `remainder`, `scalb` are supported for numeric types
 * time functions: `timestamp_floor` (with constant granularity argument) is supported for numeric types
+* boolean functions: `isnull`, `notnull` are supported for string and numeric types
+* conditional functions: `nvl` is supported for string and numeric types
+* string functions: the concatenation operator (`+`) and `concat` function are supported for string and numeric types
 * other: `parse_long` is supported for numeric and string types
 
+## Legacy logical operator mode
+Prior to the 0.23 release of Apache Druid, the logical 'and' and 'or' operators behaved in a non-standard manner, but this behavior has been changed so that these operations output 'homogeneous' boolean values - `LONG` values of `0` for `true` and `1` for `false`. Druid currently still retains implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values: 
+* `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false`
+* `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`. 
 
-## Other functions
+Old behavior:
+* `100 && 11` -> `11`
+* `0.7 || 0.3` -> `0.3`
+* `100 && 0` -> `0`
+* `'troo' && 'true'` -> `'troo'`
+* `'troo' || 'true'` -> `'true'`
 
-| function | description |
-| --- | --- |
-| human_readable_binary_byte_format(value[, precision]) | Format a number in human-readable [IEC](https://en.wikipedia.org/wiki/Binary_prefix) format. `precision` must be in the range of [0,3] (default: 2). For example:<li> human_readable_binary_byte_format(1048576) returns `1.00 MiB`</li><li>human_readable_binary_byte_format(1048576, 3) returns `1.000 MiB`</li> |
-| human_readable_decimal_byte_format(value[, precision]) | Format a number in human-readable [SI](https://en.wikipedia.org/wiki/Binary_prefix) format. `precision` must be in the range of [0,3] (default: 2). For example:<li> human_readable_decimal_byte_format(1000000) returns `1.00 MB`</li><li>human_readable_decimal_byte_format(1000000, 3) returns `1.000 MB`</li> |
-| human_readable_decimal_format(value[, precision]) | Format a number in human-readable SI format. `precision` must be in the range of [0,3] (default: 2). For example:<li>human_readable_decimal_format(1000000) returns `1.00 M`</li><li>human_readable_decimal_format(1000000, 3) returns `1.000 M`</li>  |
+Current behavior:
+* `100 && 11` -> `1`
+* `0.7 || 0.3` -> `1`
+* `100 && 0` -> `0`
+* `'troo' && 'true'` -> `0`
+* `'troo' || 'true'` -> `1`
+
+Additionally, the logical operators in these older versions did not honor SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`). The current version treats `null` values as "unknown".
+
+For the "or" operator:
+* `true || null`, `null || true`, -> `1`
+* `false || null`, `null || false`, `null || null`-> `null`
+
+For the "and" operator:
+* `true && null`, `null && true`, `null && null` -> `null`
+* `false && null`, `null && false` -> `0`
+
+To revert to the behavior of previous Druid versions, `druid.expressions.useLegacyLogicalOperators` can be set to `true` in your Druid configuration.

Review comment:
       This should also go in `configuration/index.md`, so all the configuration things are in one place.

##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -222,6 +222,25 @@ default boolean areNumeric(Expr... args)
       return areNumeric(Arrays.asList(args));
     }
 
+    default boolean areSameTypes(List<Expr> args)
+    {
+      ExprType currentType = null;
+      boolean allSame = true;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (currentType == null) {
+          currentType = argType;
+        }
+        allSame &= argType == currentType;
+      }
+      return allSame;

Review comment:
       As I read it, the intent is more like `areDefinitelyTheSameType` (i.e. it should return false if it cannot prove that the exprs are all the same type).
   
   @clintropolis if that's correct, clarifying javadoc would be useful.

##########
File path: core/src/main/java/org/apache/druid/math/expr/ExpressionProcessingConfig.java
##########
@@ -27,20 +27,39 @@
 public class ExpressionProcessingConfig
 {
   public static final String NESTED_ARRAYS_CONFIG_STRING = "druid.expressions.allowNestedArrays";
+  public static final String NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING = "druid.expressions.useLegacyLogicalOperators";
 
   @JsonProperty("allowNestedArrays")
   private final boolean allowNestedArrays;
 
+  @JsonProperty("useLegacyLogicalOperators")
+  private final boolean useLegacyLogicalOperators;
+
   @JsonCreator
-  public ExpressionProcessingConfig(@JsonProperty("allowNestedArrays") @Nullable Boolean allowNestedArrays)
+  public ExpressionProcessingConfig(
+      @JsonProperty("allowNestedArrays") @Nullable Boolean allowNestedArrays,
+      @JsonProperty("useLegacyLogicalOperators") @Nullable Boolean useLegacyLogicalOperators
+  )
   {
     this.allowNestedArrays = allowNestedArrays == null
                              ? Boolean.valueOf(System.getProperty(NESTED_ARRAYS_CONFIG_STRING, "false"))
                              : allowNestedArrays;
+    if (useLegacyLogicalOperators == null) {
+      this.useLegacyLogicalOperators = Boolean.parseBoolean(
+          System.getProperty(NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING, "false")

Review comment:
       Can we set the default to "true" for some time? (to minimize sudden disruption)

##########
File path: core/src/main/java/org/apache/druid/math/expr/ExpressionProcessing.java
##########
@@ -65,4 +71,14 @@ public static boolean allowNestedArrays()
     }
     return INSTANCE.allowNestedArrays();
   }
+
+
+  public static boolean useLegacyLogicalOperators()
+  {
+    // this should only be null in a unit test context, in production this will be injected by the null handling module
+    if (INSTANCE == null) {
+      throw new IllegalStateException("NullHandling module not initialized, call NullHandling.initializeForTests()");

Review comment:
       Module name should be ExpressionProcessing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org