You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/02/08 11:55:48 UTC

[GitHub] [druid] clintropolis opened a new pull request #12241: sql expressions and virtual columns

clintropolis opened a new pull request #12241:
URL: https://github.com/apache/druid/pull/12241


   ### Description
   
   This PR reworks how the SQL planner constructs `DruidExpression` and how it translates them into `VirtualColumn` when constructing the native query.
   
   Prior to the changes in this PR, `DruidExpression` were flattened string expressions, eagerly constructed into a `VirtualColumn` via the `VirtualColumnRegistry`, whenever required for use in projections, aggregations, filters, etc. This meant that basically all of the expression structure was lost after converting to the string expression, making it impossible to modify them to try to optimize the query prior to conversion the the native JSON query.
   
   Now, `DruidExpression` retains the expression tree structure, each containing a `List<DruidExpression>` that contains any sub-expressions which at the time of virtual column creation is built into the string expression, using a newly added interface
   ```
     public interface ExpressionBuilder
     {
       String buildExpression(List<DruidExpression> arguments);
     }
   ```
   which is added to the `DruidExpression` when it is created. Another new interface has been added to aid in rewriting `DruidExpression` trees before planning into a native `Query`
   
   ```
     public interface DruidExpressionShuttle
     {
       DruidExpression visit(DruidExpression expression);
   
       default List<DruidExpression> visitAll(List<DruidExpression> expressions)
       {
         List<DruidExpression> list = new ArrayList<>(expressions.size());
         for (DruidExpression expr : expressions) {
           list.add(visit(expr));
         }
         return list;
       }
     }
   ```
   
   (along with a `visit` method on `DruidExpression` to accept the shuttle).
   
   `DruidExpression` are also now annotated, with one of 4 roles, `LITERAL`, `IDENTIFIER`, `EXPRESSION`, and `SPECIALIZED`, the last of which, coupled with the shuttle visitor, is the primary motivator of this PR.
   
   Using `MV_FILTER_ONLY`/`MV_FILTER_NONE` as an example, which has a "native" specialized `VirtualColumn` implementation, any expression which supplies such a specialized implementation will now always use it.
   
   Prior to this PR, when used in some composition, e.g.
   
   ```
   SELECT MV_LENGTH(MV_FILTER_ONLY(dim3, ARRAY['b'])), SUM(cnt) FROM druid.numfoo GROUP BY 1 ORDER BY 2 DESC
   ```
   
   this function would use a fall-back expression
   ```
   "virtualColumns" : [ {
       "type" : "expression",
       "name" : "v0",
       "expression" : "array_length(filter((x) -> array_contains(array('b'), x), "dim3"))",
       "outputType" : "LONG"
     } ],
   ```
   
   but now plans to this instead
   ```
   "virtualColumns" : [ {
       "type" : "expression",
       "name" : "v0",
       "expression" : "array_length(\"v1\")",
       "outputType" : "LONG"
     }, {
       "type" : "mv-filtered",
       "name" : "v1",
       "delegate" : {
         "type" : "default",
         "dimension" : "dim3",
         "outputName" : "dim3",
         "outputType" : "STRING"
       },
       "values" : [ "b" ],
       "isAllowList" : true
     } ],
   ```
   
   <hr>
   
   In the process, I uncovered additional bugs with multi-value string array expressions, similar to the bugs fixed #12078 (and controlled by the flag #12210)
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #12241:
URL: https://github.com/apache/druid/pull/12241#issuecomment-1033351418


   > Did you consider something that is an actual expression tree?
   
   the string fragments generated here do later turn into `Expr`, which is the AST of native expressions, however that is a bit more work than I was trying to bite off at this time, and we lack JSON serialization and such to be able to send these expressions parsed over the wire.
   
   It is probably worth considering doing this at some point, but for now I focused on changing as little as possible to allow the transformations I needed, which is just to swap complete operations. Additionally, `Expr` is missing a lot of information that we might want to decorate it with if we were using it in this manner, so we would need some additional adjustments to use like this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r802302489



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs
+ *
  * Represents two kinds of expression-like concepts that native Druid queries support:
  *
  * (1) SimpleExtractions, which are direct column access, possibly with an extractionFn
  * (2) native Druid expressions.
  */
 public class DruidExpression
 {
+  public enum NodeType
+  {
+    /**
+     * constant value
+     */
+    LITERAL,
+    /**
+     * Identifier for a direct physical or virtual column access (column name or virtual column name)
+     */
+    IDENTIFIER,
+    /**
+     * Standard native druid expression, which can compute a string that can be parsed into {@link Expr}, or used
+     * as an {@link ExpressionVirtualColumn}
+     */
+    EXPRESSION,
+    /**
+     * Expression backed by a specialized {@link VirtualColumn}, which might provide more optimized evaluation than
+     * is possible with the standard
+     */
+    SPECIALIZED
+  }
+
   // Must be sorted
   private static final char[] SAFE_CHARS = " ,._-;:(){}[]<>!@#$%^&*`~?/".toCharArray();
+  private static final VirtualColumnBuilder DEFAULT_VIRTUAL_COLUMN_BUILDER = new ExpressionVirtualColumnBuilder();

Review comment:
       no reason, just seemed wasteful because is stateless

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs

Review comment:
       oops :+1:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
imply-cheddar commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r802316191



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/VirtualColumnRegistry.java
##########
@@ -41,29 +44,33 @@
  */
 public class VirtualColumnRegistry
 {
+  private final ExprMacroTable macroTable;
   private final RowSignature baseRowSignature;
-  private final Map<ExpressionWrapper, VirtualColumn> virtualColumnsByExpression;
-  private final Map<String, VirtualColumn> virtualColumnsByName;
+  private final Map<ExpressionAndTypeHint, String> virtualColumnsByExpression;
+  private final Map<String, ExpressionAndTypeHint> virtualColumnsByName;
   private final String virtualColumnPrefix;
   private int virtualColumnCounter;
 
   private VirtualColumnRegistry(
       RowSignature baseRowSignature,
+      ExprMacroTable macroTable,
       String virtualColumnPrefix,
-      Map<ExpressionWrapper, VirtualColumn> virtualColumnsByExpression,
-      Map<String, VirtualColumn> virtualColumnsByName
+      Map<ExpressionAndTypeHint, String> virtualColumnsByExpression,
+      Map<String, ExpressionAndTypeHint> virtualColumnsByName
   )
   {
+    this.macroTable = macroTable;
     this.baseRowSignature = baseRowSignature;
     this.virtualColumnPrefix = virtualColumnPrefix;
     this.virtualColumnsByExpression = virtualColumnsByExpression;
     this.virtualColumnsByName = virtualColumnsByName;
   }
 
-  public static VirtualColumnRegistry create(final RowSignature rowSignature)

Review comment:
       Agree that it's pretty in the weeds and someone who actually depends on the implementation to this level is probably in deep enough to deal with the change.  Documenting seems reasonable.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #12241: allow optimizing sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #12241:
URL: https://github.com/apache/druid/pull/12241#issuecomment-1034283148


   thanks for review @cheddar and @jihoonson. CI failure seems unrelated (and travis logs look pretty messed up), so I'm going to go ahead and merge


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #12241: allow optimizing sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #12241:
URL: https://github.com/apache/druid/pull/12241


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r802302285



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs
+ *
  * Represents two kinds of expression-like concepts that native Druid queries support:
  *
  * (1) SimpleExtractions, which are direct column access, possibly with an extractionFn
  * (2) native Druid expressions.
  */
 public class DruidExpression
 {
+  public enum NodeType
+  {
+    /**
+     * constant value
+     */
+    LITERAL,
+    /**
+     * Identifier for a direct physical or virtual column access (column name or virtual column name)
+     */
+    IDENTIFIER,
+    /**
+     * Standard native druid expression, which can compute a string that can be parsed into {@link Expr}, or used
+     * as an {@link ExpressionVirtualColumn}
+     */
+    EXPRESSION,
+    /**
+     * Expression backed by a specialized {@link VirtualColumn}, which might provide more optimized evaluation than
+     * is possible with the standard
+     */
+    SPECIALIZED
+  }
+
   // Must be sorted
   private static final char[] SAFE_CHARS = " ,._-;:(){}[]<>!@#$%^&*`~?/".toCharArray();
+  private static final VirtualColumnBuilder DEFAULT_VIRTUAL_COLUMN_BUILDER = new ExpressionVirtualColumnBuilder();
 
   static {
     Arrays.sort(SAFE_CHARS);
   }
 
-  @Nullable
-  private final SimpleExtraction simpleExtraction;
-  private final String expression;
-  private final TrinaryFn<String, ColumnType, ExprMacroTable, VirtualColumn> virtualColumnFn;
+  private static String escape(final String s)
+  {
+    final StringBuilder escaped = new StringBuilder();
+    for (int i = 0; i < s.length(); i++) {
+      final char c = s.charAt(i);
+      if (Character.isLetterOrDigit(c) || Arrays.binarySearch(SAFE_CHARS, c) >= 0) {
+        escaped.append(c);
+      } else {
+        escaped.append("\\u").append(BaseEncoding.base16().encode(Chars.toByteArray(c)));
+      }
+    }
+    return escaped.toString();
+  }
 
-  private DruidExpression(@Nullable final SimpleExtraction simpleExtraction, final String expression, @Nullable final TrinaryFn<String, ColumnType, ExprMacroTable, VirtualColumn> virtualColumnFn)

Review comment:
       i think since it is private it should be ok? the static methods are what everything should be using to make these things afaik




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r802308167



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/VirtualColumnRegistry.java
##########
@@ -41,29 +44,33 @@
  */
 public class VirtualColumnRegistry
 {
+  private final ExprMacroTable macroTable;
   private final RowSignature baseRowSignature;
-  private final Map<ExpressionWrapper, VirtualColumn> virtualColumnsByExpression;
-  private final Map<String, VirtualColumn> virtualColumnsByName;
+  private final Map<ExpressionAndTypeHint, String> virtualColumnsByExpression;
+  private final Map<String, ExpressionAndTypeHint> virtualColumnsByName;
   private final String virtualColumnPrefix;
   private int virtualColumnCounter;
 
   private VirtualColumnRegistry(
       RowSignature baseRowSignature,
+      ExprMacroTable macroTable,
       String virtualColumnPrefix,
-      Map<ExpressionWrapper, VirtualColumn> virtualColumnsByExpression,
-      Map<String, VirtualColumn> virtualColumnsByName
+      Map<ExpressionAndTypeHint, String> virtualColumnsByExpression,
+      Map<String, ExpressionAndTypeHint> virtualColumnsByName
   )
   {
+    this.macroTable = macroTable;
     this.baseRowSignature = baseRowSignature;
     this.virtualColumnPrefix = virtualColumnPrefix;
     this.virtualColumnsByExpression = virtualColumnsByExpression;
     this.virtualColumnsByName = virtualColumnsByName;
   }
 
-  public static VirtualColumnRegistry create(final RowSignature rowSignature)

Review comment:
       hmm, i thought about this, but only a `DruidRel` of some sort should be creating a virtual column registry, since they are scoped to a native query, so this so it didn't seem like a huge deal unless someone is like super super super deep into custom SQL planning for custom query types (and not fully sure it is possible without modifying `druid-sql` itself at this point...)
   
   I can't really think of a clean way to add it back; I moved it here because the expression macro table should not be changing over the course of planning a query (or even planning multiple queries), so it seemed pointless that we were passing it in via planner context when we were eagerly creating the virtual columns (creating expression virtual columns requires a reference to the macro table), so we can't count on it being set in the constructor we would have to have an alternative way even for the new methods, which I guess would mean passing it in when creating an expression and storing a reference to it in each `DruidExpression` in the tree so that they could build their virtual columns.
   
   I think maybe we should just document this on the unlikely chance someone was doing something incredibly crazy with this, because in my mind the registry is a facility that the planner provides to operator conversions/filters/aggregators etc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r801898129



##########
File path: extensions-contrib/tdigestsketch/src/main/java/org/apache/druid/query/aggregation/tdigestsketch/TDigestSketchUtils.java
##########
@@ -88,16 +86,16 @@ public static boolean matchingAggregatorFactoryExists(
   {
     // Check input for equivalence.
     final boolean inputMatches;
-    final VirtualColumn virtualInput =
-        virtualColumnRegistry.findVirtualColumns(factory.requiredFields())
+    final DruidExpression virtualInput =

Review comment:
       this pattern is done in a few places, and i think could be done more efficiently, but I avoided changing the logic at this point to reduce the number of moving parts since this PR has become a bit large




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
imply-cheddar commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r802250747



##########
File path: extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/quantiles/sql/DoublesSketchSqlAggregatorTest.java
##########
@@ -564,7 +564,7 @@ public void testDoublesSketchPostAggs() throws Exception
                       ),
                       new ExpressionPostAggregator(
                           "p3",
-                          "(p2 + 1000)",
+                          "(\"p2\" + 1000)",

Review comment:
       Naive question, but is this test change indicative of a change in what SQL is required to be written?  I.e. might some things that didn't require quotes previously suddenly require quotes?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs
+ *
  * Represents two kinds of expression-like concepts that native Druid queries support:
  *
  * (1) SimpleExtractions, which are direct column access, possibly with an extractionFn
  * (2) native Druid expressions.
  */
 public class DruidExpression
 {
+  public enum NodeType
+  {
+    /**
+     * constant value
+     */
+    LITERAL,
+    /**
+     * Identifier for a direct physical or virtual column access (column name or virtual column name)
+     */
+    IDENTIFIER,
+    /**
+     * Standard native druid expression, which can compute a string that can be parsed into {@link Expr}, or used
+     * as an {@link ExpressionVirtualColumn}
+     */
+    EXPRESSION,
+    /**
+     * Expression backed by a specialized {@link VirtualColumn}, which might provide more optimized evaluation than
+     * is possible with the standard
+     */
+    SPECIALIZED
+  }
+
   // Must be sorted
   private static final char[] SAFE_CHARS = " ,._-;:(){}[]<>!@#$%^&*`~?/".toCharArray();
+  private static final VirtualColumnBuilder DEFAULT_VIRTUAL_COLUMN_BUILDER = new ExpressionVirtualColumnBuilder();

Review comment:
       I'm scared of static buildery thingies.  Is there a reason that this needs to be static instead of just creating a new one each time it's used?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs
+ *
  * Represents two kinds of expression-like concepts that native Druid queries support:
  *
  * (1) SimpleExtractions, which are direct column access, possibly with an extractionFn
  * (2) native Druid expressions.
  */
 public class DruidExpression
 {
+  public enum NodeType
+  {
+    /**
+     * constant value
+     */
+    LITERAL,
+    /**
+     * Identifier for a direct physical or virtual column access (column name or virtual column name)
+     */
+    IDENTIFIER,
+    /**
+     * Standard native druid expression, which can compute a string that can be parsed into {@link Expr}, or used
+     * as an {@link ExpressionVirtualColumn}
+     */
+    EXPRESSION,
+    /**
+     * Expression backed by a specialized {@link VirtualColumn}, which might provide more optimized evaluation than
+     * is possible with the standard
+     */
+    SPECIALIZED
+  }
+
   // Must be sorted
   private static final char[] SAFE_CHARS = " ,._-;:(){}[]<>!@#$%^&*`~?/".toCharArray();
+  private static final VirtualColumnBuilder DEFAULT_VIRTUAL_COLUMN_BUILDER = new ExpressionVirtualColumnBuilder();
 
   static {
     Arrays.sort(SAFE_CHARS);
   }
 
-  @Nullable
-  private final SimpleExtraction simpleExtraction;
-  private final String expression;
-  private final TrinaryFn<String, ColumnType, ExprMacroTable, VirtualColumn> virtualColumnFn;
+  private static String escape(final String s)
+  {
+    final StringBuilder escaped = new StringBuilder();
+    for (int i = 0; i < s.length(); i++) {
+      final char c = s.charAt(i);
+      if (Character.isLetterOrDigit(c) || Arrays.binarySearch(SAFE_CHARS, c) >= 0) {
+        escaped.append(c);
+      } else {
+        escaped.append("\\u").append(BaseEncoding.base16().encode(Chars.toByteArray(c)));
+      }
+    }
+    return escaped.toString();
+  }
 
-  private DruidExpression(@Nullable final SimpleExtraction simpleExtraction, final String expression, @Nullable final TrinaryFn<String, ColumnType, ExprMacroTable, VirtualColumn> virtualColumnFn)

Review comment:
       The diff is making it hard to really see this, but I *think* you got rid of this 3-argument constructor.  That's going to make anything built on the old code stop working.  Is it possible to have the 3-argument constructor continue to exist, but be deprecated and delegate to the new constructor kinda like you did with the other methods?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/VirtualColumnRegistry.java
##########
@@ -41,29 +44,33 @@
  */
 public class VirtualColumnRegistry
 {
+  private final ExprMacroTable macroTable;
   private final RowSignature baseRowSignature;
-  private final Map<ExpressionWrapper, VirtualColumn> virtualColumnsByExpression;
-  private final Map<String, VirtualColumn> virtualColumnsByName;
+  private final Map<ExpressionAndTypeHint, String> virtualColumnsByExpression;
+  private final Map<String, ExpressionAndTypeHint> virtualColumnsByName;
   private final String virtualColumnPrefix;
   private int virtualColumnCounter;
 
   private VirtualColumnRegistry(
       RowSignature baseRowSignature,
+      ExprMacroTable macroTable,
       String virtualColumnPrefix,
-      Map<ExpressionWrapper, VirtualColumn> virtualColumnsByExpression,
-      Map<String, VirtualColumn> virtualColumnsByName
+      Map<ExpressionAndTypeHint, String> virtualColumnsByExpression,
+      Map<String, ExpressionAndTypeHint> virtualColumnsByName
   )
   {
+    this.macroTable = macroTable;
     this.baseRowSignature = baseRowSignature;
     this.virtualColumnPrefix = virtualColumnPrefix;
     this.virtualColumnsByExpression = virtualColumnsByExpression;
     this.virtualColumnsByName = virtualColumnsByName;
   }
 
-  public static VirtualColumnRegistry create(final RowSignature rowSignature)

Review comment:
       This single-argument `create` was not resurrected.  I haven't double checked if that might be a compatibility problem or not, but just wanted to highlight it.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs

Review comment:
       nit: you've got a todo todo




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r803059825



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/VirtualColumnRegistry.java
##########
@@ -169,15 +212,75 @@ public RowSignature getFullRowSignature()
                      .collect(Collectors.toList());
   }
 
-  private static class ExpressionWrapper
+  /**
+   * @deprecated use {@link #getOrCreateVirtualColumnForExpression(DruidExpression, ColumnType)} instead
+   */
+  @Deprecated
+  public VirtualColumn getOrCreateVirtualColumnForExpression(
+      PlannerContext plannerContext,
+      DruidExpression expression,
+      ColumnType valueType
+  )
+  {
+    final String name = getOrCreateVirtualColumnForExpression(expression, valueType);
+    return virtualColumnsByName.get(name).expression.toVirtualColumn(name, valueType, macroTable);
+  }
+
+  /**
+   * @deprecated use {@link #getOrCreateVirtualColumnForExpression(DruidExpression, RelDataType)} instead
+   */
+  @Deprecated
+  public VirtualColumn getOrCreateVirtualColumnForExpression(
+      PlannerContext plannerContext,
+      DruidExpression expression,
+      RelDataType dataType
+  )
   {
-    private final String expression;
-    private final ColumnType valueType;
+    return getOrCreateVirtualColumnForExpression(
+        plannerContext,
+        expression,
+        Calcites.getColumnTypeForRelDataType(dataType)
+    );
+  }
 
-    public ExpressionWrapper(String expression, ColumnType valueType)
+  /**
+   * @deprecated use {@link #getVirtualColumnByExpression(DruidExpression, RelDataType)} instead
+   */
+  @Deprecated
+  @Nullable
+  public VirtualColumn getVirtualColumnByExpression(String expression, RelDataType type)
+  {
+    final ColumnType columnType = Calcites.getColumnTypeForRelDataType(type);
+    ExpressionAndTypeHint wrapped = wrap(DruidExpression.fromExpression(expression), columnType);
+    return Optional.ofNullable(virtualColumnsByExpression.get(wrapped))
+                   .map(this::getVirtualColumn)
+                   .orElse(null);
+  }
+
+  private static ExpressionAndTypeHint wrap(DruidExpression expression, ColumnType typeHint)
+  {
+    return new ExpressionAndTypeHint(expression, typeHint);
+  }
+
+  private static class ExpressionAndTypeHint
+  {
+    private final DruidExpression expression;
+    private final ColumnType typeHint;

Review comment:
       nit: `typeHint` seems worth to document. Can you please add some javadoc for it? It's not a blocker and you can do it as a follow-up.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs
+ *
  * Represents two kinds of expression-like concepts that native Druid queries support:
  *
  * (1) SimpleExtractions, which are direct column access, possibly with an extractionFn
  * (2) native Druid expressions.
  */
 public class DruidExpression
 {
+  public enum NodeType
+  {
+    /**
+     * constant value
+     */
+    LITERAL,
+    /**
+     * Identifier for a direct physical or virtual column access (column name or virtual column name)
+     */
+    IDENTIFIER,
+    /**
+     * Standard native druid expression, which can compute a string that can be parsed into {@link Expr}, or used
+     * as an {@link ExpressionVirtualColumn}
+     */
+    EXPRESSION,
+    /**
+     * Expression backed by a specialized {@link VirtualColumn}, which might provide more optimized evaluation than
+     * is possible with the standard
+     */
+    SPECIALIZED
+  }
+
   // Must be sorted
   private static final char[] SAFE_CHARS = " ,._-;:(){}[]<>!@#$%^&*`~?/".toCharArray();
+  private static final VirtualColumnBuilder DEFAULT_VIRTUAL_COLUMN_BUILDER = new ExpressionVirtualColumnBuilder();

Review comment:
       Seems more like a function than a builder, but we can change the name later if needed.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteMultiValueStringQueryTest.java
##########
@@ -1137,9 +1151,16 @@ public void testMultiValueListFilterComposed() throws Exception
                         .setContext(QUERY_CONTEXT_DEFAULT)
                         .build()
         ),
-        ImmutableList.of(
+        useDefault ? ImmutableList.of(
             new Object[]{0, 4L},
             new Object[]{1, 2L}
+        ) : ImmutableList.of(
+            // the fallback expression would actually produce 3 rows, 2 nulls, 2 0's, and 2 1s
+            // instead of 4 nulls and two 1's we get when using the 'native' list filtered virtual column
+            // this is because of slight differences between filter and the native
+            // selector, which treats a 0 length array as null instead of an empty array like is produced by filter

Review comment:
       This seems worth the release notes. Can you add some release note section in the PR description, so that we won't forget later?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r802303270



##########
File path: extensions-core/datasketches/src/test/java/org/apache/druid/query/aggregation/datasketches/quantiles/sql/DoublesSketchSqlAggregatorTest.java
##########
@@ -564,7 +564,7 @@ public void testDoublesSketchPostAggs() throws Exception
                       ),
                       new ExpressionPostAggregator(
                           "p3",
-                          "(p2 + 1000)",
+                          "(\"p2\" + 1000)",

Review comment:
       no, I think it was potentially a bug that the identifiers were not always being escaped in expression post-aggs, that has been fixed by the changes in how we build the native string expressions. humans are still allowed to reference identifiers without the double quotes, just the expressions we generate from SQL will always be escaped with them




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a change in pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
imply-cheddar commented on a change in pull request #12241:
URL: https://github.com/apache/druid/pull/12241#discussion_r802315644



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java
##########
@@ -32,127 +32,303 @@
 import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
 
 import javax.annotation.Nullable;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Objects;
 import java.util.function.Function;
 
 /**
+ * todo(clint): rewrite javadocs
+ *
  * Represents two kinds of expression-like concepts that native Druid queries support:
  *
  * (1) SimpleExtractions, which are direct column access, possibly with an extractionFn
  * (2) native Druid expressions.
  */
 public class DruidExpression
 {
+  public enum NodeType
+  {
+    /**
+     * constant value
+     */
+    LITERAL,
+    /**
+     * Identifier for a direct physical or virtual column access (column name or virtual column name)
+     */
+    IDENTIFIER,
+    /**
+     * Standard native druid expression, which can compute a string that can be parsed into {@link Expr}, or used
+     * as an {@link ExpressionVirtualColumn}
+     */
+    EXPRESSION,
+    /**
+     * Expression backed by a specialized {@link VirtualColumn}, which might provide more optimized evaluation than
+     * is possible with the standard
+     */
+    SPECIALIZED
+  }
+
   // Must be sorted
   private static final char[] SAFE_CHARS = " ,._-;:(){}[]<>!@#$%^&*`~?/".toCharArray();
+  private static final VirtualColumnBuilder DEFAULT_VIRTUAL_COLUMN_BUILDER = new ExpressionVirtualColumnBuilder();
 
   static {
     Arrays.sort(SAFE_CHARS);
   }
 
-  @Nullable
-  private final SimpleExtraction simpleExtraction;
-  private final String expression;
-  private final TrinaryFn<String, ColumnType, ExprMacroTable, VirtualColumn> virtualColumnFn;
+  private static String escape(final String s)
+  {
+    final StringBuilder escaped = new StringBuilder();
+    for (int i = 0; i < s.length(); i++) {
+      final char c = s.charAt(i);
+      if (Character.isLetterOrDigit(c) || Arrays.binarySearch(SAFE_CHARS, c) >= 0) {
+        escaped.append(c);
+      } else {
+        escaped.append("\\u").append(BaseEncoding.base16().encode(Chars.toByteArray(c)));
+      }
+    }
+    return escaped.toString();
+  }
 
-  private DruidExpression(@Nullable final SimpleExtraction simpleExtraction, final String expression, @Nullable final TrinaryFn<String, ColumnType, ExprMacroTable, VirtualColumn> virtualColumnFn)

Review comment:
       right.. it's private.  of course, uhhhh, the diff made it hard for me to see that :P




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #12241: rework sql expressions and virtual columns

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #12241:
URL: https://github.com/apache/druid/pull/12241#issuecomment-1033035013


   Nice improvement! The new format should be easier to work with in code. The format is only partly an AST. It still creates named arguments, which adds more structure than is really necessary. Did you consider something that is an actual expression tree?
   
   ```json
   "virtualColumns":[
      {
         "type":"fn",
         "fn":"array_length",
         "outputType":"LONG",
         "args":[
            {
               "type:":"fn",
               "fn:":"filter",
               "args":[
                  {
                     "type":"lambda",
                     "output":"dim3",
                     "input_args":[
                        "x"
                     ],
                     "args":[ {
                           "type":"fn",
                           "fn":"array",
                           "args":[ {
                                 "type":"col-ref",
                                 "name":"b"
                              }, {
                              "type":"arg-ref",
                              "name":"x"
                           } ] } ] ] ] ] }
   } ],
   
   ```
   
   The gist is, rather than have a node for every operation (a large number), have nodes for the expression elements: functions, literals, column references, etc. One ends up with a handful of node types rather than many dozens.
   
   
   Calcite should produce a tree something like this, so the translation from Calcite is pretty simple.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org