You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "somu-imply (via GitHub)" <gi...@apache.org> on 2023/03/10 19:26:29 UTC

[GitHub] [druid] somu-imply opened a new pull request, #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

somu-imply opened a new pull request, #13922:
URL: https://github.com/apache/druid/pull/13922

   …inside the unnest cursors. Added tests for scenarios such as
   
   1. filter on unnested column which involves a left filter rewrite
   2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite
   3. not filters
   4. SQL functions applied on top of unnested column
   5. null present in first row of the column to be unnested
   
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1135028741


##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/CorrelateProjectOnFIlterRightTransposeRule.java:
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rule;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Correlate;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.Uncollect;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.sql.SqlKind;
+
+/**
+ * Rule that pulls a {@link Filter} from the right-hand side of a {@link Correlate} above the Correlate in presence of an unneeded Project
+ * Allows filters on unnested fields to be added to queries that use {@link org.apache.druid.query.UnnestDataSource}.
+ *
+ * @see CorrelateFilterRTransposeRule similar, but for without a Project atop Filter
+ */
+public class CorrelateProjectOnFIlterRightTransposeRule extends RelOptRule
+{
+  private static final CorrelateProjectOnFIlterRightTransposeRule INSTANCE = new CorrelateProjectOnFIlterRightTransposeRule();
+
+  public CorrelateProjectOnFIlterRightTransposeRule()
+  {
+    super(
+        operand(
+            Correlate.class,
+            operand(RelNode.class, any()),
+            operand(Project.class, operand(Filter.class, operand(Uncollect.class, any())))
+        ));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call)
+  {
+    final Correlate correlate = call.rel(0);
+    final Filter right = call.rel(3);
+
+    // Can't pull up filters that explicitly refer to the correlation variable.
+    return !CorrelateFilterRTransposeRule.usesCorrelationId(correlate.getCorrelationId(), right.getCondition());
+  }
+
+  public static CorrelateProjectOnFIlterRightTransposeRule instance()
+  {
+    return INSTANCE;
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call)
+  {
+    final Correlate correlate = call.rel(0);
+    final RelNode left = call.rel(1);
+    final Project rightP = call.rel(2);
+    final Filter rightF = call.rel(3);
+    final Uncollect uncollect = call.rel(4);

Review Comment:
   ## Unread local variable
   
   Variable 'Uncollect uncollect' is never read.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4393)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13922:
URL: https://github.com/apache/druid/pull/13922#issuecomment-1467398792

   A query like `SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3='b'` as Calcite was adding an additional level of project before the filter causing our filter transpose rules to not fire. The calcite plan was
   ```
   126:LogicalProject(d3=[$17])
     124:LogicalCorrelate(subset=[rel#125:Subset#6.NONE.[]], correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
       8:LogicalTableScan(subset=[rel#114:Subset#0.NONE.[]], table=[[druid, numfoo]])
       122:LogicalProject(subset=[rel#123:Subset#5.NONE.[]], d3=[CAST('d':VARCHAR):VARCHAR])
         120:LogicalFilter(subset=[rel#121:Subset#4.NONE.[]], condition=[=($0, 'd')])
           118:Uncollect(subset=[rel#119:Subset#3.NONE.[]])
             116:LogicalProject(subset=[rel#117:Subset#2.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
               9:LogicalValues(subset=[rel#115:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
   ```
   Added an additional plan that removes this Project step (which does a CAST). Additional tests introduced for selector filters on STRING and NUMERIC values after unnest


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13922:
URL: https://github.com/apache/druid/pull/13922#issuecomment-1468568463

   @gianm @cheddar this PR already contains Gian's changes where the filter after rewrite is added to both pre and post filters. Lack of that change is causing queries to give incorrect results. While I work on the changes as suggested by ERic can we get that part merged, if not through this PR but a separate PR 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13922:
URL: https://github.com/apache/druid/pull/13922#issuecomment-1464304382

   Removed a set of tests that involved the allowSet and we now have the ability to pass a filter on the unnested column. Added extra tests in `CalciteArraysQueryTest`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132886605


##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -330,22 +327,19 @@ public void reset()
   @Nullable
   private void initialize()
   {
-    IdLookup idLookup = dimSelector.idLookup();
-    this.indexIntsForRow = new SingleIndexInts();
-    if (allowSet != null && !allowSet.isEmpty() && idLookup != null) {
-      for (String s : allowSet) {
-        if (idLookup.lookupId(s) >= 0) {
-          allowedBitSet.set(idLookup.lookupId(s));
-        }
-      }
+    index = 0;
+    if (allowFilter != null) {

Review Comment:
   Nit: reverse priority



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -290,9 +283,13 @@ public void advance()
   @Override
   public void advanceUninterruptibly()
   {
-    do {
+    while (true) {
       advanceAndUpdate();
-    } while (matchAndProceed());
+      boolean match = valueMatcher.matches();
+      if (match || baseCursor.isDone()) {
+        return;
+      }
+    }

Review Comment:
   Nit:
   
   ```java
         if (valueMatcher.matches() || baseCursor.isDone()) {
   ```



##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -310,12 +314,15 @@ private void getNextRow()
   private void initialize()
   {
     getNextRow();
-    if (allowSet != null) {
-      if (!allowSet.isEmpty()) {
-        if (!allowSet.contains((String) unnestListForCurrentRow.get(index))) {
-          advance();
-        }
-      }
+    if (allowFilter != null) {

Review Comment:
   Nit: reverse polarity.
   
   ```java
       if (allowFilter = null) {
         this.valueMatcher = BooleanValueMatcher.of(true);
       } else {
         this.valueMatcher = allowFilter.makeMatcher(getColumnSelectorFactory());
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1134471691


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -120,27 +117,27 @@ public Sequence<Cursor> makeCursors(
                   retVal,
                   retVal.getColumnSelectorFactory(),
                   unnestColumn,
-                  outputColumnName,
-                  allowSet
+                  outputColumnName
               );
             } else {
               retVal = new UnnestColumnValueSelectorCursor(
                   retVal,
                   retVal.getColumnSelectorFactory(),
                   unnestColumn,
-                  outputColumnName,
-                  allowSet
+                  outputColumnName
               );
             }
           } else {
             retVal = new UnnestColumnValueSelectorCursor(
                 retVal,
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
-                outputColumnName,
-                allowSet
+                outputColumnName
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns

Review Comment:
   My bad removed the stale comments and updated the docs



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132994250


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   Also with the valueMatcher inside the UnnestDimensionCursor I was thinking about using a value matcher to lazily build a bitset and return `getValueCardinality()` correctly 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1134464276


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   I would keep it happening in PostJoinCursor for a couple reasons:
   
   1. It may never be useful to push filters into the unnest cursor, because with `rewriteFilterOnUnnestColumnIfPossible` we are pushing them even further: all the way to the underlying `StorageAdapter`.
   2. Even if it does end up being useful to push filters into the unnest cursor, if you aren't planning to do these optimizations immediately, it's IMO better to keep the code simpler.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis merged PR #13922:
URL: https://github.com/apache/druid/pull/13922


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132994250


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   Also with the valueMatcher inside the UnnestDimensionCursor I was thinking about using a value matcher to lazily build a bitset and return `getValueCardinality()` correctly which currently returns the getValueCardinality of the dimension which is incorrect as allowList is always empty



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on PR #13922:
URL: https://github.com/apache/druid/pull/13922#issuecomment-1467236981

   Something I don't understand with this structuring of the code.  When we look at the actions taken in planning and running these queries we get
   
   1) SQL is parsed into parse tree and converted to logical DAG
   2) Logical DAG is optimized such that filters are applied to each side of the UNNEST correlate (i.e. Calcite figures out which filters apply to the unnested column (rhs of the LogicalCorrelate with the Uncollect) and which filters apply to the base query (lhs of the LogicalCorrelate with the Uncollect)
   3) We have rules that push all of the filters that Calcite already figured out for us such that they are above the`LogicalCorrelate`.
   4) We build a native query with the filters all meshed together
   5) The native query then has code that figures out, once again, whether some of the filters can be rewritten to be running against the underlying columns
   
   It seems weird to me that we would explicitly undo the thing that Calcite figured out for us so that we can attempt to re-do it in the native query.
   
   I'd propose that we take a `Filter` object on the `UnnestDatasource`.  The UnnestCursor can pretty easily attempt the re-write and pushdown of that RHS filter and also attach it as a `ValueMatcher` on the read.  This is also seems like a much more natural way to plan the query, no?  
   
   Is there some reason that we have to throw away the work that Calcite already did for us only to redo it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132992902


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   The rationale was that going forward the filter on the right will be available on top of the uncollect and Eric and I were discussing if we should pull it into the UnnestDataSource. I agree that the filter can be on the PostJoinCursor. I was also planning of moving in the virtualColumns into the cursors. If keeping it in PostJoin cursor is simpler and we are doing the same amount of work, I'll be happy moving it back



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132994250


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   Also with the valueMatcher inside the UnnestDimensionCursor I was thinking about using a value matcher to lazily build a bitset and return `getValueCardinality()` correctly which currently returns the getValueCardinality of the dimension which is incorrect in presence of a filter as allowList is always empty and we are removing it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132994250


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   Also with the valueMatcher inside the UnnestDimensionCursor I was thinking about using a value matcher to lazily build a bitset and return `getValueCardinality()` correctly which currently returns the getValueCardinality of the dimension which is incorrect in presence of a filter on the unnested column



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132987409


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -310,12 +314,15 @@ private void getNextRow()
   private void initialize()

Review Comment:
   The javadoc on this method seems out of date (it refers to `allowList`).



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -293,13 +293,16 @@ void add(@Nullable final Filter filter)
         }
 
         if (requiredColumns.contains(outputColumnName)) {
-          // Try to move filter pre-correlate if possible.
+          // Rewrite filter post-unnest if possible.
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {
+            // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
+            // any rows that do not match this filter at all.
             preFilters.add(newFilter);
-          } else {
-            postFilters.add(filter);
           }
+          // This is needed as a filter on an MV String Dimension returns the entire row matching the filter

Review Comment:
   fwiw, it's not just about MV string columns. When we support doing this for arrays, the same thing applies to arrays. The pre-unnest filter is an `array_contains` and the post-unnest filter is a regular `=`.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   This isn't right, since post-unnest filters may refer to virtual columns. So anything from `filterPair.rhs` that refers to a virtual column needs to be placed here, or else it won't properly see them.
   
   What is the rationale for moving the filter into the UnnestColumnValueSelectorCursor? It seems like the logic there is doing something similar to what PostJoinCursor does: creating a ValueMatcher that wraps the column selector factory. I wonder what's wrong with letting PostJoinCursor do that, which would keep the code in UnnestColumnValueSelectorCursor simpler.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13922:
URL: https://github.com/apache/druid/pull/13922#issuecomment-1464769856

   @gianm I have rolled back and have made the code simpler. The only change is to remove the allowList. Please review when you have time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1134457962


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -293,13 +293,16 @@ void add(@Nullable final Filter filter)
         }
 
         if (requiredColumns.contains(outputColumnName)) {
-          // Try to move filter pre-correlate if possible.
+          // Rewrite filter post-unnest if possible.
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {
+            // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
+            // any rows that do not match this filter at all.
             preFilters.add(newFilter);
-          } else {
-            postFilters.add(filter);
           }
+          // This is needed as a filter on an MV String Dimension returns the entire row matching the filter

Review Comment:
   Above comment is still relevant.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -120,27 +117,27 @@ public Sequence<Cursor> makeCursors(
                   retVal,
                   retVal.getColumnSelectorFactory(),
                   unnestColumn,
-                  outputColumnName,
-                  allowSet
+                  outputColumnName
               );
             } else {
               retVal = new UnnestColumnValueSelectorCursor(
                   retVal,
                   retVal.getColumnSelectorFactory(),
                   unnestColumn,
-                  outputColumnName,
-                  allowSet
+                  outputColumnName
               );
             }
           } else {
             retVal = new UnnestColumnValueSelectorCursor(
                 retVal,
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
-                outputColumnName,
-                allowSet
+                outputColumnName
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns

Review Comment:
   Why would a future developer want to do this? (Not a rhetorical question: I really don't know.) Please add some rationale to the comment so people know what you have in mind.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The allowList now is not passed …

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1134828705


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   The implementation that we had talked about would be push the filter down all the way to the `StorageAdapter` without any figuring out of which filters to maybe push down (Calcite has already figured that out for us by separating the filters on the two sides of the LogicalCorrelate).
   
   The intent of the implementation (maybe it wasn't done that way though) was that, for applying the filter on the read, it would just be a ValueMatcher happening at read-time which can reuse other code like the `PostJoinCursor`.
   
   The point of attaching the Filter on the UnnestColumn is to make ordering of things more explicit, with the intention that native queries are supposed to be explicit "you told me to do X, so I do X" things.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org