You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/02/28 04:23:05 UTC

[GitHub] [druid] clintropolis commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

clintropolis commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1119544240


##########
processing/src/test/java/org/apache/druid/segment/UnnestStorageAdapterTest.java:
##########
@@ -249,150 +242,43 @@ public void test_two_levels_of_unnest_adapters()
       unnest 2 rows -> 16 entries also the value cardinality
       unnest of unnest -> 16*8 = 128 rows
        */
-      Assert.assertEquals(count, 128);
-      Assert.assertEquals(dimSelector.getValueCardinality(), 16);
+      Assert.assertEquals(128, count);
+      Assert.assertEquals(16, dimSelector.getValueCardinality());
       return null;
     });
   }
 
   @Test
-  public void test_unnest_adapters_with_allowList()
+  public void test_unnest_adapters_basic_with_in_filter()

Review Comment:
   imo please add more tests here, including tests with filters on other columns if possible, since the unnest cursor wraps a base cursor, just to make sure no funny stuff is happening.



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java:
##########
@@ -3341,4 +3328,467 @@ public void testUnnestWithConstant()
         )
     );
   }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('a','b') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("a", "b"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"a"},
+            new Object[]{"b"},
+            new Object[]{"b"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereFilterIsNotOnFirstValue()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('d','c') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("d", "c"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"c"},
+            new Object[]{"d"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereValuesDoNotExist()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('foo','bar') ",

Review Comment:
   please add a test where there are filters on other columns too (a mix of filter on unnested column and filter on some other column)



##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFilterUnnestRule.java:
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rule;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.DruidUnnestDatasourceRel;
+
+public class DruidFilterUnnestRule extends RelOptRule
+{
+
+  private final PlannerContext plannerContext;
+
+  public DruidFilterUnnestRule(PlannerContext plannerContext)
+  {
+    super(
+        operand(
+            Filter.class,
+            operand(DruidUnnestDatasourceRel.class, any())
+        )
+    );
+    this.plannerContext = plannerContext;
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call)
+  {
+    final Filter filter = call.rel(0);
+    final DruidUnnestDatasourceRel unnestDatasourceRel = call.rel(1);
+    DruidUnnestDatasourceRel newRel = unnestDatasourceRel.withFilter(filter);
+    call.transformTo(newRel);
+  }
+
+  // This is for a special case of handling selector filters
+  // on top of UnnestDataSourceRel when Calcite adds an extra
+  // LogicalProject on the LogicalFilter. For e.g. #122 here
+  // SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3='b'
+  // 126:LogicalProject(d3=[$17])
+  //  124:LogicalCorrelate(subset=[rel#125:Subset#6.NONE.[]], correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
+  //    8:LogicalTableScan(subset=[rel#114:Subset#0.NONE.[]], table=[[druid, numfoo]])
+  //    122:LogicalProject(subset=[rel#123:Subset#5.NONE.[]], d3=[CAST('b':VARCHAR):VARCHAR])
+  //      120:LogicalFilter(subset=[rel#121:Subset#4.NONE.[]], condition=[=($0, 'b')])
+  //        118:Uncollect(subset=[rel#119:Subset#3.NONE.[]])
+  //          116:LogicalProject(subset=[rel#117:Subset#2.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
+  //            9:LogicalValues(subset=[rel#115:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
+
+  // This logical project does a type cast only which Druid already has information about
+  // So we can skip this LogicalProject
+  // Extensive unit tests can be found in {@link CalciteArraysQueryTest}
+
+  static class DruidProjectOnCorrelate extends RelOptRule
+  {
+    public DruidProjectOnCorrelate(PlannerContext plannerContext)

Review Comment:
   nit: since we aren't using plannercontext this could be a static instance like some of the other rules



##########
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java:
##########
@@ -387,44 +384,6 @@ public void test_list_unnest_cursors_user_supplied_list_with_dups()
     Assert.assertEquals(k, 10);
   }
 
-  @Test

Review Comment:
   ideally there should be tests added here too... I would expect tests here to be checking stuff like the cursor spits out the expected filtered number of rows, etc.
   
   I guess the storage adapter test partially covers this, but the more tests the better 



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -229,6 +229,15 @@ public static DruidQuery fromPartialQuery(
               virtualColumnRegistry
           )
       );
+    } else if (partialQuery.getUnnestFilter() != null) {
+      filter = Preconditions.checkNotNull(
+          computeUnnestFilter(
+              partialQuery,
+              plannerContext,
+              sourceRowSignature,
+              virtualColumnRegistry
+          )
+      );

Review Comment:
   can you not have both a where filter an an unnest filter? is an unnest filter basically a where filter that got picked up somewhere else?



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java:
##########
@@ -3341,4 +3328,467 @@ public void testUnnestWithConstant()
         )
     );
   }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('a','b') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("a", "b"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"a"},
+            new Object[]{"b"},
+            new Object[]{"b"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereFilterIsNotOnFirstValue()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('d','c') ",

Review Comment:
   to clarify, is the `d3` in these tests is referring to `unnested.d3` or `druid.numfoo.d3`? it seems maybe a bit confusing to the casual reader to name the unnested thing the same thing as the underlying column name which was trasnformed with `MV_TO_ARRAY` (though maybe nice to intentionally cover this in at least one test with comment clarifying the confusing aliases is actually filtering on the unnested output column and not d3)
   
   Also, should there be tests filtering on d3 where the native query plans to filtering on d3 directly instead of the unnested column, similar to the tests with filters directly on the numeric array elements in other tests? That said, that doesn't really go through the new codepaths here, so is more just a bonus coverage.



##########
processing/src/test/java/org/apache/druid/query/scan/UnnestScanQueryRunnerTest.java:
##########
@@ -464,65 +443,6 @@ public void testUnnestRunnerWithOrdering()
     ScanQueryRunnerTest.verify(ascendingExpectedResults, results);
   }
 
-  @Test

Review Comment:
   this should probably be replaced with scan queries which filter on the unnested output column name



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun
+
+    final Filter forBaseFilter;
+    if (filter == null) {
+      forBaseFilter = filter;
     } else {
-      updatedFilter = filter;
+      // if the filter has the unnested column
+      // do not pass it into the base cursor
+      // if there is a filter as d2 > 1 and unnest-d2 < 10
+      // Calcite would push the filter on d2 into the data source
+      // and only the filter on unnest-d2 < 10 will appear here
+      forBaseFilter = filter.getRequiredColumns().contains(outputColumnName) ? null : filter;

Review Comment:
   The `filter` passed in here is all of the filters that have been pushed to the segment, including filters on the unnest column and other columns
   
   It seems like you could potentially decompose the filter to partially push down anything that is is not filtering on the output column name into the base cursor so that the query can still use indexes and such, since otherwise the implication here is that if any filter is on the unnest column then it means we aren't going to use any indexes since we aren't pushing `filter` down to the base cursor.
   
    `QueryableIndexCursorSequenceBuilder` does something like this with `FilterAnalysis.analyzeFilter` to partition the `Filter` into 'pre' and 'post' filters, the former which have bitmap indexes and go into computing the underying `Offset`, and the latter do the value matcher thing we are doing here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org