You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "somu-imply (via GitHub)" <gi...@apache.org> on 2023/02/14 00:26:45 UTC

[GitHub] [druid] somu-imply opened a new pull request, #13799: Now unnest allows bound, in and selector filters on the unnested column

somu-imply opened a new pull request, #13799:
URL: https://github.com/apache/druid/pull/13799

   Previously even in the native query, Druid did not support filtering on the unnested column. In this PR we have introduced the following:
   
   1. Added an allowFilter to use value matchers and advance cursors accordingly
   2. Removed the allowList as we can support the same with filters
   
   TBD: Pushing the filter on the unnested column as a filter on the dimension to be unnested on the left data source
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13799:
URL: https://github.com/apache/druid/pull/13799#issuecomment-1441028490

   @317brian @techdocsmith the docs are behaving flaky with
   ```
   > spellcheck
   > mdspell --en-us --ignore-numbers --report '../docs/**/*.md' || (./script/notify-spellcheck-issues && false)
   
       ../docs/tutorials/tutorial-jupyter-index.md
          41 | Jupyter both try to use port `8888`, so start Jupyter on a different por 
   
   >> 1 spelling error found in 227 files
   ```
   
   The document has `port` instead of `por` so this should not be thrown. I tried running spellcheck locally and things pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply closed pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply closed pull request #13799: Now unnest allows bound, in and selector filters on the unnested column
URL: https://github.com/apache/druid/pull/13799


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1126953734


##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
   private final DimensionSelector dimSelector;
   private final String columnName;
   private final String outputName;
-  private final LinkedHashSet<String> allowSet;
-  private final BitSet allowedBitSet;
   private final ColumnSelectorFactory baseColumnSelectorFactory;
-  private int index;
-  @Nullable private IndexedInts indexedIntsForCurrentRow;
+  @Nullable
+  private final Filter allowFilter;
+  private int indexForRow;
+  @Nullable
+  private IndexedInts indexedIntsForCurrentRow;
   private boolean needInitialization;
   private SingleIndexInts indexIntsForRow;
+  private ValueMatcher valueMatcher;
 
   public UnnestDimensionCursor(
       Cursor cursor,
       ColumnSelectorFactory baseColumnSelectorFactory,
       String columnName,
       String outputColumnName,
-      LinkedHashSet<String> allowSet
+      @Nullable Filter allowFilter
   )
   {
     this.baseCursor = cursor;
     this.baseColumnSelectorFactory = baseColumnSelectorFactory;
     this.dimSelector = this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
     this.columnName = columnName;
-    this.index = 0;
+    this.indexForRow = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
-    this.allowedBitSet = new BitSet();
+    this.allowFilter = allowFilter;

Review Comment:
   Yes I am working on the logic to add the bitset back. This will be a 1-time operation where I iterate over 0 to baseDimensionSelector.getValueCardinality() and look over each element to check if they match the filter or not and add to bitset if they match



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1124984353


##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/PartialDruidQuery.java:
##########
@@ -359,6 +360,23 @@
     );
   }
 
+  public PartialDruidQuery withUnnestFilter(final Filter newUnnestFilter)

Review Comment:
   ## Useless parameter
   
   The parameter 'newUnnestFilter' is never used.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4377)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1126953734


##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
   private final DimensionSelector dimSelector;
   private final String columnName;
   private final String outputName;
-  private final LinkedHashSet<String> allowSet;
-  private final BitSet allowedBitSet;
   private final ColumnSelectorFactory baseColumnSelectorFactory;
-  private int index;
-  @Nullable private IndexedInts indexedIntsForCurrentRow;
+  @Nullable
+  private final Filter allowFilter;
+  private int indexForRow;
+  @Nullable
+  private IndexedInts indexedIntsForCurrentRow;
   private boolean needInitialization;
   private SingleIndexInts indexIntsForRow;
+  private ValueMatcher valueMatcher;
 
   public UnnestDimensionCursor(
       Cursor cursor,
       ColumnSelectorFactory baseColumnSelectorFactory,
       String columnName,
       String outputColumnName,
-      LinkedHashSet<String> allowSet
+      @Nullable Filter allowFilter
   )
   {
     this.baseCursor = cursor;
     this.baseColumnSelectorFactory = baseColumnSelectorFactory;
     this.dimSelector = this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
     this.columnName = columnName;
-    this.index = 0;
+    this.indexForRow = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
-    this.allowedBitSet = new BitSet();
+    this.allowFilter = allowFilter;

Review Comment:
   Yes I am workin on the logic to add the bitset back. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1130285655


##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidCorrelateUnnestRule.java:
##########
@@ -101,8 +101,9 @@
     final RexBuilder rexBuilder = correlate.getCluster().getRexBuilder();
 
     final Filter druidRelFilter;
-    final DruidRel<?> newDruidRelFilter;
+    final DruidRel<?> newDruidRel;
     final List<RexNode> newProjectExprs = new ArrayList<>();
+    final List<RexNode> newWheres = new ArrayList<>();

Review Comment:
   ## Unread local variable
   
   Variable 'List<RexNode> newWheres' is never read.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4386)



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -322,20 +349,210 @@
   @Nullable
   private void initialize()
   {
-    IdLookup idLookup = dimSelector.idLookup();
-    this.indexIntsForRow = new SingleIndexInts();
-    if (allowSet != null && !allowSet.isEmpty() && idLookup != null) {
-      for (String s : allowSet) {
-        if (idLookup.lookupId(s) >= 0) {
-          allowedBitSet.set(idLookup.lookupId(s));
+    /*
+    for i=0 to baseColFactory.makeDimSelector.getValueCardinality()
+        match each item with the filter and populate bitset if there's a match
+     */
+
+    if (allowFilter != null) {
+      AtomicInteger idRef = new AtomicInteger();
+      ValueMatcher myMatcher = allowFilter.makeMatcher(new ColumnSelectorFactory()
+      {
+        @Override
+        public DimensionSelector makeDimensionSelector(DimensionSpec dimensionSpec)
+        {
+          if (!outputName.equals(dimensionSpec.getDimension())) {
+            throw new ISE("Asked for bad dimension[%s]", dimensionSpec);
+          }
+          return new DimensionSelector()
+          {
+            private final IndexedInts myInts = new IndexedInts()
+            {
+              @Override
+              public int size()
+              {
+                return 1;
+              }
+
+              @Override
+              public int get(int index)
+              {
+                return 1;
+              }
+
+              @Override
+              public void inspectRuntimeShape(RuntimeShapeInspector inspector)
+              {
+
+              }
+            };
+
+            @Override
+            public IndexedInts getRow()
+            {
+              return myInts;
+            }
+
+            @Override
+            public ValueMatcher makeValueMatcher(@Nullable String value)
+            {
+              // Handle value is null
+              return new ValueMatcher()
+              {
+                @Override
+                public boolean matches()
+                {
+                  return value.equals(lookupName(1));
+                }
+
+                @Override
+                public void inspectRuntimeShape(RuntimeShapeInspector inspector)
+                {
+
+                }
+              };
+            }
+
+            @Override
+            public ValueMatcher makeValueMatcher(Predicate<String> predicate)
+            {
+              return new ValueMatcher()
+              {
+                @Override
+                public boolean matches()
+                {
+                  return predicate.apply(lookupName(1));
+                }
+
+                @Override
+                public void inspectRuntimeShape(RuntimeShapeInspector inspector)
+                {
+
+                }
+              };
+            }
+
+            @Override
+            public void inspectRuntimeShape(RuntimeShapeInspector inspector)
+            {
+
+            }
+
+            @Nullable
+            @Override
+            public Object getObject()
+            {
+              return null;
+            }
+
+            @Override
+            public Class<?> classOfObject()
+            {
+              return null;
+            }
+
+            @Override
+            public int getValueCardinality()
+            {
+              return dimSelector.getValueCardinality();
+            }
+
+            @Nullable
+            @Override
+            public String lookupName(int id)
+            {
+              return dimSelector.lookupName(idRef.get());
+            }
+
+            @Override
+            public boolean nameLookupPossibleInAdvance()
+            {
+              return dimSelector.nameLookupPossibleInAdvance();
+            }
+
+            @Nullable
+            @Override
+            public IdLookup idLookup()
+            {
+              return dimSelector.idLookup();
+            }
+          };
+        }
+
+        @Override
+        public ColumnValueSelector makeColumnValueSelector(String columnName)
+        {
+          return new ColumnValueSelector()
+          {
+            @Override
+            public double getDouble()
+            {
+              return Double.parseDouble(dimSelector.lookupName(idRef.get()));

Review Comment:
   ## Missing catch of NumberFormatException
   
   Potential uncaught 'java.lang.NumberFormatException'.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4387)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13799:
URL: https://github.com/apache/druid/pull/13799#issuecomment-1440657778

   The filter on the unnested dimension or dimensions(in case of an expression using multiple columns) also needs to be pushed down to the datasource to limit the number of rows the cursor goes over. This will be done in a followup PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1126754201


##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -136,13 +140,22 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     final DruidRel<?> druidQueryRel = (DruidRel<?>) left;
     final DruidQuery leftQuery = Preconditions.checkNotNull((druidQueryRel).toDruidQuery(false), "leftQuery");
     final DataSource leftDataSource;
+    final PartialDruidQuery partialQueryFromLeft = druidQueryRel.getPartialDruidQuery();
+    final PartialDruidQuery corrPartialQuey;
+
+    // If there is a LIMIT in the left query
+    // It should be honored before unnest
+    // Create a query data source in that case

Review Comment:
   Yes we have a test case, the query is like
   `SELECT d3 FROM (select * from druid.numfoo where dim2 IN ('a','b','ab','abc') LIMIT 2), UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3)`
   
   Here the limit from the left cannot be applied outside



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1125926722


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -86,7 +88,11 @@ public UnnestColumnValueSelectorCursor(
     this.index = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
+    if (filter != null) {
+      this.valueMatcher = filter.makeMatcher(getColumnSelectorFactory());

Review Comment:
   Given that this is called in the constructor and then also returned as a public method, maybe switch it around to have the constructor create the `ColumnSelectorFactory`, store the reference and then use that here/return it from the `getColumnSelectorFactory` method.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
   private final DimensionSelector dimSelector;
   private final String columnName;
   private final String outputName;
-  private final LinkedHashSet<String> allowSet;
-  private final BitSet allowedBitSet;
   private final ColumnSelectorFactory baseColumnSelectorFactory;
-  private int index;
-  @Nullable private IndexedInts indexedIntsForCurrentRow;
+  @Nullable
+  private final Filter allowFilter;
+  private int indexForRow;
+  @Nullable
+  private IndexedInts indexedIntsForCurrentRow;
   private boolean needInitialization;
   private SingleIndexInts indexIntsForRow;
+  private ValueMatcher valueMatcher;
 
   public UnnestDimensionCursor(
       Cursor cursor,
       ColumnSelectorFactory baseColumnSelectorFactory,
       String columnName,
       String outputColumnName,
-      LinkedHashSet<String> allowSet
+      @Nullable Filter allowFilter
   )
   {
     this.baseCursor = cursor;
     this.baseColumnSelectorFactory = baseColumnSelectorFactory;
     this.dimSelector = this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
     this.columnName = columnName;
-    this.index = 0;
+    this.indexForRow = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
-    this.allowedBitSet = new BitSet();
+    this.allowFilter = allowFilter;

Review Comment:
   Paul's comment is asking you to re-evaluate if you need to store the reference to the Filter.  In the ColumnValue version of the cursor, you do *not* store the reference to the Filter and instead generate a `ValueMatcher` that matches everything.  But this code is keeping the reference, it would appear so that it can do an `instanceof` check.
   
   I think the answer to Paul's comment is that you still need to do the logic to add the BitSet back to this implementation and, once you do that, you won't keep the reference to the `Filter` object and things will clean up a bit.



##########
processing/src/test/java/org/apache/druid/segment/UnnestStorageAdapterTest.java:
##########
@@ -249,150 +242,43 @@ public void test_two_levels_of_unnest_adapters()
       unnest 2 rows -> 16 entries also the value cardinality
       unnest of unnest -> 16*8 = 128 rows
        */
-      Assert.assertEquals(count, 128);
-      Assert.assertEquals(dimSelector.getValueCardinality(), 16);
+      Assert.assertEquals(128, count);
+      Assert.assertEquals(16, dimSelector.getValueCardinality());
       return null;
     });
   }
 
   @Test
-  public void test_unnest_adapters_with_allowList()
+  public void test_unnest_adapters_basic_with_in_filter()

Review Comment:
   If there is a test suite for "normal" cursor stuff, it would be great to be able to plug this into one of those.  Those should all be able to pass without any errors as long as they aren't referencing the actual unnested column.
   
   I haven't looked to see how easy it will be to reuse that test suite, but it's worth it to explore what can be done to be able to reuse it.



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -215,6 +254,26 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     );
   }
 
+  private PartialDruidQuery updateCorrPartialQueryFromLeft(PartialDruidQuery partialQueryFromLeft)
+  {
+    // The DruidCorrelateRule already creates the project and pushes it on the top level
+    // So get select project from partialQuery
+    // The filters are present on the partial query of the left
+    // The group by and having clauses would be on the top level
+    // Same for the sort
+    PartialDruidQuery corrQuery = PartialDruidQuery.create(correlateRel);
+    corrQuery = corrQuery.withWhereFilter(partialQueryFromLeft.getWhereFilter())
+                         .withSelectProject(partialQuery.getSelectProject());
+    if (partialQuery.getAggregate() != null) {
+      corrQuery = corrQuery.withAggregate(partialQuery.getAggregate())
+                           .withHavingFilter(partialQuery.getHavingFilter());
+    }
+    if (partialQuery.getSort() != null || partialQuery.getSortProject() != null) {
+      corrQuery = corrQuery.withSort(partialQuery.getSort());
+    }
+    return corrQuery;
+  }
+

Review Comment:
   This feels like we are throwing away the left-hand side and instead trying to build a query from the unnest part?  Taht seems wrong to me, we should be using the unnest on the RHS to augment the query from the LHS (attach to the datasource).  the LHS is the actual working query, the RHS is just the addition of the unnest portion.



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -136,13 +140,22 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     final DruidRel<?> druidQueryRel = (DruidRel<?>) left;
     final DruidQuery leftQuery = Preconditions.checkNotNull((druidQueryRel).toDruidQuery(false), "leftQuery");
     final DataSource leftDataSource;
+    final PartialDruidQuery partialQueryFromLeft = druidQueryRel.getPartialDruidQuery();
+    final PartialDruidQuery corrPartialQuey;
+
+    // If there is a LIMIT in the left query
+    // It should be honored before unnest
+    // Create a query data source in that case

Review Comment:
   What kind of query is that?  Do we have a test covering it?  I'm not sure I believe this comment is true, but I also am not able to imagine a query that can do this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1115145381


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -243,9 +245,23 @@ public void advance()
   @Override
   public void advanceUninterruptibly()
   {
-    do {
+    // the index currently points to an element at position i ($e_i)
+    // while $e_i does not match the filter
+    // keep advancing the pointer to the first valid match
+    // calling advanceAndUpdate increments the index position so needs a call to matches() again for new match status
+    boolean match = valueMatcher.matches();
+    if (match) {
       advanceAndUpdate();
-    } while (matchAndProceed());
+      match = valueMatcher.matches();
+    }
+    while (!match) {
+      if (!baseCursor.isDone()) {
+        advanceAndUpdate();
+        match = valueMatcher.matches();
+      } else {
+        return;
+      }
+    }

Review Comment:
   Suggestion:
   
   ```java
     while (true) {
       boolean match = valueMatcher.matches();
       advanceAndUpdate();
       if (match || baseCursor.isDone()) {
           return;
       }
     } 
   ```



##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -311,12 +327,11 @@ private void initialize()
   {
     this.unnestListForCurrentRow = new ArrayList<>();
     getNextRow(needInitialization);
-    if (allowSet != null) {
-      if (!allowSet.isEmpty()) {
-        if (!allowSet.contains((String) unnestListForCurrentRow.get(index))) {
-          advance();
-        }
-      }
+
+    // If the first value the index is pointing to does not match the filter
+    // advance the index to the first value which will match
+    if (!valueMatcher.matches()) {
+      advance();

Review Comment:
   Redundant. `advanceUninterruptibly()` already does the needed checks.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
   private final DimensionSelector dimSelector;
   private final String columnName;
   private final String outputName;
-  private final LinkedHashSet<String> allowSet;
-  private final BitSet allowedBitSet;
   private final ColumnSelectorFactory baseColumnSelectorFactory;
-  private int index;
-  @Nullable private IndexedInts indexedIntsForCurrentRow;
+  @Nullable
+  private final Filter allowFilter;
+  private int indexForRow;
+  @Nullable
+  private IndexedInts indexedIntsForCurrentRow;
   private boolean needInitialization;
   private SingleIndexInts indexIntsForRow;
+  private ValueMatcher valueMatcher;
 
   public UnnestDimensionCursor(
       Cursor cursor,
       ColumnSelectorFactory baseColumnSelectorFactory,
       String columnName,
       String outputColumnName,
-      LinkedHashSet<String> allowSet
+      @Nullable Filter allowFilter
   )
   {
     this.baseCursor = cursor;
     this.baseColumnSelectorFactory = baseColumnSelectorFactory;
     this.dimSelector = this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
     this.columnName = columnName;
-    this.index = 0;
+    this.indexForRow = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
-    this.allowedBitSet = new BitSet();
+    this.allowFilter = allowFilter;

Review Comment:
   Here we allow a null filter. In the code above, we provided an "accept all" filter in this case. Should we do so here to avoid if null checks?



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -308,7 +321,7 @@ public boolean isDoneOrInterrupted()
   @Override
   public void reset()
   {
-    index = 0;
+    indexForRow = 0;

Review Comment:
   Nit: is this the index of a row? If so, then maybe `rowIndex` would be clearer.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -322,22 +335,19 @@ public void reset()
   @Nullable
   private void initialize()
   {
-    IdLookup idLookup = dimSelector.idLookup();
-    this.indexIntsForRow = new SingleIndexInts();
-    if (allowSet != null && !allowSet.isEmpty() && idLookup != null) {
-      for (String s : allowSet) {
-        if (idLookup.lookupId(s) >= 0) {
-          allowedBitSet.set(idLookup.lookupId(s));
-        }
-      }
+    indexForRow = 0;
+    if (allowFilter != null) {
+      this.valueMatcher = allowFilter.makeMatcher(this.getColumnSelectorFactory());
+    } else {
+      this.valueMatcher = BooleanValueMatcher.of(true);
     }
+    this.indexIntsForRow = new SingleIndexInts();
+
     if (dimSelector.getObject() != null) {
       this.indexedIntsForCurrentRow = dimSelector.getRow();
     }
-    if (!allowedBitSet.isEmpty()) {
-      if (!allowedBitSet.get(indexedIntsForCurrentRow.get(index))) {
-        advance();
-      }
+    if (!valueMatcher.matches() && !baseCursor.isDone()) {
+      advance();

Review Comment:
   This is odd. Is the goal of `initialize()` to move to the first row? We treat first rows differently than all other rows? I can't just do `while(next())` but have to special-case the first row?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun

Review Comment:
   Should the "currently" and "TBD" issues be addressed before merging?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -107,15 +105,15 @@ public Sequence<Cursor> makeCursors(
                   retVal.getColumnSelectorFactory(),
                   dimensionToUnnest,
                   outputColumnName,
-                  allowSet
+                  filter

Review Comment:
   Above, we _may_ have passed the filter into the "base". If we did, do we want to reapply the same filter here? Or, should the above code actually be:
   
   ```java
       final Filter forBaseFilter;
       final Filter localFilter;
       if (filter == null || filter.getRequiredColumns().contains(outputColumnName) ) {
         forBaseFilter = filter;
         localFilter = null;
       } else {
         forBaseFilter = filter;
         localFilter = null;
      }
   ```
   
   Then pass `localFilter` where we're passing `filter`?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun
+
+    final Filter forBaseFilter;
+    if (filter == null) {
+      forBaseFilter = filter;
     } else {
-      updatedFilter = filter;
+      // if the filter has the unnested column
+      // do not pass it into the base cursor
+      // if there is a filter as d2 > 1 and unnest-d2 < 10
+      // Calcite would push the filter on d2 into the data source
+      // and only the filter on unnest-d2 < 10 will appear here
+      forBaseFilter = filter.getRequiredColumns().contains(outputColumnName) ? null : filter;

Review Comment:
   This code seems to assume a single filter. In the example, `unnest-d2 < 10`. But, what if my condition is `unnest-d2 < 10 AND unnest_d2 > 5`. Will this code work? And if only the `unnest-d2` appears here, do we need to check the output column name?
   
   To help understand this, it would be good to give a bit more of the query. What, exactly, is `d2` here?



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -203,8 +202,10 @@ public Class<?> classOfObject()
           @Override
           public int getValueCardinality()
           {
-            if (!allowedBitSet.isEmpty()) {
-              return allowedBitSet.cardinality();
+            if (allowFilter instanceof InDimFilter) {
+              return ((InDimFilter) allowFilter).getValues().size();
+            } else if (allowFilter instanceof AndFilter) {
+              return ((AndFilter) allowFilter).getFilters().size();

Review Comment:
   Is this right? For an `in` filter, we know that the value has to be one of those in the list. But, for `and`, I can have anything can't I? `x > 0 AND x < 1000` would have a cardinality of 1000 for a `long` type.



##########
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java:
##########
@@ -54,7 +51,7 @@ public void test_list_unnest_cursors()
         listCursor.getColumnSelectorFactory(),
         "dummy",
         OUTPUT_NAME,
-        IGNORE_SET
+        null

Review Comment:
   Do these tests a wide range of predicates (filters) on the unnested value? I worry that the code above handles only simple cases: `unnested_value op constant`. Do they also handle complex cases such as `fn(unnested_value) > some_other_col`? `fn(unnested_value) > 10 AND unnested_value * 3 > 12`? Would be good to have tests that exercise these non-trivial cases.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -351,43 +361,25 @@ private void initialize()
   private void advanceAndUpdate()
   {
     if (indexedIntsForCurrentRow == null) {
-      index = 0;
+      indexForRow = 0;
       if (!baseCursor.isDone()) {
         baseCursor.advanceUninterruptibly();
       }
     } else {
-      if (index >= indexedIntsForCurrentRow.size() - 1) {
+      if (indexForRow >= indexedIntsForCurrentRow.size() - 1) {

Review Comment:
   As a reader, `indexedIntsForCurrentRow` doesn't seem to tell me anything. What is an "indexed int"? That is, what does it do? Are these the rows that have values for the current row? Something else? Can we come up with a name that might help us newbies?



##########
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java:
##########
@@ -387,44 +384,6 @@ public void test_list_unnest_cursors_user_supplied_list_with_dups()
     Assert.assertEquals(k, 10);
   }
 
-  @Test

Review Comment:
   Should these tests be retained, but updated to whatever the new format is? Presumably these addressed some earlier concern, and we should ensure that they still work. Or, are there new tests that cover the same cases that the deleted tests cover?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun
+
+    final Filter forBaseFilter;
+    if (filter == null) {
+      forBaseFilter = filter;
     } else {
-      updatedFilter = filter;
+      // if the filter has the unnested column
+      // do not pass it into the base cursor
+      // if there is a filter as d2 > 1 and unnest-d2 < 10
+      // Calcite would push the filter on d2 into the data source
+      // and only the filter on unnest-d2 < 10 will appear here
+      forBaseFilter = filter.getRequiredColumns().contains(outputColumnName) ? null : filter;

Review Comment:
   Nit:
   
   ```java
   final Filter forBaseFilter = filter == null || filter.getRequiredColumns().contains(outputColumnName) ? null : filter;
   ```



##########
processing/src/test/java/org/apache/druid/query/groupby/UnnestGroupByQueryRunnerTest.java:
##########
@@ -699,8 +696,7 @@ public void testGroupByOnUnnestedVirtualMultiColumn()
     final DataSource unnestDataSource = UnnestDataSource.create(
         new TableDataSource(QueryRunnerTestHelper.DATA_SOURCE),
         "vc",
-        QueryRunnerTestHelper.PLACEMENTISH_DIMENSION_UNNEST,
-        null
+        QueryRunnerTestHelper.PLACEMENTISH_DIMENSION_UNNEST

Review Comment:
   Sorry, what is a `PLACEMENTISH`?



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -282,9 +283,21 @@ public void advance()
   @Override
   public void advanceUninterruptibly()
   {
-    do {
-      advanceAndUpdate();
-    } while (matchAndProceed());
+    if (!baseCursor.isDone()) {
+      boolean match = valueMatcher.matches();
+      if (match) {
+        advanceAndUpdate();
+        match = valueMatcher.matches();
+      }
+      while (!match) {
+        if (!baseCursor.isDone()) {
+          advanceAndUpdate();
+          match = valueMatcher.matches();
+        } else {
+          return;
+        }
+      }
+    }

Review Comment:
   See suggestion above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1116360972


##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
   private final DimensionSelector dimSelector;
   private final String columnName;
   private final String outputName;
-  private final LinkedHashSet<String> allowSet;
-  private final BitSet allowedBitSet;
   private final ColumnSelectorFactory baseColumnSelectorFactory;
-  private int index;
-  @Nullable private IndexedInts indexedIntsForCurrentRow;
+  @Nullable
+  private final Filter allowFilter;
+  private int indexForRow;
+  @Nullable
+  private IndexedInts indexedIntsForCurrentRow;
   private boolean needInitialization;
   private SingleIndexInts indexIntsForRow;
+  private ValueMatcher valueMatcher;
 
   public UnnestDimensionCursor(
       Cursor cursor,
       ColumnSelectorFactory baseColumnSelectorFactory,
       String columnName,
       String outputColumnName,
-      LinkedHashSet<String> allowSet
+      @Nullable Filter allowFilter
   )
   {
     this.baseCursor = cursor;
     this.baseColumnSelectorFactory = baseColumnSelectorFactory;
     this.dimSelector = this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
     this.columnName = columnName;
-    this.index = 0;
+    this.indexForRow = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
-    this.allowedBitSet = new BitSet();
+    this.allowFilter = allowFilter;

Review Comment:
   I did not understand this comment, can you please elaborate. On another note I'll remove the bitset it is not needed anymore



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1119544240


##########
processing/src/test/java/org/apache/druid/segment/UnnestStorageAdapterTest.java:
##########
@@ -249,150 +242,43 @@ public void test_two_levels_of_unnest_adapters()
       unnest 2 rows -> 16 entries also the value cardinality
       unnest of unnest -> 16*8 = 128 rows
        */
-      Assert.assertEquals(count, 128);
-      Assert.assertEquals(dimSelector.getValueCardinality(), 16);
+      Assert.assertEquals(128, count);
+      Assert.assertEquals(16, dimSelector.getValueCardinality());
       return null;
     });
   }
 
   @Test
-  public void test_unnest_adapters_with_allowList()
+  public void test_unnest_adapters_basic_with_in_filter()

Review Comment:
   imo please add more tests here, including tests with filters on other columns if possible, since the unnest cursor wraps a base cursor, just to make sure no funny stuff is happening.



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java:
##########
@@ -3341,4 +3328,467 @@ public void testUnnestWithConstant()
         )
     );
   }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('a','b') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("a", "b"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"a"},
+            new Object[]{"b"},
+            new Object[]{"b"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereFilterIsNotOnFirstValue()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('d','c') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("d", "c"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"c"},
+            new Object[]{"d"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereValuesDoNotExist()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('foo','bar') ",

Review Comment:
   please add a test where there are filters on other columns too (a mix of filter on unnested column and filter on some other column)



##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFilterUnnestRule.java:
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rule;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.DruidUnnestDatasourceRel;
+
+public class DruidFilterUnnestRule extends RelOptRule
+{
+
+  private final PlannerContext plannerContext;
+
+  public DruidFilterUnnestRule(PlannerContext plannerContext)
+  {
+    super(
+        operand(
+            Filter.class,
+            operand(DruidUnnestDatasourceRel.class, any())
+        )
+    );
+    this.plannerContext = plannerContext;
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call)
+  {
+    final Filter filter = call.rel(0);
+    final DruidUnnestDatasourceRel unnestDatasourceRel = call.rel(1);
+    DruidUnnestDatasourceRel newRel = unnestDatasourceRel.withFilter(filter);
+    call.transformTo(newRel);
+  }
+
+  // This is for a special case of handling selector filters
+  // on top of UnnestDataSourceRel when Calcite adds an extra
+  // LogicalProject on the LogicalFilter. For e.g. #122 here
+  // SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3='b'
+  // 126:LogicalProject(d3=[$17])
+  //  124:LogicalCorrelate(subset=[rel#125:Subset#6.NONE.[]], correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
+  //    8:LogicalTableScan(subset=[rel#114:Subset#0.NONE.[]], table=[[druid, numfoo]])
+  //    122:LogicalProject(subset=[rel#123:Subset#5.NONE.[]], d3=[CAST('b':VARCHAR):VARCHAR])
+  //      120:LogicalFilter(subset=[rel#121:Subset#4.NONE.[]], condition=[=($0, 'b')])
+  //        118:Uncollect(subset=[rel#119:Subset#3.NONE.[]])
+  //          116:LogicalProject(subset=[rel#117:Subset#2.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
+  //            9:LogicalValues(subset=[rel#115:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
+
+  // This logical project does a type cast only which Druid already has information about
+  // So we can skip this LogicalProject
+  // Extensive unit tests can be found in {@link CalciteArraysQueryTest}
+
+  static class DruidProjectOnCorrelate extends RelOptRule
+  {
+    public DruidProjectOnCorrelate(PlannerContext plannerContext)

Review Comment:
   nit: since we aren't using plannercontext this could be a static instance like some of the other rules



##########
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java:
##########
@@ -387,44 +384,6 @@ public void test_list_unnest_cursors_user_supplied_list_with_dups()
     Assert.assertEquals(k, 10);
   }
 
-  @Test

Review Comment:
   ideally there should be tests added here too... I would expect tests here to be checking stuff like the cursor spits out the expected filtered number of rows, etc.
   
   I guess the storage adapter test partially covers this, but the more tests the better 



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -229,6 +229,15 @@ public static DruidQuery fromPartialQuery(
               virtualColumnRegistry
           )
       );
+    } else if (partialQuery.getUnnestFilter() != null) {
+      filter = Preconditions.checkNotNull(
+          computeUnnestFilter(
+              partialQuery,
+              plannerContext,
+              sourceRowSignature,
+              virtualColumnRegistry
+          )
+      );

Review Comment:
   can you not have both a where filter an an unnest filter? is an unnest filter basically a where filter that got picked up somewhere else?



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java:
##########
@@ -3341,4 +3328,467 @@ public void testUnnestWithConstant()
         )
     );
   }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('a','b') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("a", "b"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"a"},
+            new Object[]{"b"},
+            new Object[]{"b"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereFilterIsNotOnFirstValue()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('d','c') ",

Review Comment:
   to clarify, is the `d3` in these tests is referring to `unnested.d3` or `druid.numfoo.d3`? it seems maybe a bit confusing to the casual reader to name the unnested thing the same thing as the underlying column name which was trasnformed with `MV_TO_ARRAY` (though maybe nice to intentionally cover this in at least one test with comment clarifying the confusing aliases is actually filtering on the unnested output column and not d3)
   
   Also, should there be tests filtering on d3 where the native query plans to filtering on d3 directly instead of the unnested column, similar to the tests with filters directly on the numeric array elements in other tests? That said, that doesn't really go through the new codepaths here, so is more just a bonus coverage.



##########
processing/src/test/java/org/apache/druid/query/scan/UnnestScanQueryRunnerTest.java:
##########
@@ -464,65 +443,6 @@ public void testUnnestRunnerWithOrdering()
     ScanQueryRunnerTest.verify(ascendingExpectedResults, results);
   }
 
-  @Test

Review Comment:
   this should probably be replaced with scan queries which filter on the unnested output column name



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun
+
+    final Filter forBaseFilter;
+    if (filter == null) {
+      forBaseFilter = filter;
     } else {
-      updatedFilter = filter;
+      // if the filter has the unnested column
+      // do not pass it into the base cursor
+      // if there is a filter as d2 > 1 and unnest-d2 < 10
+      // Calcite would push the filter on d2 into the data source
+      // and only the filter on unnest-d2 < 10 will appear here
+      forBaseFilter = filter.getRequiredColumns().contains(outputColumnName) ? null : filter;

Review Comment:
   The `filter` passed in here is all of the filters that have been pushed to the segment, including filters on the unnest column and other columns
   
   It seems like you could potentially decompose the filter to partially push down anything that is is not filtering on the output column name into the base cursor so that the query can still use indexes and such, since otherwise the implication here is that if any filter is on the unnest column then it means we aren't going to use any indexes since we aren't pushing `filter` down to the base cursor.
   
    `QueryableIndexCursorSequenceBuilder` does something like this with `FilterAnalysis.analyzeFilter` to partition the `Filter` into 'pre' and 'post' filters, the former which have bitmap indexes and go into computing the underying `Offset`, and the latter do the value matcher thing we are doing here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1120892333


##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -229,6 +229,15 @@ public static DruidQuery fromPartialQuery(
               virtualColumnRegistry
           )
       );
+    } else if (partialQuery.getUnnestFilter() != null) {
+      filter = Preconditions.checkNotNull(
+          computeUnnestFilter(
+              partialQuery,
+              plannerContext,
+              sourceRowSignature,
+              virtualColumnRegistry
+          )
+      );

Review Comment:
   Yes an unnest filter is same as a where or having that gets picked up somewhere else



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13799:
URL: https://github.com/apache/druid/pull/13799#issuecomment-1447302254

   The one failing test here will be resolved once https://github.com/apache/druid/pull/13860 gets merged


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1116365135


##########
processing/src/test/java/org/apache/druid/query/groupby/UnnestGroupByQueryRunnerTest.java:
##########
@@ -699,8 +696,7 @@ public void testGroupByOnUnnestedVirtualMultiColumn()
     final DataSource unnestDataSource = UnnestDataSource.create(
         new TableDataSource(QueryRunnerTestHelper.DATA_SOURCE),
         "vc",
-        QueryRunnerTestHelper.PLACEMENTISH_DIMENSION_UNNEST,
-        null
+        QueryRunnerTestHelper.PLACEMENTISH_DIMENSION_UNNEST

Review Comment:
   It is a dimension in our test setup that is of the format `placement,<some_string>`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1116365343


##########
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java:
##########
@@ -387,44 +384,6 @@ public void test_list_unnest_cursors_user_supplied_list_with_dups()
     Assert.assertEquals(k, 10);
   }
 
-  @Test

Review Comment:
   There are new tests that cover these. They are added in the CalciteArraysQueryTest



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1116362455


##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -203,8 +202,10 @@ public Class<?> classOfObject()
           @Override
           public int getValueCardinality()
           {
-            if (!allowedBitSet.isEmpty()) {
-              return allowedBitSet.cardinality();
+            if (allowFilter instanceof InDimFilter) {
+              return ((InDimFilter) allowFilter).getValues().size();
+            } else if (allowFilter instanceof AndFilter) {
+              return ((AndFilter) allowFilter).getFilters().size();

Review Comment:
   Valid point, we can only estimate it correctly for In filters (no in the in filter) and selector filter (should always be 1). This can get complicated for a filter such as d3='a' AND d3='b' where the cardinality will be equal to the size of the filters. I think Calcite would change that to d3 IN (a,b) so we should only handle the 2 cases



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1116364786


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -107,15 +105,15 @@ public Sequence<Cursor> makeCursors(
                   retVal.getColumnSelectorFactory(),
                   dimensionToUnnest,
                   outputColumnName,
-                  allowSet
+                  filter

Review Comment:
   The filter that appears here will be on the unnested column only. For example if you are unnesting dim3 to ud3, any filter on ud3 will appear here. The filter on ud3 cannot be pushed into base but can only be sent to the unnest cursor as an allow filter. There is scope of optimization here to rewrite the same filter on dim3 and push it to the base



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1114888812


##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFilterUnnestRule.java:
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.rule;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.DruidUnnestDatasourceRel;
+
+public class DruidFilterUnnestRule extends RelOptRule
+{
+
+  private final PlannerContext plannerContext;
+
+  public DruidFilterUnnestRule(PlannerContext plannerContext)
+  {
+    super(
+        operand(
+            Filter.class,
+            operand(DruidUnnestDatasourceRel.class, any())
+        )
+    );
+    this.plannerContext = plannerContext;
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call)
+  {
+    final Filter filter = call.rel(0);
+    final DruidUnnestDatasourceRel unnestDatasourceRel = call.rel(1);
+    DruidUnnestDatasourceRel newRel = unnestDatasourceRel.withFilter(filter);
+    call.transformTo(newRel);
+  }
+
+  // This is for a special case of handling selector filters
+  // on top of UnnestDataSourceRel when Calcite adds an extra
+  // LogicalProject on the LogicalFilter. For e.g. #122 here
+  // SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3='b'
+  // 126:LogicalProject(d3=[$17])
+  //  124:LogicalCorrelate(subset=[rel#125:Subset#6.NONE.[]], correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
+  //    8:LogicalTableScan(subset=[rel#114:Subset#0.NONE.[]], table=[[druid, numfoo]])
+  //    122:LogicalProject(subset=[rel#123:Subset#5.NONE.[]], d3=[CAST('b':VARCHAR):VARCHAR])
+  //      120:LogicalFilter(subset=[rel#121:Subset#4.NONE.[]], condition=[=($0, 'b')])
+  //        118:Uncollect(subset=[rel#119:Subset#3.NONE.[]])
+  //          116:LogicalProject(subset=[rel#117:Subset#2.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
+  //            9:LogicalValues(subset=[rel#115:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
+
+  // This logical project does a type cast only which Druid already has information about
+  // So we can skip this LogicalProject
+  // Extensive unit tests can be found in {@link CalciteArraysQueryTest}
+
+  static class DruidProjectOnCorrelate extends RelOptRule
+  {
+    public DruidProjectOnCorrelate(PlannerContext plannerContext)

Review Comment:
   ## Useless parameter
   
   The parameter 'plannerContext' is never used.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4356)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1127266826


##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -215,6 +254,26 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     );
   }
 
+  private PartialDruidQuery updateCorrPartialQueryFromLeft(PartialDruidQuery partialQueryFromLeft)
+  {
+    // The DruidCorrelateRule already creates the project and pushes it on the top level
+    // So get select project from partialQuery
+    // The filters are present on the partial query of the left
+    // The group by and having clauses would be on the top level
+    // Same for the sort
+    PartialDruidQuery corrQuery = PartialDruidQuery.create(correlateRel);
+    corrQuery = corrQuery.withWhereFilter(partialQueryFromLeft.getWhereFilter())
+                         .withSelectProject(partialQuery.getSelectProject());
+    if (partialQuery.getAggregate() != null) {
+      corrQuery = corrQuery.withAggregate(partialQuery.getAggregate())
+                           .withHavingFilter(partialQuery.getHavingFilter());
+    }
+    if (partialQuery.getSort() != null || partialQuery.getSortProject() != null) {
+      corrQuery = corrQuery.withSort(partialQuery.getSort());
+    }
+    return corrQuery;
+  }
+

Review Comment:
   The DruidCorrelateUnnestRule already put the select project on top of the correlateRel by adding all the things in right to the left. Also the group by, order by on the query are already present on the correlateRel. So we create the query by adding the select project (the one which is already projected on top with left + right), the filters from the left + aggregates (if present) + sorts if present on the top level



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1120891415


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java:
##########
@@ -3341,4 +3328,467 @@ public void testUnnestWithConstant()
         )
     );
   }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('a','b') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("a", "b"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"a"},
+            new Object[]{"b"},
+            new Object[]{"b"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereFilterIsNotOnFirstValue()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('d','c') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("d", "c"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"c"},
+            new Object[]{"d"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereValuesDoNotExist()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('foo','bar') ",

Review Comment:
   There is 1 already `SELECT d12 FROM druid.numfoo, UNNEST(ARRAY[m1, m2]) as unnested (d12) where d12 IN ('1','2') AND m1 < 10`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1121015434


##########
processing/src/main/java/org/apache/druid/query/UnnestDataSource.java:
##########
@@ -50,30 +48,26 @@ public class UnnestDataSource implements DataSource
   private final DataSource base;
   private final String column;
   private final String outputName;
-  private final LinkedHashSet<String> allowList;
 
   private UnnestDataSource(
       DataSource dataSource,
       String columnName,
-      String outputName,
-      LinkedHashSet<String> allowList
+      String outputName
   )

Review Comment:
   I had expected you to change the allowList to a Filter but keep it on this object.  It looks like you are assuming that the filter passed into the call to make the cursor is good enough.  In the "best" case, you are planning things to attach the filter on the unnested thing to the other filters being included, in which case, as clint mentions, the entire filter is moving to a value matcher with absolutely nothing being pushed down.  In the worst case, some of the filters are being lost in planning and incorrect results can occur.  I haven't validated which one of these is likely occuring yet.
   
   Given that Calcite very clearly puts the filter on the UNCOLLECT, you should be able to just attach a Filter object here instead of putting it on the query as a whole.  The logic inside of this should remain more or less the same as what it was before with the allow list, just, you are applying a value predicate to the dictionaries instead of building a bitset.
   
   The filter used to make the cursor should continue to be passed down (though it should also have the filter that is attached here adjusted and pushed down as well).



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java:
##########
@@ -3341,4 +3328,467 @@ public void testUnnestWithConstant()
         )
     );
   }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('a','b') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("a", "b"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"a"},
+            new Object[]{"b"},
+            new Object[]{"b"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereFilterIsNotOnFirstValue()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('d','c') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("d", "c"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"c"},
+            new Object[]{"d"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereValuesDoNotExist()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('foo','bar') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("foo", "bar"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+        ImmutableList.of()
+    );
+  }
+
+  @Test
+  public void testUnnestWithBoundFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where (d3>= 'b' AND d3 < 'd') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(bound("EXPR$0", "b", "d", false, true, null, StringComparators.LEXICOGRAPHIC))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"b"},
+            new Object[]{"b"},
+            new Object[]{"c"}
+        )
+    );
+  }
+
+
+  @Test
+  public void testUnnestWithFilteringOnUnnestedVirtualCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d12 FROM druid.numfoo, UNNEST(ARRAY[m1, m2]) as unnested (d12) where d12 IN ('1','2') AND m1 < 10",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new QueryDataSource(
+                          newScanQueryBuilder()
+                              .dataSource(
+                                  new TableDataSource(CalciteTests.DATASOURCE3)
+                              )
+                              .intervals(querySegmentSpec(Filtration.eternity()))
+                              .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                              .legacy(false)
+                              .filters(bound("m1", null, "10", false, true, null, StringComparators.NUMERIC))
+                              .columns(
+                                  "__time",
+                                  "cnt",
+                                  "d1",
+                                  "d2",
+                                  "dim1",
+                                  "dim2",
+                                  "dim3",
+                                  "dim4",
+                                  "dim5",
+                                  "dim6",
+                                  "f1",
+                                  "f2",
+                                  "l1",
+                                  "l2",
+                                  "m1",
+                                  "m2",
+                                  "unique_dim1"
+                              )
+                              .context(QUERY_CONTEXT_DEFAULT)
+                              .build()
+                      ),
+                      "v0",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .virtualColumns(expressionVirtualColumn("v0", "array(\"m1\",\"m2\")", ColumnType.FLOAT_ARRAY))
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("1.0", "2.0"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{1.0f},
+            new Object[]{1.0f},
+            new Object[]{2.0f},
+            new Object[]{2.0f}
+        )
+    );
+  }

Review Comment:
   So, from looking at how things are planned, I thought that we were throwing away the actual where clause.  Instead, this test showed me what is actually happening: when we have both filters, the unnest is being planned on top of a scan query which carries the original filter.
   
   This is a bad plan, it is going to needlessly force things to run on the Broker instead of being pushed down.  The unnest should be pushed down onto the actual TableDataSource and the native query here should be a singular scan on top of an unnest data source of a table reference.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -203,8 +202,10 @@ public Class<?> classOfObject()
           @Override
           public int getValueCardinality()
           {
-            if (!allowedBitSet.isEmpty()) {
-              return allowedBitSet.cardinality();
+            if (allowFilter instanceof InDimFilter) {
+              return ((InDimFilter) allowFilter).getValues().size();
+            } else if (allowFilter instanceof AndFilter) {
+              return ((AndFilter) allowFilter).getFilters().size();

Review Comment:
   We cannot eliminate the bitset here.  You need to be producing the bitset by applying the value matcher to the dictionary of values.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -351,43 +361,25 @@ private void initialize()
   private void advanceAndUpdate()
   {
     if (indexedIntsForCurrentRow == null) {
-      index = 0;
+      indexForRow = 0;
       if (!baseCursor.isDone()) {
         baseCursor.advanceUninterruptibly();
       }
     } else {
-      if (index >= indexedIntsForCurrentRow.size() - 1) {
+      if (indexForRow >= indexedIntsForCurrentRow.size() - 1) {

Review Comment:
   IndexedInt is a class in the Druid code base.  It's logically equivalent to an `int[]`.  The values are references to the dictionary.  There is some javadoc that attempts to explain these things on interfaces like `DimensionSelector` and `DimensionDictionarySelector`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1119617731


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java:
##########
@@ -3341,4 +3328,467 @@ public void testUnnestWithConstant()
         )
     );
   }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedCol()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('a','b') ",
+        ImmutableList.of(
+            Druids.newScanQueryBuilder()
+                  .dataSource(UnnestDataSource.create(
+                      new TableDataSource(CalciteTests.DATASOURCE3),
+                      "dim3",
+                      "EXPR$0"
+                  ))
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .resultFormat(ScanQuery.ResultFormat.RESULT_FORMAT_COMPACTED_LIST)
+                  .legacy(false)
+                  .context(QUERY_CONTEXT_DEFAULT)
+                  .filters(new InDimFilter("EXPR$0", ImmutableList.of("a", "b"), null))
+                  .columns(ImmutableList.of(
+                      "EXPR$0"
+                  ))
+                  .build()
+        ),
+
+        ImmutableList.of(
+            new Object[]{"a"},
+            new Object[]{"b"},
+            new Object[]{"b"}
+        )
+    );
+  }
+
+  @Test
+  public void testUnnestWithInFilterOnUnnestedColWhereFilterIsNotOnFirstValue()
+  {
+    skipVectorize();
+    cannotVectorize();
+    testQuery(
+        "SELECT d3 FROM druid.numfoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where d3 IN ('d','c') ",

Review Comment:
   oh, i got confused from all the repetitive patterns in this query, d3 isn't dim3, another instance of me not being able to read :sob:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1105192799


##########
processing/src/test/java/org/apache/druid/segment/UnnestStorageAdapterTest.java:
##########
@@ -249,151 +248,44 @@
       unnest 2 rows -> 16 entries also the value cardinality
       unnest of unnest -> 16*8 = 128 rows
        */
-      Assert.assertEquals(count, 128);
-      Assert.assertEquals(dimSelector.getValueCardinality(), 16);
+      Assert.assertEquals(128, count);
+      Assert.assertEquals(16, dimSelector.getValueCardinality());
       return null;
     });
   }
 
   @Test
-  public void test_unnest_adapters_with_allowList()
+  public void test_unnest_adapters_basic_with_filter()
   {
-    final String columnName = "multi-string1";
-
-    Sequence<Cursor> cursorSequence = UNNEST_STORAGE_ADAPTER1.makeCursors(
-        null,
-        UNNEST_STORAGE_ADAPTER1.getInterval(),
+    Filter f = new InDimFilter(OUTPUT_COLUMN_NAME, ImmutableSet.of("1", "3", "5"));
+    Sequence<Cursor> cursorSequence = UNNEST_STORAGE_ADAPTER.makeCursors(
+        f,
+        UNNEST_STORAGE_ADAPTER.getInterval(),
         VirtualColumns.EMPTY,
         Granularities.ALL,
         false,
         null
     );
 
+    List<String> expectedResults = Arrays.asList("1", "3", "5");
     cursorSequence.accumulate(null, (accumulated, cursor) -> {
       ColumnSelectorFactory factory = cursor.getColumnSelectorFactory();
 
       DimensionSelector dimSelector = factory.makeDimensionSelector(DefaultDimensionSpec.of(OUTPUT_COLUMN_NAME));
-      ColumnValueSelector valueSelector = factory.makeColumnValueSelector(OUTPUT_COLUMN_NAME);
-
       int count = 0;
       while (!cursor.isDone()) {
         Object dimSelectorVal = dimSelector.getObject();
-        Object valueSelectorVal = valueSelector.getObject();
         if (dimSelectorVal == null) {
           Assert.assertNull(dimSelectorVal);
-        } else if (valueSelectorVal == null) {
-          Assert.assertNull(valueSelectorVal);
         }
+        Assert.assertEquals(expectedResults.get(count), dimSelectorVal.toString());

Review Comment:
   ## Dereferenced variable may be null
   
   Variable [dimSelectorVal](1) may be null at this access as suggested by [this](2) null guard.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4265)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1127266826


##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -215,6 +254,26 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     );
   }
 
+  private PartialDruidQuery updateCorrPartialQueryFromLeft(PartialDruidQuery partialQueryFromLeft)
+  {
+    // The DruidCorrelateRule already creates the project and pushes it on the top level
+    // So get select project from partialQuery
+    // The filters are present on the partial query of the left
+    // The group by and having clauses would be on the top level
+    // Same for the sort
+    PartialDruidQuery corrQuery = PartialDruidQuery.create(correlateRel);
+    corrQuery = corrQuery.withWhereFilter(partialQueryFromLeft.getWhereFilter())
+                         .withSelectProject(partialQuery.getSelectProject());
+    if (partialQuery.getAggregate() != null) {
+      corrQuery = corrQuery.withAggregate(partialQuery.getAggregate())
+                           .withHavingFilter(partialQuery.getHavingFilter());
+    }
+    if (partialQuery.getSort() != null || partialQuery.getSortProject() != null) {
+      corrQuery = corrQuery.withSort(partialQuery.getSort());
+    }
+    return corrQuery;
+  }
+

Review Comment:
   The DruidCorrelateUnnestRule already put the select project on top of the correlateRel by adding all the things in right to the left. Also the group by, order by on the outer query are already present on the correlateRel. So we create the query by adding the select project (the one which is already projected on top with left + right), the filters from the left + aggregates , sorts if present on the top level



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13799:
URL: https://github.com/apache/druid/pull/13799#issuecomment-1449047935

   >It seems like you could potentially decompose the filter to partially push down anything that is is not filtering on the output column name into the base cursor so that the query can still use indexes and such, since otherwise the implication here is that if any filter is on the unnest column then it means we aren't going to use any indexes since we aren't pushing filter down to the base cursor.
   
   Calcite is smart enough to push the filter on an existing dimension to the underlying datasource. For example a query like
   ```with t as (select * from druid.numfoo,UNNEST(MV_TO_ARRAY(dim3)) as foo(d3))
   select d3 from t where d3 >1 AND dim2 IN ('a','b')```
   
   Calcite plans it as
   ```
   157:LogicalProject(d3=[$17])
     155:LogicalCorrelate(subset=[rel#156:Subset#7.NONE.[]], correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
       144:LogicalFilter(subset=[rel#145:Subset#1.NONE.[]], condition=[OR(=($2, 'a'), =($2, 'b'))])
         9:LogicalTableScan(subset=[rel#143:Subset#0.NONE.[]], table=[[druid, numfoo]])
       151:LogicalFilter(subset=[rel#152:Subset#5.NONE.[]], condition=[>(CAST($0):INTEGER, 1)])
         149:Uncollect(subset=[rel#150:Subset#4.NONE.[]])
           147:LogicalProject(subset=[rel#148:Subset#3.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
             10:LogicalValues(subset=[rel#146:Subset#2.NONE.[0]], tuples=[[{ 0 }]])
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13799:
URL: https://github.com/apache/druid/pull/13799#issuecomment-1454891382

   The following things have been addressed:
   
   1. Filter moved inside unnest data source
   2. A transformation is introduced that creates the partial query on the correlate from the partial query on the left
   3. We only create a query data source if there is a LIMIT as the limit needs to be honored before unnest
   4. Updated test cases.
   
   Things to be done:
   
   1. Add more tests
   2. Modify getValueCardinality for the dimension cursor
   3. Transform the filter on the unnested output column to filters on the left and pass them to the base cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on PR #13799:
URL: https://github.com/apache/druid/pull/13799#issuecomment-1458099866

   @somu-imply could you please have a look at https://github.com/apache/druid/pull/13892? That has some changes that I started doing as part of looking into upgrading Calcite. The upgrade borked a bunch of the existing UNNEST tests and exposed some issues with how the planning works, which led to invalid plans. The changes in the #13892 should fix up the planning. IMO the logic is also cleaner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1125562923


##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/PartialDruidQuery.java:
##########
@@ -83,7 +83,8 @@
     // WINDOW may be present only together with SCAN.
     WINDOW,
 
-    UNNEST_PROJECT
+    UNNEST_PROJECT,
+    UNNEST_FILTER

Review Comment:
   This needs to be removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1116363049


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun
+
+    final Filter forBaseFilter;
+    if (filter == null) {
+      forBaseFilter = filter;
     } else {
-      updatedFilter = filter;
+      // if the filter has the unnested column
+      // do not pass it into the base cursor
+      // if there is a filter as d2 > 1 and unnest-d2 < 10
+      // Calcite would push the filter on d2 into the data source
+      // and only the filter on unnest-d2 < 10 will appear here
+      forBaseFilter = filter.getRequiredColumns().contains(outputColumnName) ? null : filter;

Review Comment:
   I'll add more comment. d2 is the column to be unnested. What I mean is that all filters on columns that exist in the datasource are pushed on to the data source. Only the filters on the unnested column appears here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1116359795


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -243,9 +245,23 @@ public void advance()
   @Override
   public void advanceUninterruptibly()
   {
-    do {
+    // the index currently points to an element at position i ($e_i)
+    // while $e_i does not match the filter
+    // keep advancing the pointer to the first valid match
+    // calling advanceAndUpdate increments the index position so needs a call to matches() again for new match status
+    boolean match = valueMatcher.matches();
+    if (match) {
       advanceAndUpdate();
-    } while (matchAndProceed());
+      match = valueMatcher.matches();
+    }
+    while (!match) {
+      if (!baseCursor.isDone()) {
+        advanceAndUpdate();
+        match = valueMatcher.matches();
+      } else {
+        return;
+      }
+    }

Review Comment:
   Thanks this should be 
   ```
   while (true) {
         advanceAndUpdate();
         boolean match = valueMatcher.matches();
         if (match || baseCursor.isDone()) {
           return;
         }
       }
   ```
   The match changes after the update so the call should be below. I'll add it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #13799:
URL: https://github.com/apache/druid/pull/13799#issuecomment-1482049124

   Closed in favor of #13934 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org