You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "paul-rogers (via GitHub)" <gi...@apache.org> on 2023/02/23 01:18:32 UTC

[GitHub] [druid] paul-rogers commented on a diff in pull request #13799: Now unnest allows bound, in and selector filters on the unnested column

paul-rogers commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1115145381


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -243,9 +245,23 @@ public void advance()
   @Override
   public void advanceUninterruptibly()
   {
-    do {
+    // the index currently points to an element at position i ($e_i)
+    // while $e_i does not match the filter
+    // keep advancing the pointer to the first valid match
+    // calling advanceAndUpdate increments the index position so needs a call to matches() again for new match status
+    boolean match = valueMatcher.matches();
+    if (match) {
       advanceAndUpdate();
-    } while (matchAndProceed());
+      match = valueMatcher.matches();
+    }
+    while (!match) {
+      if (!baseCursor.isDone()) {
+        advanceAndUpdate();
+        match = valueMatcher.matches();
+      } else {
+        return;
+      }
+    }

Review Comment:
   Suggestion:
   
   ```java
     while (true) {
       boolean match = valueMatcher.matches();
       advanceAndUpdate();
       if (match || baseCursor.isDone()) {
           return;
       }
     } 
   ```



##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -311,12 +327,11 @@ private void initialize()
   {
     this.unnestListForCurrentRow = new ArrayList<>();
     getNextRow(needInitialization);
-    if (allowSet != null) {
-      if (!allowSet.isEmpty()) {
-        if (!allowSet.contains((String) unnestListForCurrentRow.get(index))) {
-          advance();
-        }
-      }
+
+    // If the first value the index is pointing to does not match the filter
+    // advance the index to the first value which will match
+    if (!valueMatcher.matches()) {
+      advance();

Review Comment:
   Redundant. `advanceUninterruptibly()` already does the needed checks.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
   private final DimensionSelector dimSelector;
   private final String columnName;
   private final String outputName;
-  private final LinkedHashSet<String> allowSet;
-  private final BitSet allowedBitSet;
   private final ColumnSelectorFactory baseColumnSelectorFactory;
-  private int index;
-  @Nullable private IndexedInts indexedIntsForCurrentRow;
+  @Nullable
+  private final Filter allowFilter;
+  private int indexForRow;
+  @Nullable
+  private IndexedInts indexedIntsForCurrentRow;
   private boolean needInitialization;
   private SingleIndexInts indexIntsForRow;
+  private ValueMatcher valueMatcher;
 
   public UnnestDimensionCursor(
       Cursor cursor,
       ColumnSelectorFactory baseColumnSelectorFactory,
       String columnName,
       String outputColumnName,
-      LinkedHashSet<String> allowSet
+      @Nullable Filter allowFilter
   )
   {
     this.baseCursor = cursor;
     this.baseColumnSelectorFactory = baseColumnSelectorFactory;
     this.dimSelector = this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
     this.columnName = columnName;
-    this.index = 0;
+    this.indexForRow = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
-    this.allowedBitSet = new BitSet();
+    this.allowFilter = allowFilter;

Review Comment:
   Here we allow a null filter. In the code above, we provided an "accept all" filter in this case. Should we do so here to avoid if null checks?



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -308,7 +321,7 @@ public boolean isDoneOrInterrupted()
   @Override
   public void reset()
   {
-    index = 0;
+    indexForRow = 0;

Review Comment:
   Nit: is this the index of a row? If so, then maybe `rowIndex` would be clearer.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -322,22 +335,19 @@ public void reset()
   @Nullable
   private void initialize()
   {
-    IdLookup idLookup = dimSelector.idLookup();
-    this.indexIntsForRow = new SingleIndexInts();
-    if (allowSet != null && !allowSet.isEmpty() && idLookup != null) {
-      for (String s : allowSet) {
-        if (idLookup.lookupId(s) >= 0) {
-          allowedBitSet.set(idLookup.lookupId(s));
-        }
-      }
+    indexForRow = 0;
+    if (allowFilter != null) {
+      this.valueMatcher = allowFilter.makeMatcher(this.getColumnSelectorFactory());
+    } else {
+      this.valueMatcher = BooleanValueMatcher.of(true);
     }
+    this.indexIntsForRow = new SingleIndexInts();
+
     if (dimSelector.getObject() != null) {
       this.indexedIntsForCurrentRow = dimSelector.getRow();
     }
-    if (!allowedBitSet.isEmpty()) {
-      if (!allowedBitSet.get(indexedIntsForCurrentRow.get(index))) {
-        advance();
-      }
+    if (!valueMatcher.matches() && !baseCursor.isDone()) {
+      advance();

Review Comment:
   This is odd. Is the goal of `initialize()` to move to the first row? We treat first rows differently than all other rows? I can't just do `while(next())` but have to special-case the first row?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun

Review Comment:
   Should the "currently" and "TBD" issues be addressed before merging?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -107,15 +105,15 @@ public Sequence<Cursor> makeCursors(
                   retVal.getColumnSelectorFactory(),
                   dimensionToUnnest,
                   outputColumnName,
-                  allowSet
+                  filter

Review Comment:
   Above, we _may_ have passed the filter into the "base". If we did, do we want to reapply the same filter here? Or, should the above code actually be:
   
   ```java
       final Filter forBaseFilter;
       final Filter localFilter;
       if (filter == null || filter.getRequiredColumns().contains(outputColumnName) ) {
         forBaseFilter = filter;
         localFilter = null;
       } else {
         forBaseFilter = filter;
         localFilter = null;
      }
   ```
   
   Then pass `localFilter` where we're passing `filter`?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun
+
+    final Filter forBaseFilter;
+    if (filter == null) {
+      forBaseFilter = filter;
     } else {
-      updatedFilter = filter;
+      // if the filter has the unnested column
+      // do not pass it into the base cursor
+      // if there is a filter as d2 > 1 and unnest-d2 < 10
+      // Calcite would push the filter on d2 into the data source
+      // and only the filter on unnest-d2 < 10 will appear here
+      forBaseFilter = filter.getRequiredColumns().contains(outputColumnName) ? null : filter;

Review Comment:
   This code seems to assume a single filter. In the example, `unnest-d2 < 10`. But, what if my condition is `unnest-d2 < 10 AND unnest_d2 > 5`. Will this code work? And if only the `unnest-d2` appears here, do we need to check the output column name?
   
   To help understand this, it would be good to give a bit more of the query. What, exactly, is `d2` here?



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -203,8 +202,10 @@ public Class<?> classOfObject()
           @Override
           public int getValueCardinality()
           {
-            if (!allowedBitSet.isEmpty()) {
-              return allowedBitSet.cardinality();
+            if (allowFilter instanceof InDimFilter) {
+              return ((InDimFilter) allowFilter).getValues().size();
+            } else if (allowFilter instanceof AndFilter) {
+              return ((AndFilter) allowFilter).getFilters().size();

Review Comment:
   Is this right? For an `in` filter, we know that the value has to be one of those in the list. But, for `and`, I can have anything can't I? `x > 0 AND x < 1000` would have a cardinality of 1000 for a `long` type.



##########
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java:
##########
@@ -54,7 +51,7 @@ public void test_list_unnest_cursors()
         listCursor.getColumnSelectorFactory(),
         "dummy",
         OUTPUT_NAME,
-        IGNORE_SET
+        null

Review Comment:
   Do these tests a wide range of predicates (filters) on the unnested value? I worry that the code above handles only simple cases: `unnested_value op constant`. Do they also handle complex cases such as `fn(unnested_value) > some_other_col`? `fn(unnested_value) > 10 AND unnested_value * 3 > 12`? Would be good to have tests that exercise these non-trivial cases.



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -351,43 +361,25 @@ private void initialize()
   private void advanceAndUpdate()
   {
     if (indexedIntsForCurrentRow == null) {
-      index = 0;
+      indexForRow = 0;
       if (!baseCursor.isDone()) {
         baseCursor.advanceUninterruptibly();
       }
     } else {
-      if (index >= indexedIntsForCurrentRow.size() - 1) {
+      if (indexForRow >= indexedIntsForCurrentRow.size() - 1) {

Review Comment:
   As a reader, `indexedIntsForCurrentRow` doesn't seem to tell me anything. What is an "indexed int"? That is, what does it do? Are these the rows that have values for the current row? Something else? Can we come up with a name that might help us newbies?



##########
processing/src/test/java/org/apache/druid/segment/UnnestColumnValueSelectorCursorTest.java:
##########
@@ -387,44 +384,6 @@ public void test_list_unnest_cursors_user_supplied_list_with_dups()
     Assert.assertEquals(k, 10);
   }
 
-  @Test

Review Comment:
   Should these tests be retained, but updated to whatever the new format is? Presumably these addressed some earlier concern, and we should ensure that they still work. Or, are there new tests that cover the same cases that the deleted tests cover?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -73,20 +67,24 @@ public Sequence<Cursor> makeCursors(
       @Nullable QueryMetrics<?> queryMetrics
   )
   {
-    Filter updatedFilter;
-    if (allowSet != null && !allowSet.isEmpty()) {
-      final InDimFilter allowListFilters;
-      allowListFilters = new InDimFilter(dimensionToUnnest, allowSet);
-      if (filter != null) {
-        updatedFilter = new AndFilter(Arrays.asList(filter, allowListFilters));
-      } else {
-        updatedFilter = allowListFilters;
-      }
+    // the filter on the outer unnested column needs to be recreated on the unnested dimension and sent into
+    // the base cursor
+    // currently testing it  out for in filter and selector filter
+    // TBD: boun
+
+    final Filter forBaseFilter;
+    if (filter == null) {
+      forBaseFilter = filter;
     } else {
-      updatedFilter = filter;
+      // if the filter has the unnested column
+      // do not pass it into the base cursor
+      // if there is a filter as d2 > 1 and unnest-d2 < 10
+      // Calcite would push the filter on d2 into the data source
+      // and only the filter on unnest-d2 < 10 will appear here
+      forBaseFilter = filter.getRequiredColumns().contains(outputColumnName) ? null : filter;

Review Comment:
   Nit:
   
   ```java
   final Filter forBaseFilter = filter == null || filter.getRequiredColumns().contains(outputColumnName) ? null : filter;
   ```



##########
processing/src/test/java/org/apache/druid/query/groupby/UnnestGroupByQueryRunnerTest.java:
##########
@@ -699,8 +696,7 @@ public void testGroupByOnUnnestedVirtualMultiColumn()
     final DataSource unnestDataSource = UnnestDataSource.create(
         new TableDataSource(QueryRunnerTestHelper.DATA_SOURCE),
         "vc",
-        QueryRunnerTestHelper.PLACEMENTISH_DIMENSION_UNNEST,
-        null
+        QueryRunnerTestHelper.PLACEMENTISH_DIMENSION_UNNEST

Review Comment:
   Sorry, what is a `PLACEMENTISH`?



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -282,9 +283,21 @@ public void advance()
   @Override
   public void advanceUninterruptibly()
   {
-    do {
-      advanceAndUpdate();
-    } while (matchAndProceed());
+    if (!baseCursor.isDone()) {
+      boolean match = valueMatcher.matches();
+      if (match) {
+        advanceAndUpdate();
+        match = valueMatcher.matches();
+      }
+      while (!match) {
+        if (!baseCursor.isDone()) {
+          advanceAndUpdate();
+          match = valueMatcher.matches();
+        } else {
+          return;
+        }
+      }
+    }

Review Comment:
   See suggestion above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org