You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "gianm (via GitHub)" <gi...@apache.org> on 2023/03/11 00:22:56 UTC

[GitHub] [druid] gianm commented on a diff in pull request #13922: Refactoring and bug fixes on top of unnest. The filter now is passed …

gianm commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132987409


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -310,12 +314,15 @@ private void getNextRow()
   private void initialize()

Review Comment:
   The javadoc on this method seems out of date (it refers to `allowList`).



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -293,13 +293,16 @@ void add(@Nullable final Filter filter)
         }
 
         if (requiredColumns.contains(outputColumnName)) {
-          // Try to move filter pre-correlate if possible.
+          // Rewrite filter post-unnest if possible.
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {
+            // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
+            // any rows that do not match this filter at all.
             preFilters.add(newFilter);
-          } else {
-            postFilters.add(filter);
           }
+          // This is needed as a filter on an MV String Dimension returns the entire row matching the filter

Review Comment:
   fwiw, it's not just about MV string columns. When we support doing this for arrays, the same thing applies to arrays. The pre-unnest filter is an `array_contains` and the post-unnest filter is a regular `=`.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   This isn't right, since post-unnest filters may refer to virtual columns. So anything from `filterPair.rhs` that refers to a virtual column needs to be placed here, or else it won't properly see them.
   
   What is the rationale for moving the filter into the UnnestColumnValueSelectorCursor? It seems like the logic there is doing something similar to what PostJoinCursor does: creating a ValueMatcher that wraps the column selector factory. I wonder what's wrong with letting PostJoinCursor do that, which would keep the code in UnnestColumnValueSelectorCursor simpler.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org