You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "imply-cheddar (via GitHub)" <gi...@apache.org> on 2023/03/15 07:59:46 UTC
[GitHub] [druid] imply-cheddar commented on a diff in pull request #13934: Unnest changes for moving the filter on right side of correlate to inside the unnest datasource

imply-cheddar commented on code in PR #13934:
URL: https://github.com/apache/druid/pull/13934#discussion_r1136322477


##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -274,15 +274,30 @@ private Pair<Filter, Filter> computeBaseAndPostCorrelateFilters(
       final List<Filter> preFilters = new ArrayList<>();
       final List<Filter> postFilters = new ArrayList<>();
 
-      void add(@Nullable final Filter filter)
+      void addPostFilterWithPreFilterIfRewritePossible(@Nullable final Filter filter)
+      {
+        if (filter == null) {
+          return;
+        }
+        final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
+        if (newFilter != null) {
+          // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
+          // any rows that do not match this filter at all.
+          preFilters.add(newFilter);
+        }
+        // Add original filter post-unnest no matter what: we need to filter out any extraneous unnested values.
+        postFilters.add(filter);
+      }
+
+      void addPreFilter(@Nullable final Filter filter)
       {
         if (filter == null) {
           return;
         }
 
         final Set<String> requiredColumns = filter.getRequiredColumns();
 
-        // Run filter post-correlate if it refers to any virtual columns.
+        // Run filter post-unnest if it refers to any virtual columns.

Review Comment:
   Make this
   
   > // Run filter post-unnest if it refers to any virtual columns.  This is a conservative judgement call
   > // that perhaps forces the code to use a ValueMatcher where an index would've been available,
   > // which can have real performance implications.  This is an interim choice made to value correctness
   > // over performance.  When we need to optimize this performance, we should be able to
   > // create a VirtualColumnDatasource that contains all of the virtual columns, in which case the query
   > // itself would stop carrying them and everything should be able to be pushed down.
   



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -291,34 +306,26 @@ void add(@Nullable final Filter filter)
             }
           }
         }
-
-        if (requiredColumns.contains(outputColumnName)) {
-          // Try to move filter pre-correlate if possible.
-          final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
-          if (newFilter != null) {
-            preFilters.add(newFilter);
-          } else {
-            postFilters.add(filter);
-          }
-        } else {
-          preFilters.add(filter);
-        }
+        preFilters.add(filter);
       }
     }
 
     final FilterSplitter filterSplitter = new FilterSplitter();
 
-    if (allowSet != null && !allowSet.isEmpty()) {
-      // Filter on input column if possible (it may be faster); otherwise use output column.
-      filterSplitter.add(new InDimFilter(inputColumn != null ? inputColumn : outputColumnName, allowSet));
-    }
-
     if (queryFilter instanceof AndFilter) {
       for (Filter filter : ((AndFilter) queryFilter).getFilters()) {
-        filterSplitter.add(filter);
+        filterSplitter.addPreFilter(filter);
+      }
+    } else {
+      filterSplitter.addPreFilter(queryFilter);
+    }
+
+    if (unnestFilter instanceof AndFilter) {
+      for (Filter filter : ((AndFilter) unnestFilter).getFilters()) {
+        filterSplitter.addPostFilterWithPreFilterIfRewritePossible(filter);
       }
     } else {
-      filterSplitter.add(queryFilter);
+      filterSplitter.addPostFilterWithPreFilterIfRewritePossible(unnestFilter);
     }

Review Comment:
   Is there a specific reason that we are breaking up the unnestFilter here?  Or is this a reflection of a C&P of the logic for the normal filters?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -285,31 +311,53 @@ void add(@Nullable final Filter filter)
             }
           }
         }
+        preFilters.add(filter);
+
+      }
+    }
 
-        if (requiredColumns.contains(outputColumnName)) {
-          // Rewrite filter post-unnest if possible.
+    final FilterSplitter filterSplitter = new FilterSplitter();
+
+    // non-unnest case
+    // OR CASE
+    List<Filter> preFilterList = new ArrayList<>();
+    List<Filter> postFilterList = new ArrayList<>();

Review Comment:
   The `preFilterList`, `postFilterList` here is very confusing and easy to mix up with the `filterSplitter`...  I'm not sure how to separate it exactly, but I almost thought that these were getting AND'd together because I confused them with the FilterSplitter filters.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -285,31 +311,53 @@ void add(@Nullable final Filter filter)
             }
           }
         }
+        preFilters.add(filter);
+
+      }
+    }
 
-        if (requiredColumns.contains(outputColumnName)) {
-          // Rewrite filter post-unnest if possible.
+    final FilterSplitter filterSplitter = new FilterSplitter();
+
+    // non-unnest case
+    // OR CASE
+    List<Filter> preFilterList = new ArrayList<>();
+    List<Filter> postFilterList = new ArrayList<>();
+    if (queryFilter instanceof OrFilter) {
+      for (Filter filter : ((OrFilter) queryFilter).getFilters()) {
+        if (filter.getRequiredColumns().contains(outputColumnName)) {
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {

Review Comment:
   you are doing an `if/else` with a negative condition.  Please invert it.  If you are doing an `if/else` please always make the condition carry as few negations as possible.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -285,31 +311,53 @@ void add(@Nullable final Filter filter)
             }
           }
         }
+        preFilters.add(filter);
+
+      }
+    }
 
-        if (requiredColumns.contains(outputColumnName)) {
-          // Rewrite filter post-unnest if possible.
+    final FilterSplitter filterSplitter = new FilterSplitter();
+
+    // non-unnest case
+    // OR CASE
+    List<Filter> preFilterList = new ArrayList<>();
+    List<Filter> postFilterList = new ArrayList<>();
+    if (queryFilter instanceof OrFilter) {
+      for (Filter filter : ((OrFilter) queryFilter).getFilters()) {
+        if (filter.getRequiredColumns().contains(outputColumnName)) {
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {
-            // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
-            // any rows that do not match this filter at all.
-            preFilters.add(newFilter);
+            preFilterList.add(newFilter);
+            postFilterList.add(newFilter);

Review Comment:
   postFilter should be the original filter, I think?



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -285,31 +311,53 @@ void add(@Nullable final Filter filter)
             }
           }
         }
+        preFilters.add(filter);
+
+      }
+    }
 
-        if (requiredColumns.contains(outputColumnName)) {
-          // Rewrite filter post-unnest if possible.
+    final FilterSplitter filterSplitter = new FilterSplitter();
+
+    // non-unnest case
+    // OR CASE
+    List<Filter> preFilterList = new ArrayList<>();
+    List<Filter> postFilterList = new ArrayList<>();
+    if (queryFilter instanceof OrFilter) {
+      for (Filter filter : ((OrFilter) queryFilter).getFilters()) {
+        if (filter.getRequiredColumns().contains(outputColumnName)) {
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {
-            // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
-            // any rows that do not match this filter at all.
-            preFilters.add(newFilter);
+            preFilterList.add(newFilter);
+            postFilterList.add(newFilter);
+          } else {
+            postFilterList.add(filter);

Review Comment:
   If it's not re-writable, then I don't think the OR can be pushed down at all.  If you only pushdown the things that don't refer to the output of the unnest, you run the risk of not including some rows that should've been included.  It should be relatively easy to test for this.  If there's a query with an `MV_TO_STRING(unnested_dim, 'a') = 'b' OR other_dim = 'a'`, I don't think it'll believe that it can remap it.  If it pushes down the `other_dim = a` without attempting to push down the first part of the OR, it will not include a row where `nested_dim` is `["b"]` and `other_dim` is `"c"`.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -285,31 +311,53 @@ void add(@Nullable final Filter filter)
             }
           }
         }
+        preFilters.add(filter);
+
+      }
+    }
 
-        if (requiredColumns.contains(outputColumnName)) {
-          // Rewrite filter post-unnest if possible.
+    final FilterSplitter filterSplitter = new FilterSplitter();
+
+    // non-unnest case
+    // OR CASE
+    List<Filter> preFilterList = new ArrayList<>();
+    List<Filter> postFilterList = new ArrayList<>();
+    if (queryFilter instanceof OrFilter) {
+      for (Filter filter : ((OrFilter) queryFilter).getFilters()) {
+        if (filter.getRequiredColumns().contains(outputColumnName)) {
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {
-            // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
-            // any rows that do not match this filter at all.
-            preFilters.add(newFilter);
+            preFilterList.add(newFilter);
+            postFilterList.add(newFilter);
+          } else {
+            postFilterList.add(filter);
           }
-          // Add original filter post-unnest no matter what: we need to filter out any extraneous unnested values.
-          postFilters.add(filter);
         } else {
-          preFilters.add(filter);
+          preFilterList.add(filter);
         }
       }
     }
+    if (!preFilterList.isEmpty()) {

Review Comment:
   Anotehr `if/else` on negative logic.  Keep it positive please.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -285,31 +311,53 @@ void add(@Nullable final Filter filter)
             }
           }
         }
+        preFilters.add(filter);
+
+      }
+    }
 
-        if (requiredColumns.contains(outputColumnName)) {
-          // Rewrite filter post-unnest if possible.
+    final FilterSplitter filterSplitter = new FilterSplitter();
+
+    // non-unnest case
+    // OR CASE
+    List<Filter> preFilterList = new ArrayList<>();
+    List<Filter> postFilterList = new ArrayList<>();
+    if (queryFilter instanceof OrFilter) {
+      for (Filter filter : ((OrFilter) queryFilter).getFilters()) {
+        if (filter.getRequiredColumns().contains(outputColumnName)) {
           final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
           if (newFilter != null) {
-            // Add the rewritten filter pre-unnest, so we get the benefit of any indexes, and so we avoid unnesting
-            // any rows that do not match this filter at all.
-            preFilters.add(newFilter);
+            preFilterList.add(newFilter);
+            postFilterList.add(newFilter);
+          } else {
+            postFilterList.add(filter);
           }
-          // Add original filter post-unnest no matter what: we need to filter out any extraneous unnested values.
-          postFilters.add(filter);
         } else {
-          preFilters.add(filter);
+          preFilterList.add(filter);
         }
       }
     }
+    if (!preFilterList.isEmpty()) {
+      OrFilter orFilter = new OrFilter(preFilterList);
+      filterSplitter.addPreFilter(orFilter);
+    } else {
+      filterSplitter.addPreFilter(queryFilter);
+    }
+
+    if (!postFilterList.isEmpty()) {
+      AndFilter andFilter = new AndFilter(postFilterList);

Review Comment:
   Why are these AND'd?  Didn't they come from an OR?



##########
processing/src/main/java/org/apache/druid/query/UnnestDataSource.java:
##########
@@ -172,14 +191,18 @@ public boolean equals(Object o)
       return false;
     }
     UnnestDataSource that = (UnnestDataSource) o;
-    return virtualColumn.equals(that.virtualColumn)
-           && base.equals(that.base);
+    if (unnestFilter != null) {
+      return virtualColumn.equals(that.virtualColumn) && unnestFilter.equals(that.unnestFilter)
+             && base.equals(that.base);
+    } else {
+      return virtualColumn.equals(that.virtualColumn) && base.equals(that.base);
+    }

Review Comment:
   This is an odd equals implementation.  If the current object's `unnestFilter` is null, then the `that` object can have any unnest filter it wants?
   
   Generally speaking, I suggest letting IntelliJ generate your equals and hashCode.  (I'm assuming that perhaps you adjusted it yourself?)



##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/CorrelateFilterLTransposeRule.java:
##########
@@ -30,7 +30,7 @@
  * Rule that pulls a {@link Filter} from the left-hand side of a {@link Correlate} above the Correlate.
  * Allows subquery elimination.
  *
- * @see CorrelateFilterRTransposeRule similar, but for right-hand side filters
+ * @see DruidFilterUnnestRule similar, but for right-hand side filters

Review Comment:
   Is this comment legitimate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org