You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/17 19:01:04 UTC

[GitHub] [druid] jon-wei opened a new pull request #9373: Fix join filter push down post-join virtual column handling

jon-wei opened a new pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373
 
 
   This PR fixes an issue in JoinFilterAnalyzer where post-join virtual columns were incorrectly identified as belonging to the base table column set, leading to filters on such virtual columns being pushed down incorrectly.
   
   The PR fixes the issue by passing in the set of base table columns (including any pre-join virtual columns) to JoinFilterAnalyzer.splitFilter instead of checking prefixes.
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381520134
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -234,13 +235,17 @@ public Metadata getMetadata()
       // Virtual columns cannot depend on each other, so we don't need to check transitive dependencies.
       if (baseColumns.containsAll(virtualColumn.requiredColumns())) {
         preJoinVirtualColumns.add(virtualColumn);
+        // Since pre-join virtual columns can be computed using only base columns, we include them in the
+        // base column set.
+        baseColumns.add(virtualColumn.getOutputName());
 
 Review comment:
   This case is covered by `JoinFilterAnalyzerTest.test_filterPushDown_factToRegionToCountryLeftFilterOnChannelVirtualColumn`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381552442
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -297,6 +302,22 @@ public boolean isEnableFilterPushDown()
     return enableFilterPushDown;
   }
 
+  @VisibleForTesting
 
 Review comment:
   I collapsed this into a shared method with the block in makeCursors and removed `VisibleForTesting`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381552230
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -297,6 +302,22 @@ public boolean isEnableFilterPushDown()
     return enableFilterPushDown;
   }
 
+  @VisibleForTesting
+  public Set<String> getAdapterBaseColumnNamesWithVirtualColumns(VirtualColumns virtualColumns)
+  {
+    final Set<String> baseColumns = new HashSet<>();
+    Iterables.addAll(baseColumns, baseAdapter.getAvailableDimensions());
+    Iterables.addAll(baseColumns, baseAdapter.getAvailableMetrics());
+
+    for (VirtualColumn virtualColumn : virtualColumns.getVirtualColumns()) {
+      if (baseColumns.containsAll(virtualColumn.requiredColumns())) {
 
 Review comment:
   I initially wanted it to provide different functionality (where the non-testing version also adds virtual columns to the prejoin/postjoin lists), but I collapsed them into one method

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381437918
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -297,6 +302,22 @@ public boolean isEnableFilterPushDown()
     return enableFilterPushDown;
   }
 
+  @VisibleForTesting
+  public Set<String> getAdapterBaseColumnNamesWithVirtualColumns(VirtualColumns virtualColumns)
+  {
+    final Set<String> baseColumns = new HashSet<>();
+    Iterables.addAll(baseColumns, baseAdapter.getAvailableDimensions());
+    Iterables.addAll(baseColumns, baseAdapter.getAvailableMetrics());
+
+    for (VirtualColumn virtualColumn : virtualColumns.getVirtualColumns()) {
+      if (baseColumns.containsAll(virtualColumn.requiredColumns())) {
 
 Review comment:
   This is very similar to the code above in makeCursors(...) Perhaps move this into a utility function so they stay in sync. Or do you want this function to provide different functionality?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381441892
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -234,13 +235,17 @@ public Metadata getMetadata()
       // Virtual columns cannot depend on each other, so we don't need to check transitive dependencies.
       if (baseColumns.containsAll(virtualColumn.requiredColumns())) {
 
 Review comment:
   what happens if one of the required columns is a virtual column?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381439414
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -72,8 +72,21 @@
   private static final String PUSH_DOWN_VIRTUAL_COLUMN_NAME_BASE = "JOIN-FILTER-PUSHDOWN-VIRTUAL-COLUMN-";
   private static final ColumnSelectorFactory ALL_NULL_COLUMN_SELECTOR_FACTORY = new AllNullColumnSelectorFactory();
 
+  /**
+   * Analyze a filter and return a JoinFilterSplit indicating what parts of the filter should be applied pre-join
+   * and post-join.
+   *
+   * @param hashJoinSegmentStorageAdapter The storage adapter that is being queried
+   * @param baseColumnNames               Set of names of columns that belong to the base table,
+   *                                      including pre-join virtual columns
+   * @param originalFilter                Original filter from the query
+   * @param enableFilterPushDown          Whether to enable filter push down
+   * @return A JoinFilterSplit indicating what parts of the filter should be applied pre-join
+   *         and post-join.
+   */
 
 Review comment:
   👍 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381552911
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -234,13 +235,17 @@ public Metadata getMetadata()
       // Virtual columns cannot depend on each other, so we don't need to check transitive dependencies.
       if (baseColumns.containsAll(virtualColumn.requiredColumns())) {
 
 Review comment:
   Virtual columns currently can't use other virtual columns as input (they're each constructed independently from the base columns), if you tried I think it would just operate as if you referenced a non-existent column

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381443071
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -538,18 +562,18 @@ private static void getCorrelationForRHSColumn(
         // We push down if the function only requires base table columns
         Expr.BindingDetails bindingDetails = lhsExpr.analyzeInputs();
         Set<String> requiredBindings = bindingDetails.getRequiredBindings();
-        if (!requiredBindings.stream().allMatch(requiredBinding -> adapter.isBaseColumn(requiredBinding))) {
+        if (!requiredBindings.stream().allMatch(requiredBinding -> baseColumnNames.contains(requiredBinding))) {
 
 Review comment:
   ```suggestion
           if (!baseColumnNames.containsAll(requiredBindings)) {
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381434255
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -234,13 +235,17 @@ public Metadata getMetadata()
       // Virtual columns cannot depend on each other, so we don't need to check transitive dependencies.
       if (baseColumns.containsAll(virtualColumn.requiredColumns())) {
         preJoinVirtualColumns.add(virtualColumn);
+        // Since pre-join virtual columns can be computed using only base columns, we include them in the
+        // base column set.
+        baseColumns.add(virtualColumn.getOutputName());
 
 Review comment:
   missing unit tests for this in `HashJoinSegmentStorageAdapterTest`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9373: Fix join filter push down post-join virtual column handling
URL: https://github.com/apache/druid/pull/9373#discussion_r381439193
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -297,6 +302,22 @@ public boolean isEnableFilterPushDown()
     return enableFilterPushDown;
   }
 
+  @VisibleForTesting
 
 Review comment:
   nit: public functions marked `VisibleForTesting` are hard to maintain, because people can accidentally call this function from another package and there's nothing that warns them they're doing something wrong.
   
   I also noticed this function is only called in a test, can we refactor this so that we don't need a public VisibleForTesting function

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org