You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/03/14 00:31:51 UTC

[GitHub] [druid] jon-wei opened a new pull request #9516: More efficient join filter rewrites

jon-wei opened a new pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516
 
 
   This PR adjusts the join filter rewrite/pushdown logic in `JoinFilterAnalyzer` to avoid redundant computation/memory waste for filter analysis information that's common across segments (converting filters to conjunctive normal form, and determining + storing correlated values for filter rewrites). 
   
   A new `computeJoinFilterPreAnalysis` method has been added which handles the computations described above (called once per query on each node). The result of this method is passed to the `splitFilters` method (called once per segment).
   
   Two new query context parameters are added:
   - `enableJoinFilterRewriteValueColumnFilters` : Controls whether we rewrite RHS filters on non-key columns. False by default for performance reasons, since rewriting such filters requires a scan of the RHS table.
   - `joinFilterRewriteMaxSize`: Controls the maximum size of the correlated value set used for filter rewrites. This limit is place to prevent excessive memory use. The default limit is 10000.
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376287
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -635,4 +694,106 @@ private static boolean filterMatchesNull(Filter filter)
     ValueMatcher valueMatcher = filter.makeMatcher(ALL_NULL_COLUMN_SELECTOR_FACTORY);
     return valueMatcher.matches();
   }
+
+  private static JoinableClause isColumnFromJoin(
+      List<JoinableClause> joinableClauses,
+      String column
+  )
+  {
+    for (JoinableClause joinableClause : joinableClauses) {
+      if (joinableClause.includesColumn(column)) {
+        return joinableClause;
+      }
+    }
+
+    return null;
+  }
+
+  private static boolean isColumnFromPostJoinVirtualColumns(
+      List<VirtualColumn> postJoinVirtualColumns,
+      String column
+  )
+  {
+    for (VirtualColumn postJoinVirtualColumn : postJoinVirtualColumns) {
+      if (column.equals(postJoinVirtualColumn.getOutputName())) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  private static boolean areSomeColumnsFromJoin(
+      List<JoinableClause> joinableClauses,
+      Collection<String> columns
+  )
+  {
+    for (String column : columns) {
+      if (isColumnFromJoin(joinableClauses, column) != null) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  private static boolean areSomeColumnsFromPostJoinVirtualColumns(
+      List<VirtualColumn> postJoinVirtualColumns,
+      Collection<String> columns
+  )
+  {
+    for (String column : columns) {
+      if (isColumnFromPostJoinVirtualColumns(postJoinVirtualColumns, column)) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  private static void splitVirtualColumns(
+      List<JoinableClause> joinableClauses,
+      final VirtualColumns virtualColumns,
+      final List<VirtualColumn> preJoinVirtualColumns,
+      final List<VirtualColumn> postJoinVirtualColumns
+  )
+  {
+    for (VirtualColumn virtualColumn : virtualColumns.getVirtualColumns()) {
+      if (areSomeColumnsFromJoin(joinableClauses, virtualColumn.requiredColumns())) {
+        postJoinVirtualColumns.add(virtualColumn);
+      } else {
+        preJoinVirtualColumns.add(virtualColumn);
+      }
+    }
+  }
+
+  private static class RHSRewriteCandidate
 
 Review comment:
   Renamed to that casing

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393386193
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -311,124 +431,99 @@ private static JoinFilterAnalysis rewriteOrFilter(
     );
   }
 
-  /**
-   * Rewrites a selector filter on a join table into an IN filter on the base table.
-   *
-   * @param baseColumnNames  Set of names of columns that belong to the base table, including pre-join virtual
-   *                         columns
-   * @param selectorFilter   SelectorFilter to be rewritten
-   * @param prefixes         Map of join table prefixes to clauses
-   * @param equiconditions   Map of equiconditions
-   * @param correlationCache Cache of column correlation analyses. This will be potentially modified by adding
-   *                         any new column correlation analyses to the cache.
-   *
-   * @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter
-   */
   private static JoinFilterAnalysis rewriteSelectorFilter(
-      Set<String> baseColumnNames,
       SelectorFilter selectorFilter,
-      Map<String, JoinableClause> prefixes,
-      Map<String, Set<Expr>> equiconditions,
-      Map<String, Optional<List<JoinFilterColumnCorrelationAnalysis>>> correlationCache
+      JoinFilterPreAnalysis joinFilterPreAnalysis
   )
   {
+
+    List<Filter> newFilters = new ArrayList<>();
+    List<VirtualColumn> pushdownVirtualColumns = new ArrayList<>();
+
     String filteringColumn = selectorFilter.getDimension();
-    for (Map.Entry<String, JoinableClause> prefixAndClause : prefixes.entrySet()) {
-      if (prefixAndClause.getValue().includesColumn(filteringColumn)) {
-        Optional<List<JoinFilterColumnCorrelationAnalysis>> correlations = correlationCache.computeIfAbsent(
-            prefixAndClause.getKey(),
-            p -> findCorrelatedBaseTableColumns(
-                baseColumnNames,
-                p,
-                prefixes.get(p),
-                equiconditions
-            )
-        );
+    String filteringValue = selectorFilter.getValue();
 
-        if (!correlations.isPresent()) {
-          return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
-        }
+    if (areSomeColumnsFromPostJoinVirtualColumns(
+        joinFilterPreAnalysis.getPostJoinVirtualColumns(),
+        selectorFilter.getRequiredColumns()
+    )) {
+      return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
+    }
 
-        List<Filter> newFilters = new ArrayList<>();
-        List<VirtualColumn> pushdownVirtualColumns = new ArrayList<>();
+    if (!areSomeColumnsFromJoin(joinFilterPreAnalysis.getJoinableClauses(), selectorFilter.getRequiredColumns())) {
+      return new JoinFilterAnalysis(
+          true,
+          selectorFilter,
+          selectorFilter,
+          pushdownVirtualColumns
+      );
+    }
 
-        for (JoinFilterColumnCorrelationAnalysis correlationAnalysis : correlations.get()) {
-          if (correlationAnalysis.supportsPushDown()) {
-            Set<String> correlatedValues = getCorrelatedValuesForPushDown(
-                selectorFilter.getDimension(),
-                selectorFilter.getValue(),
-                correlationAnalysis.getJoinColumn(),
-                prefixAndClause.getValue()
-            );
+    Optional<List<JoinFilterColumnCorrelationAnalysis>> correlationAnalyses = joinFilterPreAnalysis.getCorrelationsByFilteringColumn()
+                                                                                                   .get(filteringColumn);
 
-            if (correlatedValues.isEmpty()) {
-              return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
-            }
+    if (!correlationAnalyses.isPresent()) {
+      return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
+    }
 
-            for (String correlatedBaseColumn : correlationAnalysis.getBaseColumns()) {
-              Filter rewrittenFilter = new InDimFilter(
-                  correlatedBaseColumn,
-                  correlatedValues,
-                  null,
-                  null
-              ).toFilter();
-              newFilters.add(rewrittenFilter);
-            }
 
-            for (Expr correlatedBaseExpr : correlationAnalysis.getBaseExpressions()) {
-              // We need to create a virtual column for the expressions when pushing down.
-              // Note that this block is never entered right now, since correlationAnalysis.supportsPushDown()
-              // will return false if there any correlated expressions on the base table.
-              // Pushdown of such filters is disabled until the expressions system supports converting an expression
-              // into a String representation that can be reparsed into the same expression.
-              // https://github.com/apache/druid/issues/9326 tracks this expressions issue.
-              String vcName = getCorrelatedBaseExprVirtualColumnName(pushdownVirtualColumns.size());
-
-              VirtualColumn correlatedBaseExprVirtualColumn = new ExpressionVirtualColumn(
-                  vcName,
-                  correlatedBaseExpr,
-                  ValueType.STRING
-              );
-              pushdownVirtualColumns.add(correlatedBaseExprVirtualColumn);
-
-              Filter rewrittenFilter = new InDimFilter(
-                  vcName,
-                  correlatedValues,
-                  null,
-                  null
-              ).toFilter();
-              newFilters.add(rewrittenFilter);
-            }
-          }
-        }
+    for (JoinFilterColumnCorrelationAnalysis correlationAnalysis : correlationAnalyses.get()) {
+      if (correlationAnalysis.supportsPushDown()) {
+        Optional<Set<String>> correlatedValues = correlationAnalysis.getCorrelatedValuesMap().get(
+            Pair.of(filteringColumn, filteringValue)
+        );
 
-        if (newFilters.isEmpty()) {
+        if (!correlatedValues.isPresent()) {
           return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
         }
 
-        return new JoinFilterAnalysis(
-            true,
-            selectorFilter,
-            Filters.and(newFilters),
-            pushdownVirtualColumns
-        );
+        for (String correlatedBaseColumn : correlationAnalysis.getBaseColumns()) {
+          Filter rewrittenFilter = new InDimFilter(
+              correlatedBaseColumn,
+              correlatedValues.get(),
+              null,
+              null
+          ).toFilter();
+          newFilters.add(rewrittenFilter);
+        }
+
+        for (Expr correlatedBaseExpr : correlationAnalysis.getBaseExpressions()) {
+          // We need to create a virtual column for the expressions when pushing down.
+          // Note that this block is never entered right now, since correlationAnalysis.supportsPushDown()
+          // will return false if there any correlated expressions on the base table.
+          // Pushdown of such filters is disabled until the expressions system supports converting an expression
+          // into a String representation that can be reparsed into the same expression.
+          // https://github.com/apache/druid/issues/9326 tracks this expressions issue.
 
 Review comment:
   Actually, I changed my mind, I re-enabled filter rewrites when there are LHS expressions in the join condition, and removed the `@Ignore` annotations on the tests for that

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376315
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -602,20 +660,21 @@ private static void getCorrelationForRHSColumn(
   /**
    * Given a list of JoinFilterColumnCorrelationAnalysis, prune the list so that we only have one
    * JoinFilterColumnCorrelationAnalysis for each unique combination of base columns.
-   *
+   * <p>
 
 Review comment:
   Removed the <p> tags

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393264758
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -50,73 +51,126 @@
 /**
  * When there is a filter in a join query, we can sometimes improve performance by applying parts of the filter
  * when we first read from the base table instead of after the join.
- *
- * This class provides a {@link #splitFilter(HashJoinSegmentStorageAdapter, Set, Filter, boolean, boolean)} method that
- * takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>
  * The first step of the filter splitting is to convert the filter into
  * https://en.wikipedia.org/wiki/Conjunctive_normal_form (an AND of ORs). This allows us to consider each
  * OR clause independently as a candidate for filter push down to the base table.
- *
+ * <p>
  * A filter clause can be pushed down if it meets one of the following conditions:
  * - The filter only applies to columns from the base table
  * - The filter applies to columns from the join table, and we determine that the filter can be rewritten
  *   into a filter on columns from the base table
- *
+ * <p>
  * For the second case, where we rewrite filter clauses, the rewritten clause can be less selective than the original,
  * so we preserve the original clause in the post-join filtering phase.
+ * <p>
+ * The starting point for join analysis is the {@link #computeJoinFilterPreAnalysis} method. This method should be
+ * called before performing any per-segment join query work. This method converts the query filter into
+ * conjunctive normal form, and splits the CNF clauses into a portion that only references base table columns and
+ * a portion that references join table columns. For the filter clauses that apply to join table columns, the
+ * pre-analysis step computes the information necessary for rewriting such filters into filters on base table columns.
+ * <p>
+ * The result of this pre-analysis method should be passed into the next step of join filter analysis, described below.
+ * <p>
+ * The {@link #splitFilter(JoinFilterPreAnalysis)} method takes the pre-analysis result and optionally applies the\
+ * filter rewrite and push down operations on a per-segment level.
  */
 public class JoinFilterAnalyzer
 {
   private static final String PUSH_DOWN_VIRTUAL_COLUMN_NAME_BASE = "JOIN-FILTER-PUSHDOWN-VIRTUAL-COLUMN-";
   private static final ColumnSelectorFactory ALL_NULL_COLUMN_SELECTOR_FACTORY = new AllNullColumnSelectorFactory();
 
   /**
-   * Analyze a filter and return a JoinFilterSplit indicating what parts of the filter should be applied pre-join
-   * and post-join.
+   * Before making per-segment filter splitting decisions, we first do a pre-analysis step
+   * where we convert the query filter (if any) into conjunctive normal form and then
+   * determine the structure of RHS filter rewrites (if any), since this information is shared across all
+   * per-segment operations.
    *
-   * @param hashJoinSegmentStorageAdapter The storage adapter that is being queried
-   * @param baseColumnNames               Set of names of columns that belong to the base table,
-   *                                      including pre-join virtual columns
-   * @param originalFilter                Original filter from the query
-   * @param enableFilterPushDown          Whether to enable filter push down
-   * @return A JoinFilterSplit indicating what parts of the filter should be applied pre-join
-   *         and post-join.
+   * See {@link JoinFilterPreAnalysis} for details on the result of this pre-analysis step.
 
 Review comment:
   super nit: could you retain the old formatting where the descriptions are offset to the right and aligned? I find it a bit easier to read

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376245
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -50,73 +51,126 @@
 /**
  * When there is a filter in a join query, we can sometimes improve performance by applying parts of the filter
  * when we first read from the base table instead of after the join.
- *
- * This class provides a {@link #splitFilter(HashJoinSegmentStorageAdapter, Set, Filter, boolean, boolean)} method that
- * takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>
  * The first step of the filter splitting is to convert the filter into
  * https://en.wikipedia.org/wiki/Conjunctive_normal_form (an AND of ORs). This allows us to consider each
  * OR clause independently as a candidate for filter push down to the base table.
- *
+ * <p>
  * A filter clause can be pushed down if it meets one of the following conditions:
  * - The filter only applies to columns from the base table
  * - The filter applies to columns from the join table, and we determine that the filter can be rewritten
  *   into a filter on columns from the base table
- *
+ * <p>
  * For the second case, where we rewrite filter clauses, the rewritten clause can be less selective than the original,
  * so we preserve the original clause in the post-join filtering phase.
+ * <p>
+ * The starting point for join analysis is the {@link #computeJoinFilterPreAnalysis} method. This method should be
+ * called before performing any per-segment join query work. This method converts the query filter into
+ * conjunctive normal form, and splits the CNF clauses into a portion that only references base table columns and
+ * a portion that references join table columns. For the filter clauses that apply to join table columns, the
+ * pre-analysis step computes the information necessary for rewriting such filters into filters on base table columns.
+ * <p>
+ * The result of this pre-analysis method should be passed into the next step of join filter analysis, described below.
+ * <p>
+ * The {@link #splitFilter(JoinFilterPreAnalysis)} method takes the pre-analysis result and optionally applies the\
+ * filter rewrite and push down operations on a per-segment level.
  */
 public class JoinFilterAnalyzer
 {
   private static final String PUSH_DOWN_VIRTUAL_COLUMN_NAME_BASE = "JOIN-FILTER-PUSHDOWN-VIRTUAL-COLUMN-";
   private static final ColumnSelectorFactory ALL_NULL_COLUMN_SELECTOR_FACTORY = new AllNullColumnSelectorFactory();
 
   /**
-   * Analyze a filter and return a JoinFilterSplit indicating what parts of the filter should be applied pre-join
-   * and post-join.
+   * Before making per-segment filter splitting decisions, we first do a pre-analysis step
+   * where we convert the query filter (if any) into conjunctive normal form and then
+   * determine the structure of RHS filter rewrites (if any), since this information is shared across all
+   * per-segment operations.
    *
-   * @param hashJoinSegmentStorageAdapter The storage adapter that is being queried
-   * @param baseColumnNames               Set of names of columns that belong to the base table,
-   *                                      including pre-join virtual columns
-   * @param originalFilter                Original filter from the query
-   * @param enableFilterPushDown          Whether to enable filter push down
-   * @return A JoinFilterSplit indicating what parts of the filter should be applied pre-join
-   *         and post-join.
+   * See {@link JoinFilterPreAnalysis} for details on the result of this pre-analysis step.
 
 Review comment:
   Adjusted the alignment

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393260622
 
 

 ##########
 File path: processing/src/test/java/org/apache/druid/segment/join/JoinFilterAnalyzerTest.java
 ##########
 @@ -245,6 +266,27 @@ public void test_filterPushDown_factToRegionToCountryLeftFilterOnNullColumns()
         )
     );
 
+    List<JoinableClause> joinableClauses = ImmutableList.of(
+        factToRegion(JoinType.LEFT),
+        regionToCountry(JoinType.LEFT)
+    );
+
+    JoinFilterPreAnalysis joinFilterPreAnalysis = JoinFilterAnalyzer.computeJoinFilterPreAnalysis(
 
 Review comment:
   could you make a utility method that makes a `JoinFilterPreAnalysis` from a `joinableClauses` and `originalFilter`? seems like a lot of these blocks and all the other arguments are the same. Alternatively a builder for `JoinFilterPreAnalysis` would work, though idk if worth the effort since it would be mostly used by tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376432
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -462,45 +559,14 @@ private static String getCorrelatedBaseExprVirtualColumnName(int counter)
     return clauseForFilteredTable.getJoinable().getCorrelatedColumnValues(
         filterColumnNoPrefix,
         filterValue,
-        correlatedColumnNoPrefix
+        correlatedColumnNoPrefix,
+        filterRewriteMaxSize,
+        enableRewriteValueColumnFilters
     );
   }
 
-  /**
-   * For each rhs column that appears in the equiconditions for a table's JoinableClause,
-   * we try to determine what base table columns are related to the rhs column through the total set of equiconditions.
-   * We do this by searching backwards through the chain of join equiconditions using the provided equicondition map.
-   *
-   * For example, suppose we have 3 tables, A,B,C, joined with the following conditions, where A is the base table:
-   *   A.joinColumn == B.joinColumn
-   *   B.joinColum == C.joinColumn
-   *
-   * We would determine that C.joinColumn is correlated with A.joinColumn: we first see that
-   * C.joinColumn is linked to B.joinColumn which in turn is linked to A.joinColumn
-   *
-   * Suppose we had the following join conditions instead:
-   *   f(A.joinColumn) == B.joinColumn
-   *   B.joinColum == C.joinColumn
-   * In this case, the JoinFilterColumnCorrelationAnalysis for C.joinColumn would be linked to f(A.joinColumn).
-   *
-   * Suppose we had the following join conditions instead:
-   *   A.joinColumn == B.joinColumn
-   *   f(B.joinColum) == C.joinColumn
-   *
-   * Because we cannot reverse the function f() applied to the second table B in all cases,
-   * we cannot relate C.joinColumn to A.joinColumn, and we would not generate a correlation for C.joinColumn
-   *
-   * @param baseColumnNames      Set of names of columns that belong to the base table, including pre-join virtual
-   *                             columns
-   * @param tablePrefix          Prefix for a join table
-   * @param clauseForTablePrefix Joinable clause for the prefix
-   * @param equiConditions       Map of equiconditions, keyed by the right hand columns
-   *
-   * @return A list of correlatation analyses for the equicondition RHS columns that reside in the table associated with
-   * the tablePrefix
-   */
-  private static Optional<List<JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(
-      Set<String> baseColumnNames,
+  private static Optional<Map<String, JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(
 
 Review comment:
   Whoops, restored the missing javadocs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393263826
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -50,73 +51,126 @@
 /**
  * When there is a filter in a join query, we can sometimes improve performance by applying parts of the filter
  * when we first read from the base table instead of after the join.
- *
- * This class provides a {@link #splitFilter(HashJoinSegmentStorageAdapter, Set, Filter, boolean, boolean)} method that
- * takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>
 
 Review comment:
   did you mean to put these `<p>` tags?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] codecov-io commented on issue #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#issuecomment-599861337
 
 
   # [Codecov](https://codecov.io/gh/apache/druid/pull/9516?src=pr&el=h1) Report
   > Merging [#9516](https://codecov.io/gh/apache/druid/pull/9516?src=pr&el=desc) into [master](https://codecov.io/gh/apache/druid/commit/e7b3dd9cd1c8a95855ee672e2fcbfe1b895ccc37?src=pr&el=desc) will **increase** coverage by `1.01%`.
   > The diff coverage is `88.31%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/druid/pull/9516/graphs/tree.svg?width=650&token=hn9gC9NXg0&height=150&src=pr)](https://codecov.io/gh/apache/druid/pull/9516?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #9516      +/-   ##
   ============================================
   + Coverage     63.59%    64.6%   +1.01%     
   - Complexity    23583    23639      +56     
   ============================================
     Files          3056     2939     -117     
     Lines        126205   120119    -6086     
     Branches      17456    16099    -1357     
   ============================================
   - Hits          80254    77608    -2646     
   + Misses        38756    35528    -3228     
   + Partials       7195     6983     -212
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/druid/pull/9516?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...a/org/apache/druid/query/groupby/GroupByQuery.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvcXVlcnkvZ3JvdXBieS9Hcm91cEJ5UXVlcnkuamF2YQ==) | `92.16% <ø> (ø)` | `128 <0> (ø)` | :arrow_down: |
   | [...in/java/org/apache/druid/query/topn/TopNQuery.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvcXVlcnkvdG9wbi9Ub3BOUXVlcnkuamF2YQ==) | `49.23% <ø> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | [...apache/druid/query/timeseries/TimeseriesQuery.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvcXVlcnkvdGltZXNlcmllcy9UaW1lc2VyaWVzUXVlcnkuamF2YQ==) | `47.05% <ø> (ø)` | `18 <0> (ø)` | :arrow_down: |
   | [...in/java/org/apache/druid/query/scan/ScanQuery.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvcXVlcnkvc2Nhbi9TY2FuUXVlcnkuamF2YQ==) | `56.97% <ø> (ø)` | `26 <0> (ø)` | :arrow_down: |
   | [...g/apache/druid/server/LocalQuerySegmentWalker.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9kcnVpZC9zZXJ2ZXIvTG9jYWxRdWVyeVNlZ21lbnRXYWxrZXIuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...ng/src/main/java/org/apache/druid/query/Query.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvcXVlcnkvUXVlcnkuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...ache/druid/segment/join/lookup/LookupJoinable.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvc2VnbWVudC9qb2luL2xvb2t1cC9Mb29rdXBKb2luYWJsZS5qYXZh) | `47.61% <0%> (+16.04%)` | `6 <0> (+2)` | :arrow_up: |
   | [...druid/segment/join/table/IndexedTableJoinable.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvc2VnbWVudC9qb2luL3RhYmxlL0luZGV4ZWRUYWJsZUpvaW5hYmxlLmphdmE=) | `79.41% <0%> (+11.55%)` | `12 <0> (+2)` | :arrow_up: |
   | [...ain/java/org/apache/druid/query/QueryContexts.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvcXVlcnkvUXVlcnlDb250ZXh0cy5qYXZh) | `56.09% <0%> (-1.41%)` | `24 <0> (ø)` | |
   | [.../java/org/apache/druid/segment/join/Joinables.java](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree#diff-cHJvY2Vzc2luZy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZHJ1aWQvc2VnbWVudC9qb2luL0pvaW5hYmxlcy5qYXZh) | `97.61% <100%> (+0.05%)` | `23 <0> (ø)` | :arrow_down: |
   | ... and [146 more](https://codecov.io/gh/apache/druid/pull/9516/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/druid/pull/9516?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/druid/pull/9516?src=pr&el=footer). Last update [e7b3dd9...3a449c7](https://codecov.io/gh/apache/druid/pull/9516?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376174
 
 

 ##########
 File path: processing/src/test/java/org/apache/druid/segment/join/JoinFilterAnalyzerTest.java
 ##########
 @@ -245,6 +266,27 @@ public void test_filterPushDown_factToRegionToCountryLeftFilterOnNullColumns()
         )
     );
 
+    List<JoinableClause> joinableClauses = ImmutableList.of(
+        factToRegion(JoinType.LEFT),
+        regionToCountry(JoinType.LEFT)
+    );
+
+    JoinFilterPreAnalysis joinFilterPreAnalysis = JoinFilterAnalyzer.computeJoinFilterPreAnalysis(
 
 Review comment:
   Made a utility method

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393286785
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -311,124 +431,99 @@ private static JoinFilterAnalysis rewriteOrFilter(
     );
   }
 
-  /**
-   * Rewrites a selector filter on a join table into an IN filter on the base table.
-   *
-   * @param baseColumnNames  Set of names of columns that belong to the base table, including pre-join virtual
-   *                         columns
-   * @param selectorFilter   SelectorFilter to be rewritten
-   * @param prefixes         Map of join table prefixes to clauses
-   * @param equiconditions   Map of equiconditions
-   * @param correlationCache Cache of column correlation analyses. This will be potentially modified by adding
-   *                         any new column correlation analyses to the cache.
-   *
-   * @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter
-   */
   private static JoinFilterAnalysis rewriteSelectorFilter(
-      Set<String> baseColumnNames,
       SelectorFilter selectorFilter,
-      Map<String, JoinableClause> prefixes,
-      Map<String, Set<Expr>> equiconditions,
-      Map<String, Optional<List<JoinFilterColumnCorrelationAnalysis>>> correlationCache
+      JoinFilterPreAnalysis joinFilterPreAnalysis
   )
   {
+
+    List<Filter> newFilters = new ArrayList<>();
+    List<VirtualColumn> pushdownVirtualColumns = new ArrayList<>();
+
     String filteringColumn = selectorFilter.getDimension();
-    for (Map.Entry<String, JoinableClause> prefixAndClause : prefixes.entrySet()) {
-      if (prefixAndClause.getValue().includesColumn(filteringColumn)) {
-        Optional<List<JoinFilterColumnCorrelationAnalysis>> correlations = correlationCache.computeIfAbsent(
-            prefixAndClause.getKey(),
-            p -> findCorrelatedBaseTableColumns(
-                baseColumnNames,
-                p,
-                prefixes.get(p),
-                equiconditions
-            )
-        );
+    String filteringValue = selectorFilter.getValue();
 
-        if (!correlations.isPresent()) {
-          return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
-        }
+    if (areSomeColumnsFromPostJoinVirtualColumns(
+        joinFilterPreAnalysis.getPostJoinVirtualColumns(),
+        selectorFilter.getRequiredColumns()
+    )) {
+      return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
+    }
 
-        List<Filter> newFilters = new ArrayList<>();
-        List<VirtualColumn> pushdownVirtualColumns = new ArrayList<>();
+    if (!areSomeColumnsFromJoin(joinFilterPreAnalysis.getJoinableClauses(), selectorFilter.getRequiredColumns())) {
+      return new JoinFilterAnalysis(
+          true,
+          selectorFilter,
+          selectorFilter,
+          pushdownVirtualColumns
+      );
+    }
 
-        for (JoinFilterColumnCorrelationAnalysis correlationAnalysis : correlations.get()) {
-          if (correlationAnalysis.supportsPushDown()) {
-            Set<String> correlatedValues = getCorrelatedValuesForPushDown(
-                selectorFilter.getDimension(),
-                selectorFilter.getValue(),
-                correlationAnalysis.getJoinColumn(),
-                prefixAndClause.getValue()
-            );
+    Optional<List<JoinFilterColumnCorrelationAnalysis>> correlationAnalyses = joinFilterPreAnalysis.getCorrelationsByFilteringColumn()
+                                                                                                   .get(filteringColumn);
 
-            if (correlatedValues.isEmpty()) {
-              return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
-            }
+    if (!correlationAnalyses.isPresent()) {
+      return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
+    }
 
-            for (String correlatedBaseColumn : correlationAnalysis.getBaseColumns()) {
-              Filter rewrittenFilter = new InDimFilter(
-                  correlatedBaseColumn,
-                  correlatedValues,
-                  null,
-                  null
-              ).toFilter();
-              newFilters.add(rewrittenFilter);
-            }
 
-            for (Expr correlatedBaseExpr : correlationAnalysis.getBaseExpressions()) {
-              // We need to create a virtual column for the expressions when pushing down.
-              // Note that this block is never entered right now, since correlationAnalysis.supportsPushDown()
-              // will return false if there any correlated expressions on the base table.
-              // Pushdown of such filters is disabled until the expressions system supports converting an expression
-              // into a String representation that can be reparsed into the same expression.
-              // https://github.com/apache/druid/issues/9326 tracks this expressions issue.
-              String vcName = getCorrelatedBaseExprVirtualColumnName(pushdownVirtualColumns.size());
-
-              VirtualColumn correlatedBaseExprVirtualColumn = new ExpressionVirtualColumn(
-                  vcName,
-                  correlatedBaseExpr,
-                  ValueType.STRING
-              );
-              pushdownVirtualColumns.add(correlatedBaseExprVirtualColumn);
-
-              Filter rewrittenFilter = new InDimFilter(
-                  vcName,
-                  correlatedValues,
-                  null,
-                  null
-              ).toFilter();
-              newFilters.add(rewrittenFilter);
-            }
-          }
-        }
+    for (JoinFilterColumnCorrelationAnalysis correlationAnalysis : correlationAnalyses.get()) {
+      if (correlationAnalysis.supportsPushDown()) {
+        Optional<Set<String>> correlatedValues = correlationAnalysis.getCorrelatedValuesMap().get(
+            Pair.of(filteringColumn, filteringValue)
+        );
 
-        if (newFilters.isEmpty()) {
+        if (!correlatedValues.isPresent()) {
           return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
         }
 
-        return new JoinFilterAnalysis(
-            true,
-            selectorFilter,
-            Filters.and(newFilters),
-            pushdownVirtualColumns
-        );
+        for (String correlatedBaseColumn : correlationAnalysis.getBaseColumns()) {
+          Filter rewrittenFilter = new InDimFilter(
+              correlatedBaseColumn,
+              correlatedValues.get(),
+              null,
+              null
+          ).toFilter();
+          newFilters.add(rewrittenFilter);
+        }
+
+        for (Expr correlatedBaseExpr : correlationAnalysis.getBaseExpressions()) {
+          // We need to create a virtual column for the expressions when pushing down.
+          // Note that this block is never entered right now, since correlationAnalysis.supportsPushDown()
+          // will return false if there any correlated expressions on the base table.
+          // Pushdown of such filters is disabled until the expressions system supports converting an expression
+          // into a String representation that can be reparsed into the same expression.
+          // https://github.com/apache/druid/issues/9326 tracks this expressions issue.
 
 Review comment:
   hmm, this is already done in #9367, should you make the modification to make this comment no longer true?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393287117
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -311,124 +431,99 @@ private static JoinFilterAnalysis rewriteOrFilter(
     );
   }
 
-  /**
-   * Rewrites a selector filter on a join table into an IN filter on the base table.
-   *
-   * @param baseColumnNames  Set of names of columns that belong to the base table, including pre-join virtual
-   *                         columns
-   * @param selectorFilter   SelectorFilter to be rewritten
-   * @param prefixes         Map of join table prefixes to clauses
-   * @param equiconditions   Map of equiconditions
-   * @param correlationCache Cache of column correlation analyses. This will be potentially modified by adding
-   *                         any new column correlation analyses to the cache.
-   *
-   * @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter
-   */
   private static JoinFilterAnalysis rewriteSelectorFilter(
 
 Review comment:
   same question about removal of javadocs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376106
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinable.java
 ##########
 @@ -89,6 +89,8 @@ JoinMatcher makeJoinMatcher(
   Set<String> getCorrelatedColumnValues(
       String searchColumnName,
       String searchColumnValue,
-      String retrievalColumnName
+      String retrievalColumnName,
+      long maxCorrelationSetSize,
 
 Review comment:
   Updated javadocs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393377235
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -311,124 +431,99 @@ private static JoinFilterAnalysis rewriteOrFilter(
     );
   }
 
-  /**
-   * Rewrites a selector filter on a join table into an IN filter on the base table.
-   *
-   * @param baseColumnNames  Set of names of columns that belong to the base table, including pre-join virtual
-   *                         columns
-   * @param selectorFilter   SelectorFilter to be rewritten
-   * @param prefixes         Map of join table prefixes to clauses
-   * @param equiconditions   Map of equiconditions
-   * @param correlationCache Cache of column correlation analyses. This will be potentially modified by adding
-   *                         any new column correlation analyses to the cache.
-   *
-   * @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter
-   */
   private static JoinFilterAnalysis rewriteSelectorFilter(
 
 Review comment:
   Restored missing javadocs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#issuecomment-599837167
 
 
   This pull request **introduces 1 alert** when merging e93d282389e30a608ea10a6ccd071e4876219ca9 into 7626be26caccdb5fe90d4e4c48e9138abcf975e7 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-2b121783a374af8413e96b63f210baf3e076d590)
   
   **new alerts:**
   
   * 1 for Spurious Javadoc @param tags

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393402217
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -237,22 +218,16 @@ public Metadata getMetadata()
       @Nullable final QueryMetrics<?> queryMetrics
   )
   {
-
     final List<VirtualColumn> preJoinVirtualColumns = new ArrayList<>();
     final List<VirtualColumn> postJoinVirtualColumns = new ArrayList<>();
+
     final Set<String> baseColumns = determineBaseColumnsWithPreAndPostJoinVirtualColumns(
 
 Review comment:
   With the deletion below, CI (spotbugs and intellij inspections) is flagging this as unused now

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on issue #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on issue #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#issuecomment-599692011
 
 
   Rebased and fixed conflicts

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376315
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -602,20 +660,21 @@ private static void getCorrelationForRHSColumn(
   /**
    * Given a list of JoinFilterColumnCorrelationAnalysis, prune the list so that we only have one
    * JoinFilterColumnCorrelationAnalysis for each unique combination of base columns.
-   *
+   * <p>
 
 Review comment:
   Removed the `<p>` tags

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376150
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -83,17 +87,31 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       final JoinableFactory joinableFactory,
       final AtomicLong cpuTimeAccumulator,
       final boolean enableFilterPushDown,
-      final boolean enableFilterRewrite
+      final boolean enableFilterRewrite,
+      final boolean enableRewriteValueColumnFilters,
 
 Review comment:
   Updated javadocs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393241099
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinable.java
 ##########
 @@ -89,6 +89,8 @@ JoinMatcher makeJoinMatcher(
   Set<String> getCorrelatedColumnValues(
       String searchColumnName,
       String searchColumnValue,
-      String retrievalColumnName
+      String retrievalColumnName,
+      long maxCorrelationSetSize,
 
 Review comment:
   please update javadoc with new parameters

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393284473
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -635,4 +694,106 @@ private static boolean filterMatchesNull(Filter filter)
     ValueMatcher valueMatcher = filter.makeMatcher(ALL_NULL_COLUMN_SELECTOR_FACTORY);
     return valueMatcher.matches();
   }
+
+  private static JoinableClause isColumnFromJoin(
+      List<JoinableClause> joinableClauses,
+      String column
+  )
+  {
+    for (JoinableClause joinableClause : joinableClauses) {
+      if (joinableClause.includesColumn(column)) {
+        return joinableClause;
+      }
+    }
+
+    return null;
+  }
+
+  private static boolean isColumnFromPostJoinVirtualColumns(
+      List<VirtualColumn> postJoinVirtualColumns,
+      String column
+  )
+  {
+    for (VirtualColumn postJoinVirtualColumn : postJoinVirtualColumns) {
+      if (column.equals(postJoinVirtualColumn.getOutputName())) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  private static boolean areSomeColumnsFromJoin(
+      List<JoinableClause> joinableClauses,
+      Collection<String> columns
+  )
+  {
+    for (String column : columns) {
+      if (isColumnFromJoin(joinableClauses, column) != null) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  private static boolean areSomeColumnsFromPostJoinVirtualColumns(
+      List<VirtualColumn> postJoinVirtualColumns,
+      Collection<String> columns
+  )
+  {
+    for (String column : columns) {
+      if (isColumnFromPostJoinVirtualColumns(postJoinVirtualColumns, column)) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  private static void splitVirtualColumns(
+      List<JoinableClause> joinableClauses,
+      final VirtualColumns virtualColumns,
+      final List<VirtualColumn> preJoinVirtualColumns,
+      final List<VirtualColumn> postJoinVirtualColumns
+  )
+  {
+    for (VirtualColumn virtualColumn : virtualColumns.getVirtualColumns()) {
+      if (areSomeColumnsFromJoin(joinableClauses, virtualColumn.requiredColumns())) {
+        postJoinVirtualColumns.add(virtualColumn);
+      } else {
+        preJoinVirtualColumns.add(virtualColumn);
+      }
+    }
+  }
+
+  private static class RHSRewriteCandidate
 
 Review comment:
   nit: I think this should probably be named `RhsRewriteCandidate`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393285805
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -540,25 +607,15 @@ private static String getCorrelatedBaseExprVirtualColumnName(int counter)
       );
     }
 
-    List<JoinFilterColumnCorrelationAnalysis> dedupCorrelations = eliminateCorrelationDuplicates(correlations);
-
-    return Optional.of(dedupCorrelations);
+    if (correlations.size() == 0) {
+      return Optional.empty();
+    } else {
+      return Optional.of(correlations);
+    }
   }
 
-  /**
-   * Helper method for {@link #findCorrelatedBaseTableColumns} that determines correlated base table columns
-   * and/or expressions for a single RHS column and adds them to the provided sets as it traverses the
-   * equicondition column relationships.
-   *
-   * @param baseColumnNames  Set of names of columns that belong to the base table, including pre-join virtual columns
-   * @param equiConditions Map of equiconditions, keyed by the right hand columns
-   * @param rhsColumn RHS column to find base table correlations for
-   * @param correlatedBaseColumns Set of correlated base column names for the provided RHS column. Will be modified.
-   * @param correlatedBaseExpressions Set of correlated base column expressions for the provided RHS column. Will be
-   *                                  modified.
-   */
   private static void getCorrelationForRHSColumn(
-      Set<String> baseColumnNames,
+      List<JoinableClause> joinableClauses,
 
 Review comment:
   same question about removing javadocs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376483
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -540,25 +607,15 @@ private static String getCorrelatedBaseExprVirtualColumnName(int counter)
       );
     }
 
-    List<JoinFilterColumnCorrelationAnalysis> dedupCorrelations = eliminateCorrelationDuplicates(correlations);
-
-    return Optional.of(dedupCorrelations);
+    if (correlations.size() == 0) {
+      return Optional.empty();
+    } else {
+      return Optional.of(correlations);
+    }
   }
 
-  /**
-   * Helper method for {@link #findCorrelatedBaseTableColumns} that determines correlated base table columns
-   * and/or expressions for a single RHS column and adds them to the provided sets as it traverses the
-   * equicondition column relationships.
-   *
-   * @param baseColumnNames  Set of names of columns that belong to the base table, including pre-join virtual columns
-   * @param equiConditions Map of equiconditions, keyed by the right hand columns
-   * @param rhsColumn RHS column to find base table correlations for
-   * @param correlatedBaseColumns Set of correlated base column names for the provided RHS column. Will be modified.
-   * @param correlatedBaseExpressions Set of correlated base column expressions for the provided RHS column. Will be
-   *                                  modified.
-   */
   private static void getCorrelationForRHSColumn(
-      Set<String> baseColumnNames,
+      List<JoinableClause> joinableClauses,
 
 Review comment:
   Fixed, restored the docs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376213
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -50,73 +51,126 @@
 /**
  * When there is a filter in a join query, we can sometimes improve performance by applying parts of the filter
  * when we first read from the base table instead of after the join.
- *
- * This class provides a {@link #splitFilter(HashJoinSegmentStorageAdapter, Set, Filter, boolean, boolean)} method that
- * takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>
 
 Review comment:
   I removed the <p> tags

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393377149
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -311,124 +431,99 @@ private static JoinFilterAnalysis rewriteOrFilter(
     );
   }
 
-  /**
-   * Rewrites a selector filter on a join table into an IN filter on the base table.
-   *
-   * @param baseColumnNames  Set of names of columns that belong to the base table, including pre-join virtual
-   *                         columns
-   * @param selectorFilter   SelectorFilter to be rewritten
-   * @param prefixes         Map of join table prefixes to clauses
-   * @param equiconditions   Map of equiconditions
-   * @param correlationCache Cache of column correlation analyses. This will be potentially modified by adding
-   *                         any new column correlation analyses to the cache.
-   *
-   * @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter
-   */
   private static JoinFilterAnalysis rewriteSelectorFilter(
-      Set<String> baseColumnNames,
       SelectorFilter selectorFilter,
-      Map<String, JoinableClause> prefixes,
-      Map<String, Set<Expr>> equiconditions,
-      Map<String, Optional<List<JoinFilterColumnCorrelationAnalysis>>> correlationCache
+      JoinFilterPreAnalysis joinFilterPreAnalysis
   )
   {
+
+    List<Filter> newFilters = new ArrayList<>();
+    List<VirtualColumn> pushdownVirtualColumns = new ArrayList<>();
+
     String filteringColumn = selectorFilter.getDimension();
-    for (Map.Entry<String, JoinableClause> prefixAndClause : prefixes.entrySet()) {
-      if (prefixAndClause.getValue().includesColumn(filteringColumn)) {
-        Optional<List<JoinFilterColumnCorrelationAnalysis>> correlations = correlationCache.computeIfAbsent(
-            prefixAndClause.getKey(),
-            p -> findCorrelatedBaseTableColumns(
-                baseColumnNames,
-                p,
-                prefixes.get(p),
-                equiconditions
-            )
-        );
+    String filteringValue = selectorFilter.getValue();
 
-        if (!correlations.isPresent()) {
-          return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
-        }
+    if (areSomeColumnsFromPostJoinVirtualColumns(
+        joinFilterPreAnalysis.getPostJoinVirtualColumns(),
+        selectorFilter.getRequiredColumns()
+    )) {
+      return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
+    }
 
-        List<Filter> newFilters = new ArrayList<>();
-        List<VirtualColumn> pushdownVirtualColumns = new ArrayList<>();
+    if (!areSomeColumnsFromJoin(joinFilterPreAnalysis.getJoinableClauses(), selectorFilter.getRequiredColumns())) {
+      return new JoinFilterAnalysis(
+          true,
+          selectorFilter,
+          selectorFilter,
+          pushdownVirtualColumns
+      );
+    }
 
-        for (JoinFilterColumnCorrelationAnalysis correlationAnalysis : correlations.get()) {
-          if (correlationAnalysis.supportsPushDown()) {
-            Set<String> correlatedValues = getCorrelatedValuesForPushDown(
-                selectorFilter.getDimension(),
-                selectorFilter.getValue(),
-                correlationAnalysis.getJoinColumn(),
-                prefixAndClause.getValue()
-            );
+    Optional<List<JoinFilterColumnCorrelationAnalysis>> correlationAnalyses = joinFilterPreAnalysis.getCorrelationsByFilteringColumn()
+                                                                                                   .get(filteringColumn);
 
-            if (correlatedValues.isEmpty()) {
-              return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
-            }
+    if (!correlationAnalyses.isPresent()) {
+      return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
+    }
 
-            for (String correlatedBaseColumn : correlationAnalysis.getBaseColumns()) {
-              Filter rewrittenFilter = new InDimFilter(
-                  correlatedBaseColumn,
-                  correlatedValues,
-                  null,
-                  null
-              ).toFilter();
-              newFilters.add(rewrittenFilter);
-            }
 
-            for (Expr correlatedBaseExpr : correlationAnalysis.getBaseExpressions()) {
-              // We need to create a virtual column for the expressions when pushing down.
-              // Note that this block is never entered right now, since correlationAnalysis.supportsPushDown()
-              // will return false if there any correlated expressions on the base table.
-              // Pushdown of such filters is disabled until the expressions system supports converting an expression
-              // into a String representation that can be reparsed into the same expression.
-              // https://github.com/apache/druid/issues/9326 tracks this expressions issue.
-              String vcName = getCorrelatedBaseExprVirtualColumnName(pushdownVirtualColumns.size());
-
-              VirtualColumn correlatedBaseExprVirtualColumn = new ExpressionVirtualColumn(
-                  vcName,
-                  correlatedBaseExpr,
-                  ValueType.STRING
-              );
-              pushdownVirtualColumns.add(correlatedBaseExprVirtualColumn);
-
-              Filter rewrittenFilter = new InDimFilter(
-                  vcName,
-                  correlatedValues,
-                  null,
-                  null
-              ).toFilter();
-              newFilters.add(rewrittenFilter);
-            }
-          }
-        }
+    for (JoinFilterColumnCorrelationAnalysis correlationAnalysis : correlationAnalyses.get()) {
+      if (correlationAnalysis.supportsPushDown()) {
+        Optional<Set<String>> correlatedValues = correlationAnalysis.getCorrelatedValuesMap().get(
+            Pair.of(filteringColumn, filteringValue)
+        );
 
-        if (newFilters.isEmpty()) {
+        if (!correlatedValues.isPresent()) {
           return JoinFilterAnalysis.createNoPushdownFilterAnalysis(selectorFilter);
         }
 
-        return new JoinFilterAnalysis(
-            true,
-            selectorFilter,
-            Filters.and(newFilters),
-            pushdownVirtualColumns
-        );
+        for (String correlatedBaseColumn : correlationAnalysis.getBaseColumns()) {
+          Filter rewrittenFilter = new InDimFilter(
+              correlatedBaseColumn,
+              correlatedValues.get(),
+              null,
+              null
+          ).toFilter();
+          newFilters.add(rewrittenFilter);
+        }
+
+        for (Expr correlatedBaseExpr : correlationAnalysis.getBaseExpressions()) {
+          // We need to create a virtual column for the expressions when pushing down.
+          // Note that this block is never entered right now, since correlationAnalysis.supportsPushDown()
+          // will return false if there any correlated expressions on the base table.
+          // Pushdown of such filters is disabled until the expressions system supports converting an expression
+          // into a String representation that can be reparsed into the same expression.
+          // https://github.com/apache/druid/issues/9326 tracks this expressions issue.
 
 Review comment:
   I'll do that in a follow-on PR

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393376213
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -50,73 +51,126 @@
 /**
  * When there is a filter in a join query, we can sometimes improve performance by applying parts of the filter
  * when we first read from the base table instead of after the join.
- *
- * This class provides a {@link #splitFilter(HashJoinSegmentStorageAdapter, Set, Filter, boolean, boolean)} method that
- * takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>
 
 Review comment:
   I removed the `<p>` tags

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393241271
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -83,17 +87,31 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       final JoinableFactory joinableFactory,
       final AtomicLong cpuTimeAccumulator,
       final boolean enableFilterPushDown,
-      final boolean enableFilterRewrite
+      final boolean enableFilterRewrite,
+      final boolean enableRewriteValueColumnFilters,
 
 Review comment:
   ditto javadoc

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393285640
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -462,45 +559,14 @@ private static String getCorrelatedBaseExprVirtualColumnName(int counter)
     return clauseForFilteredTable.getJoinable().getCorrelatedColumnValues(
         filterColumnNoPrefix,
         filterValue,
-        correlatedColumnNoPrefix
+        correlatedColumnNoPrefix,
+        filterRewriteMaxSize,
+        enableRewriteValueColumnFilters
     );
   }
 
-  /**
-   * For each rhs column that appears in the equiconditions for a table's JoinableClause,
-   * we try to determine what base table columns are related to the rhs column through the total set of equiconditions.
-   * We do this by searching backwards through the chain of join equiconditions using the provided equicondition map.
-   *
-   * For example, suppose we have 3 tables, A,B,C, joined with the following conditions, where A is the base table:
-   *   A.joinColumn == B.joinColumn
-   *   B.joinColum == C.joinColumn
-   *
-   * We would determine that C.joinColumn is correlated with A.joinColumn: we first see that
-   * C.joinColumn is linked to B.joinColumn which in turn is linked to A.joinColumn
-   *
-   * Suppose we had the following join conditions instead:
-   *   f(A.joinColumn) == B.joinColumn
-   *   B.joinColum == C.joinColumn
-   * In this case, the JoinFilterColumnCorrelationAnalysis for C.joinColumn would be linked to f(A.joinColumn).
-   *
-   * Suppose we had the following join conditions instead:
-   *   A.joinColumn == B.joinColumn
-   *   f(B.joinColum) == C.joinColumn
-   *
-   * Because we cannot reverse the function f() applied to the second table B in all cases,
-   * we cannot relate C.joinColumn to A.joinColumn, and we would not generate a correlation for C.joinColumn
-   *
-   * @param baseColumnNames      Set of names of columns that belong to the base table, including pre-join virtual
-   *                             columns
-   * @param tablePrefix          Prefix for a join table
-   * @param clauseForTablePrefix Joinable clause for the prefix
-   * @param equiConditions       Map of equiconditions, keyed by the right hand columns
-   *
-   * @return A list of correlatation analyses for the equicondition RHS columns that reside in the table associated with
-   * the tablePrefix
-   */
-  private static Optional<List<JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(
-      Set<String> baseColumnNames,
+  private static Optional<Map<String, JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(
 
 Review comment:
   why remove javadocs?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393402568
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -128,31 +183,172 @@ public static JoinFilterSplit splitFilter(
       }
     }
 
-    // List of candidates for pushdown
-    // CNF normalization will generate either
-    // - an AND filter with multiple subfilters
-    // - or a single non-AND subfilter which cannot be split further
-    List<Filter> normalizedOrClauses;
-    if (normalizedFilter instanceof AndFilter) {
-      normalizedOrClauses = ((AndFilter) normalizedFilter).getFilters();
-    } else {
-      normalizedOrClauses = Collections.singletonList(normalizedFilter);
+    List<VirtualColumn> pushDownVirtualColumns = new ArrayList<>();
 
 Review comment:
   CI (spotbugs and intellij inspections) is flagging this as unused

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9516: More efficient join filter rewrites

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9516: More efficient join filter rewrites
URL: https://github.com/apache/druid/pull/9516#discussion_r393285072
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 ##########
 @@ -602,20 +660,21 @@ private static void getCorrelationForRHSColumn(
   /**
    * Given a list of JoinFilterColumnCorrelationAnalysis, prune the list so that we only have one
    * JoinFilterColumnCorrelationAnalysis for each unique combination of base columns.
-   *
+   * <p>
 
 Review comment:
   ditto question if `<p>` are intentional

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org