Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/25 19:57:00 UTC

[jira] [Work logged] (HIVE-24564) Extend PPD filter transitivity to be able to discover new opportunities

     [ https://issues.apache.org/jira/browse/HIVE-24564?focusedWorklogId=541273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-541273 ]

ASF GitHub Bot logged work on HIVE-24564:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Jan/21 19:56
            Start Date: 25/Jan/21 19:56
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on a change in pull request #1811:
URL: https://github.com/apache/hive/pull/1811#discussion_r563973915



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -712,14 +714,47 @@ private void applyFilterTransitivity(JoinOperator join, int targetPos, OpWalkerI
           if (!sourceAliases.contains(entry.getKey())) {
             continue;
           }
+
+          Set<ExprNodeColumnDesc> columnsInPredicates = null;
+          if (HiveConf.getBoolVar(owi.getParseContext().getConf(),
+                  HiveConf.ConfVars.HIVEPPD_RECOGNIZE_COLUMN_EQUALITIES)) {
+            columnsInPredicates = owi.getColumnsInPredicates().get(source);
+            if (columnsInPredicates == null) {
+              columnsInPredicates = collectColumnsInPredicates(entry.getValue());
+              owi.getColumnsInPredicates().put(source, columnsInPredicates);
+            }
+          }
+
           for (ExprNodeDesc predicate : entry.getValue()) {
             ExprNodeDesc backtrack = ExprNodeDescUtils.backtrack(predicate, join, source);
             if (backtrack == null) {
               continue;
             }
             ExprNodeDesc replaced = ExprNodeDescUtils.replace(backtrack, sourceKeys, targetKeys);
             if (replaced == null) {
-              continue;
+              if (!HiveConf.getBoolVar(owi.getParseContext().getConf(),

Review comment:
       Extract the property value into a variable before entering the loop (~L700).
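
       A minimal sketch of the suggested hoist, assuming the loop shape shown in the hunk above; the map name preds, the entry's declared type, and the local name recognizeColumnEqualities are illustrative and not taken from the patch:

{code:java}
// Read the flag once, before entering the loop, instead of calling
// HiveConf.getBoolVar on every iteration.
final boolean recognizeColumnEqualities = HiveConf.getBoolVar(
    owi.getParseContext().getConf(),
    HiveConf.ConfVars.HIVEPPD_RECOGNIZE_COLUMN_EQUALITIES);

for (Map.Entry<String, List<ExprNodeDesc>> entry : preds.entrySet()) {
  if (!sourceAliases.contains(entry.getKey())) {
    continue;
  }
  Set<ExprNodeColumnDesc> columnsInPredicates = null;
  if (recognizeColumnEqualities) {
    // ... lazily collect/cache the predicate columns as in the patch ...
  }
  // ... rest of the loop body unchanged ...
}
{code}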

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -712,14 +714,47 @@ private void applyFilterTransitivity(JoinOperator join, int targetPos, OpWalkerI
           if (!sourceAliases.contains(entry.getKey())) {
             continue;
           }
+
+          Set<ExprNodeColumnDesc> columnsInPredicates = null;
+          if (HiveConf.getBoolVar(owi.getParseContext().getConf(),

Review comment:
       Extract the property value into a variable before entering the loop (~L700).

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpWalkerInfo.java
##########
@@ -39,11 +43,15 @@
     opToPushdownPredMap;
   private final ParseContext pGraphContext;
   private final List<FilterOperator> candidateFilterOps;
+  private final Map<Operator<?>, Set<ExprNodeColumnDesc>> columnsInPredicates;

Review comment:
       Can we add comments about these data structures and what they will hold?
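
       One possible shape for such a comment, with wording inferred from how the map is used in OpProcFactory.applyFilterTransitivity (illustrative only, not the author's text):

{code:java}
  /**
   * Columns referenced by the pushdown predicates of each operator, keyed by the
   * operator the predicates were collected from. Populated lazily while applying
   * filter transitivity and used to check whether an equal column can be
   * substituted into a predicate so it becomes pushable to the other join branch.
   */
  private final Map<Operator<?>, Set<ExprNodeColumnDesc>> columnsInPredicates;
{code}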

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -728,6 +763,166 @@ private void applyFilterTransitivity(JoinOperator join, int targetPos, OpWalkerI
         }
       }
     }
+
+    private Set<ExprNodeColumnDesc> collectColumnsInPredicates(List<ExprNodeDesc> predicates) {

Review comment:
       Can we add a few high level comments to these new private methods describing what they do?
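
       For example, a Javadoc along these lines; the wording is a guess based on the first hunk, where the result is cached per source operator:

{code:java}
    /**
     * Collects every ExprNodeColumnDesc referenced by the given predicate
     * expressions. The resulting set is cached per source operator in
     * OpWalkerInfo so the predicates do not have to be re-scanned for each
     * join key when looking for column equalities.
     */
    private Set<ExprNodeColumnDesc> collectColumnsInPredicates(List<ExprNodeDesc> predicates) {
      // ... body as in the patch ...
    }
{code}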

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -2461,6 +2461,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Whether to enable predicate pushdown through windowing"),
     HIVEPPDRECOGNIZETRANSITIVITY("hive.ppd.recognizetransivity", true,
         "Whether to transitively replicate predicate filters over equijoin conditions."),
+    HIVEPPD_RECOGNIZE_COLUMN_EQUALITIES("hive.ppd.recognize.column.equalities", true,
+        "When hive.ppd.recognizetransivity is true Whether traverse join branches to discover equal columns based" +
+                " on equijoin keys and try to substitute equal columns to predicates " +
+                "and push down to the other branch."),
     HIVEPPDREMOVEDUPLICATEFILTERS("hive.ppd.remove.duplicatefilters", true,

Review comment:
       "Whether we should traverse the join branches to discover transitive propagation opportunities over equijoin conditions. \n" +
                   "Requires hive.ppd.recognizetransivity to be set to true."




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 541273)
    Time Spent: 0.5h  (was: 20m)

> Extend PPD filter transitivity to be able to discover new opportunities
> -----------------------------------------------------------------------
>
>                 Key: HIVE-24564
>                 URL: https://issues.apache.org/jira/browse/HIVE-24564
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If a predicate references a value column of one of the parent ReduceSink operators of a Join, the predicate cannot be copied and pushed down to the other side of the join. However, if a parent equijoin exists in the branch of that RS where
>  1. the referenced value column is a key column of that join, and
>  2. the other side of that join expression is the key column of the RS,
>  then the column in the predicate can be replaced and the new predicate can be pushed down.
> {code:java}
>                                    Join(... = wr_on)
>                                   /                 \
>                                 ...                  RS(key: wr_on)
>                                                           |
>                                               Join(ws1.ws_on = ws2.ws_on)
>                                               (ws1.ws_on, ws2.ws_on, wr_on)
>                                               /                         \
>                                   RS(key:ws_on)                          RS(key:ws_on)
>                                     (value: wr_on)
>                                        |                                       |
>                            Join(ws1.ws_on = wr.wr_on)                       TS(ws2)
>                            /                        \
>                      RS(key:ws_on)              RS(key:wr_on)
>                            |                        |
>                         TS(ws1)                   TS(wr)
> {code}
> A predicate like
> {code}
> (wr_on in (...))
> {code}
> cannot be pushed to TS(ws2) because wr_on is not a key column of Join(ws1.ws_on = ws2.ws_on). However, we know that wr_on is equal to ws_on because of the join in the left branch, as sketched below.
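> As a sketch (following the plan above): since Join(ws1.ws_on = wr.wr_on) makes ws_on and wr_on equal on the left branch, the predicate can be rewritten and pushed to the right branch:
> {code}
> (wr_on in (...))   -- references a value column of RS(key: ws_on), not pushable as-is
>         |
>         v  substitute wr_on -> ws_on (equal via Join(ws1.ws_on = wr.wr_on))
> (ws_on in (...))   -- ws_on is a key of Join(ws1.ws_on = ws2.ws_on),
>                    -- so it can be pushed down to TS(ws2)
> {code}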



--
This message was sent by Atlassian Jira
(v8.3.4#803005)