You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/07/27 07:16:56 UTC

[GitHub] [hive] letsflyinthesky opened a new pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

letsflyinthesky opened a new pull request #1322:
URL: https://github.com/apache/hive/pull/1322


   …ll be push down and nondeterministic filter which keeps current position
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-XXXXX: Fix a typo in YYY)
   For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666864106


   ... you can give a view on "Understanding Hive Branches" on the above link


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] letsflyinthesky commented on a change in pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
letsflyinthesky commented on a change in pull request #1322:
URL: https://github.com/apache/hive/pull/1322#discussion_r461344354



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -417,9 +417,17 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
         if (!ewi.isDeterministic()) {
           /* predicate is not deterministic */
           if (op.getChildren() != null && op.getChildren().size() == 1) {
+            ExprWalkerInfo prunedPreds = owi.getPrunedPreds((Operator<? extends OperatorDesc>) (op
+                .getChildren().get(0)));
+            //resolve of HIVE-23893
+            if (!(prunedPreds != null && prunedPreds.hasAnyCandidates())

Review comment:
       I have remove redundant condition and take into account child preds which will be merge into new deterministic filter preds




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] closed pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #1322:
URL: https://github.com/apache/hive/pull/1322


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on a change in pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on a change in pull request #1322:
URL: https://github.com/apache/hive/pull/1322#discussion_r460826125



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, OpWalkerInfo owi,
     }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+      ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+    RowSchema inputRS = op.getSchema();
+
+    Map<String, List<ExprNodeDesc>> pushDownPreds = ewi.getFinalCandidates();
+    Map<String, List<ExprNodeDesc>> unPushDownPreds = ewi.getNonFinalCandidates();
+
+    // combine all deterministic predicates into a single expression
+    List<ExprNodeDesc> deterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator1 = pushDownPreds.values().iterator();
+    while (iterator1.hasNext()) {
+      for (ExprNodeDesc pred : iterator1.next()) {
+        deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);
+      }
+    }
+
+    if (deterministicPreds.isEmpty()) {
+      return null;
+    }
+
+    List<ExprNodeDesc> nondeterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator2 = unPushDownPreds.values().iterator();
+    while (iterator2.hasNext()) {
+      for (ExprNodeDesc pred : iterator2.next()) {
+        nondeterministicPreds = ExprNodeDescUtils.split(pred, nondeterministicPreds);
+      }
+    }
+
+    assert !nondeterministicPreds.isEmpty();
+
+    ExprNodeDesc deterministicCondn = ExprNodeDescUtils.mergePredicates(deterministicPreds);
+    ExprNodeDesc nondeterministicCondn = ExprNodeDescUtils.mergePredicates(nondeterministicPreds);
+
+    Operator<FilterDesc> deterministicFilter =
+        OperatorFactory.get(new FilterDesc(deterministicCondn, false), new RowSchema(inputRS.getSignature()));
+
+    deterministicFilter.setChildOperators(new ArrayList<Operator<? extends OperatorDesc>>());
+    deterministicFilter.getChildOperators().add(op);
+
+    List<Operator<? extends OperatorDesc>> originalParents = op
+        .getParentOperators();
+    for (Operator<? extends OperatorDesc> parent : originalParents) {
+      List<Operator<? extends OperatorDesc>> childOperators = parent
+          .getChildOperators();
+      int pos = childOperators.indexOf(op);
+      childOperators.remove(pos);
+      childOperators.add(pos, deterministicFilter);
+
+      int pPos = op.getParentOperators().indexOf(parent);
+      deterministicFilter.getParentOperators().add(pPos, parent);
+    }
+
+    op.getParentOperators().clear();
+    op.getParentOperators().add(deterministicFilter);
+    op.getConf().setPredicate(nondeterministicCondn);
+
+    if (HiveConf.getBoolVar(owi.getParseContext().getConf(),
+        HiveConf.ConfVars.HIVEPPDREMOVEDUPLICATEFILTERS)) {
+      // remove the candidate filter ops
+      for (FilterOperator fop : owi.getCandidateFilterOps()) {
+        List<Operator<? extends OperatorDesc>> children = fop.getChildOperators();
+        List<Operator<? extends OperatorDesc>> parents = fop.getParentOperators();
+        for (Operator<? extends OperatorDesc> parent : parents) {
+          parent.getChildOperators().addAll(children);
+          parent.removeChild(fop);
+        }
+        for (Operator<? extends OperatorDesc> child : children) {
+          child.getParentOperators().addAll(parents);
+          child.removeParent(fop);
+        }
+      }
+      owi.getCandidateFilterOps().clear();
+    }
+
+    ewi = ExprWalkerProcFactory.extractPushdownPreds(owi, op,
+        deterministicFilter.getConf().getPredicate());
+
+    owi.putPrunedPreds(deterministicFilter, ewi);

Review comment:
       the deterministicFilter should add to OpWalkerInfo.candidateFilterOps and no need to extractPushdownPreds again

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, OpWalkerInfo owi,
     }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+      ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+    RowSchema inputRS = op.getSchema();
+
+    Map<String, List<ExprNodeDesc>> pushDownPreds = ewi.getFinalCandidates();
+    Map<String, List<ExprNodeDesc>> unPushDownPreds = ewi.getNonFinalCandidates();
+
+    // combine all deterministic predicates into a single expression
+    List<ExprNodeDesc> deterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator1 = pushDownPreds.values().iterator();
+    while (iterator1.hasNext()) {
+      for (ExprNodeDesc pred : iterator1.next()) {
+        deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);
+      }
+    }
+
+    if (deterministicPreds.isEmpty()) {
+      return null;
+    }
+
+    List<ExprNodeDesc> nondeterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator2 = unPushDownPreds.values().iterator();
+    while (iterator2.hasNext()) {
+      for (ExprNodeDesc pred : iterator2.next()) {
+        nondeterministicPreds = ExprNodeDescUtils.split(pred, nondeterministicPreds);
+      }
+    }
+
+    assert !nondeterministicPreds.isEmpty();
+
+    ExprNodeDesc deterministicCondn = ExprNodeDescUtils.mergePredicates(deterministicPreds);
+    ExprNodeDesc nondeterministicCondn = ExprNodeDescUtils.mergePredicates(nondeterministicPreds);
+
+    Operator<FilterDesc> deterministicFilter =
+        OperatorFactory.get(new FilterDesc(deterministicCondn, false), new RowSchema(inputRS.getSignature()));
+
+    deterministicFilter.setChildOperators(new ArrayList<Operator<? extends OperatorDesc>>());
+    deterministicFilter.getChildOperators().add(op);
+
+    List<Operator<? extends OperatorDesc>> originalParents = op
+        .getParentOperators();
+    for (Operator<? extends OperatorDesc> parent : originalParents) {
+      List<Operator<? extends OperatorDesc>> childOperators = parent
+          .getChildOperators();
+      int pos = childOperators.indexOf(op);
+      childOperators.remove(pos);
+      childOperators.add(pos, deterministicFilter);
+
+      int pPos = op.getParentOperators().indexOf(parent);
+      deterministicFilter.getParentOperators().add(pPos, parent);
+    }
+
+    op.getParentOperators().clear();
+    op.getParentOperators().add(deterministicFilter);
+    op.getConf().setPredicate(nondeterministicCondn);
+
+    if (HiveConf.getBoolVar(owi.getParseContext().getConf(),
+        HiveConf.ConfVars.HIVEPPDREMOVEDUPLICATEFILTERS)) {
+      // remove the candidate filter ops
+      for (FilterOperator fop : owi.getCandidateFilterOps()) {

Review comment:
       why do we remove the candidate filter operation here?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, OpWalkerInfo owi,
     }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+      ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+    RowSchema inputRS = op.getSchema();
+
+    Map<String, List<ExprNodeDesc>> pushDownPreds = ewi.getFinalCandidates();
+    Map<String, List<ExprNodeDesc>> unPushDownPreds = ewi.getNonFinalCandidates();
+
+    // combine all deterministic predicates into a single expression
+    List<ExprNodeDesc> deterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator1 = pushDownPreds.values().iterator();
+    while (iterator1.hasNext()) {
+      for (ExprNodeDesc pred : iterator1.next()) {
+        deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);
+      }
+    }
+
+    if (deterministicPreds.isEmpty()) {
+      return null;
+    }
+
+    List<ExprNodeDesc> nondeterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator2 = unPushDownPreds.values().iterator();
+    while (iterator2.hasNext()) {
+      for (ExprNodeDesc pred : iterator2.next()) {
+        nondeterministicPreds = ExprNodeDescUtils.split(pred, nondeterministicPreds);
+      }
+    }
+
+    assert !nondeterministicPreds.isEmpty();
+
+    ExprNodeDesc deterministicCondn = ExprNodeDescUtils.mergePredicates(deterministicPreds);
+    ExprNodeDesc nondeterministicCondn = ExprNodeDescUtils.mergePredicates(nondeterministicPreds);
+
+    Operator<FilterDesc> deterministicFilter =
+        OperatorFactory.get(new FilterDesc(deterministicCondn, false), new RowSchema(inputRS.getSignature()));
+
+    deterministicFilter.setChildOperators(new ArrayList<Operator<? extends OperatorDesc>>());

Review comment:
       use OpProcFactory.createFilter may be better?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, OpWalkerInfo owi,
     }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+      ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+    RowSchema inputRS = op.getSchema();
+
+    Map<String, List<ExprNodeDesc>> pushDownPreds = ewi.getFinalCandidates();
+    Map<String, List<ExprNodeDesc>> unPushDownPreds = ewi.getNonFinalCandidates();
+
+    // combine all deterministic predicates into a single expression
+    List<ExprNodeDesc> deterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator1 = pushDownPreds.values().iterator();
+    while (iterator1.hasNext()) {
+      for (ExprNodeDesc pred : iterator1.next()) {
+        deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);

Review comment:
       The ```ExprWalkerProcFactory.extractPushdownPreds(owi, op, predicate)```  has already splited the pred, maybe there is no need to split again

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -417,9 +417,17 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
         if (!ewi.isDeterministic()) {
           /* predicate is not deterministic */
           if (op.getChildren() != null && op.getChildren().size() == 1) {
+            ExprWalkerInfo prunedPreds = owi.getPrunedPreds((Operator<? extends OperatorDesc>) (op
+                .getChildren().get(0)));
+            //resolve of HIVE-23893
+            if (!(prunedPreds != null && prunedPreds.hasAnyCandidates())

Review comment:
       why the pruned predicates of the child could not be pushed down? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 edited a comment on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
dengzhhu653 edited a comment on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666864106


   ... you can give a view on "Understanding Hive Branches" on the above link, and add some tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] letsflyinthesky commented on a change in pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
letsflyinthesky commented on a change in pull request #1322:
URL: https://github.com/apache/hive/pull/1322#discussion_r460832421



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -782,6 +790,89 @@ protected ExprWalkerInfo mergeChildrenPred(Node nd, OpWalkerInfo owi,
     }
   }
 
+  protected static Object splitFilter(FilterOperator op,
+      ExprWalkerInfo ewi, OpWalkerInfo owi) throws SemanticException {
+
+    RowSchema inputRS = op.getSchema();
+
+    Map<String, List<ExprNodeDesc>> pushDownPreds = ewi.getFinalCandidates();
+    Map<String, List<ExprNodeDesc>> unPushDownPreds = ewi.getNonFinalCandidates();
+
+    // combine all deterministic predicates into a single expression
+    List<ExprNodeDesc> deterministicPreds = new ArrayList<ExprNodeDesc>();
+    Iterator<List<ExprNodeDesc>> iterator1 = pushDownPreds.values().iterator();
+    while (iterator1.hasNext()) {
+      for (ExprNodeDesc pred : iterator1.next()) {
+        deterministicPreds = ExprNodeDescUtils.split(pred, deterministicPreds);

Review comment:
       yes, may we could remove iterator here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666340785


   Hey @letsflyinthesky, you can move your fixes upon the master branch. There are some detailed useful guidelines of how to contribute on https://cwiki.apache.org/confluence/display/Hive/HowToContribute.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] letsflyinthesky commented on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
letsflyinthesky commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-666349043


   May you give me some advice about why to remove. My commit could don't resolve the problem or the way I the commi dose not conform to regulation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] commented on pull request #1322: HIVE-23893/HIVE-14661,split filter into deterministic filter which will be push down and nondeterministic filter which keeps current position

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #1322:
URL: https://github.com/apache/hive/pull/1322#issuecomment-701095278


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org