You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/02/01 13:28:19 UTC

[GitHub] [hive] zabetak commented on a change in pull request #2966: HIVE-25758: OOM due to recursive application of CBO rules

zabetak commented on a change in pull request #2966:
URL: https://github.com/apache/hive/pull/2966#discussion_r796565785



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -144,27 +146,70 @@ public void onMatch(RelOptRuleCall call) {
     }
 
     // We need to filter i) those that have been pushed already as stored in the join,
-    // and ii) those that were already in the subtree rooted at child
-    ImmutableList<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
-            child, valids);
-    return toPush;
+    // ii) those that were already in the subtree rooted at child,
+    // iii) predicates that are not safe for transitive inference.
+    //
+    // There is no formal definition of safety for predicate inference, only an empirical one.
+    // An unsafe predicate in this context is one that when pushed across join operands, can lead
+    // to redundant predicates that cannot be simplified (by means of predicates merging with other existing ones).
+    // This situation can lead to an OOM for cases where lack of simplification allows inferring new predicates
+    // (from LHS to RHS) recursively, predicates which are redundant, but that RexSimplify cannot handle.
+    // This notion can be relaxed as soon as RexSimplify gets more powerful, and it can handle such cases.
+    List<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude, child, valids).stream()
+        .filter(UNSAFE_OPERATORS_FINDER::isSafe)
+        .collect(Collectors.toList());
+
+    return ImmutableList.copyOf(toPush);
   }
 
-  private RexNode getTypeSafePred(RelOptCluster cluster, RexNode rex, RelDataType rType) {
-    RexNode typeSafeRex = rex;
-    if ((typeSafeRex instanceof RexCall) && HiveCalciteUtil.isComparisonOp((RexCall) typeSafeRex)) {
-      RexBuilder rb = cluster.getRexBuilder();
-      List<RexNode> fixedPredElems = new ArrayList<RexNode>();
-      RelDataType commonType = cluster.getTypeFactory().leastRestrictive(
-          RexUtil.types(((RexCall) rex).getOperands()));
-      for (RexNode rn : ((RexCall) rex).getOperands()) {
-        fixedPredElems.add(rb.ensureType(commonType, rn, true));
-      }
+  //~ Inner Classes ----------------------------------------------------------
+
+  /**
+   * Finds unsafe operators in an expression (at any level of nesting).

Review comment:
       Add few examples with "unsafe" expressions.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -144,27 +146,70 @@ public void onMatch(RelOptRuleCall call) {
     }
 
     // We need to filter i) those that have been pushed already as stored in the join,
-    // and ii) those that were already in the subtree rooted at child
-    ImmutableList<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
-            child, valids);
-    return toPush;
+    // ii) those that were already in the subtree rooted at child,
+    // iii) predicates that are not safe for transitive inference.
+    //
+    // There is no formal definition of safety for predicate inference, only an empirical one.

Review comment:
       The existence of the registry is an element that makes the behavior of this rule strongly dependent on the behavior of other rules. This change adds an additional dependency on the behavior of other rules & components which is not great. We can leave with it for some time but not sure if we can say it is a long term solution.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -74,6 +74,8 @@
   public static final HiveJoinPushTransitivePredicatesRule INSTANCE_ANTIJOIN =
           new HiveJoinPushTransitivePredicatesRule(HiveAntiJoin.class, HiveRelFactories.HIVE_FILTER_FACTORY);
 
+  private static final UnsafeOperatorsFinder UNSAFE_OPERATORS_FINDER = new UnsafeOperatorsFinder(true);

Review comment:
       I think we may end-up calling this visitor from multiple threads so I would either remove static or make the class immutable.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -144,27 +146,70 @@ public void onMatch(RelOptRuleCall call) {
     }
 
     // We need to filter i) those that have been pushed already as stored in the join,
-    // and ii) those that were already in the subtree rooted at child
-    ImmutableList<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
-            child, valids);
-    return toPush;
+    // ii) those that were already in the subtree rooted at child,
+    // iii) predicates that are not safe for transitive inference.
+    //
+    // There is no formal definition of safety for predicate inference, only an empirical one.
+    // An unsafe predicate in this context is one that when pushed across join operands, can lead
+    // to redundant predicates that cannot be simplified (by means of predicates merging with other existing ones).
+    // This situation can lead to an OOM for cases where lack of simplification allows inferring new predicates
+    // (from LHS to RHS) recursively, predicates which are redundant, but that RexSimplify cannot handle.
+    // This notion can be relaxed as soon as RexSimplify gets more powerful, and it can handle such cases.
+    List<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude, child, valids).stream()
+        .filter(UNSAFE_OPERATORS_FINDER::isSafe)
+        .collect(Collectors.toList());
+
+    return ImmutableList.copyOf(toPush);
   }
 
-  private RexNode getTypeSafePred(RelOptCluster cluster, RexNode rex, RelDataType rType) {
-    RexNode typeSafeRex = rex;
-    if ((typeSafeRex instanceof RexCall) && HiveCalciteUtil.isComparisonOp((RexCall) typeSafeRex)) {
-      RexBuilder rb = cluster.getRexBuilder();
-      List<RexNode> fixedPredElems = new ArrayList<RexNode>();
-      RelDataType commonType = cluster.getTypeFactory().leastRestrictive(
-          RexUtil.types(((RexCall) rex).getOperands()));
-      for (RexNode rn : ((RexCall) rex).getOperands()) {
-        fixedPredElems.add(rb.ensureType(commonType, rn, true));
-      }
+  //~ Inner Classes ----------------------------------------------------------
+
+  /**
+   * Finds unsafe operators in an expression (at any level of nesting).
+   */
+  private static class UnsafeOperatorsFinder extends RexVisitorImpl<Void> {
+    // accounting for DeMorgan's law
+    boolean inNegation = false;
 
-      typeSafeRex = rb.makeCall(((RexCall) typeSafeRex).getOperator(), fixedPredElems);
+    protected UnsafeOperatorsFinder(boolean deep) {
+      super(deep);

Review comment:
       ```suggestion
       protected UnsafeOperatorsFinder() {
         super(true);
   ```

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -144,27 +146,70 @@ public void onMatch(RelOptRuleCall call) {
     }
 
     // We need to filter i) those that have been pushed already as stored in the join,
-    // and ii) those that were already in the subtree rooted at child
-    ImmutableList<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
-            child, valids);
-    return toPush;
+    // ii) those that were already in the subtree rooted at child,
+    // iii) predicates that are not safe for transitive inference.
+    //
+    // There is no formal definition of safety for predicate inference, only an empirical one.
+    // An unsafe predicate in this context is one that when pushed across join operands, can lead
+    // to redundant predicates that cannot be simplified (by means of predicates merging with other existing ones).
+    // This situation can lead to an OOM for cases where lack of simplification allows inferring new predicates
+    // (from LHS to RHS) recursively, predicates which are redundant, but that RexSimplify cannot handle.
+    // This notion can be relaxed as soon as RexSimplify gets more powerful, and it can handle such cases.
+    List<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude, child, valids).stream()
+        .filter(UNSAFE_OPERATORS_FINDER::isSafe)
+        .collect(Collectors.toList());
+
+    return ImmutableList.copyOf(toPush);
   }
 
-  private RexNode getTypeSafePred(RelOptCluster cluster, RexNode rex, RelDataType rType) {
-    RexNode typeSafeRex = rex;
-    if ((typeSafeRex instanceof RexCall) && HiveCalciteUtil.isComparisonOp((RexCall) typeSafeRex)) {
-      RexBuilder rb = cluster.getRexBuilder();
-      List<RexNode> fixedPredElems = new ArrayList<RexNode>();
-      RelDataType commonType = cluster.getTypeFactory().leastRestrictive(
-          RexUtil.types(((RexCall) rex).getOperands()));
-      for (RexNode rn : ((RexCall) rex).getOperands()) {
-        fixedPredElems.add(rb.ensureType(commonType, rn, true));
-      }
+  //~ Inner Classes ----------------------------------------------------------
+
+  /**
+   * Finds unsafe operators in an expression (at any level of nesting).
+   */
+  private static class UnsafeOperatorsFinder extends RexVisitorImpl<Void> {
+    // accounting for DeMorgan's law
+    boolean inNegation = false;
 
-      typeSafeRex = rb.makeCall(((RexCall) typeSafeRex).getOperator(), fixedPredElems);
+    protected UnsafeOperatorsFinder(boolean deep) {
+      super(deep);
     }
 
-    return typeSafeRex;
+    @Override
+    public Void visitCall(RexCall call) {
+      switch (call.getKind()) {
+      case OR:
+        if (inNegation) {
+          return super.visitCall(call);
+        } else {
+          throw Util.FoundOne.NULL;

Review comment:
       You could possibly avoid using the exception for control flow by simply setting a variable here:
   ```suggestion
             this.isSafe = false;
             return null;
   ```

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveInBetweenExpandRule.java
##########
@@ -138,7 +138,7 @@ public void onMatch(RelOptRuleCall call) {
     private final RexBuilder rexBuilder;
     private boolean modified;
 
-    private RexInBetweenExpander(RexBuilder rexBuilder) {
+    RexInBetweenExpander(RexBuilder rexBuilder) {

Review comment:
       I don't see why this change is necessary. Can you explain?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org