You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/19 13:57:07 UTC

[GitHub] [hive] kasakrisz opened a new pull request #1995: HIVE-24775: Incorrect null handling when rebuilding Materialized view incrementally

kasakrisz opened a new pull request #1995:
URL: https://github.com/apache/hive/pull/1995


   ### What changes were proposed in this pull request?
   When transforming plan of materialized view rebuild to incremental rebuild in case of the view definition has aggregate:
   * Use null safe equality operators in join condition instead of equality operators
   * Introduce a Project on top of MV scan having all columns from the view plus a boolean literal which indicates whether the row with the key values coming from the joinRightInput exists in the view. Use this flag in `CalcitePlanner.fixUpASTAggregateIncrementalRebuild` to decide which branch a row should go: update or insert.
   * Add null checks of aggregated values when calculated new aggregated values.
   
   ### Why are the changes needed?
   Rows with null aggregate keys and aggregated values was not handled by incremental MV rebuild and it could lead to data corruption.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Query result changes. See jira for example.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_nulls.q,materialized_view_create_rewrite_4.q,materialized_view_create_rewrite_one_key_gby.q -pl itests/qtest -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1995: HIVE-24775: Incorrect null handling when rebuilding Materialized view incrementally

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1995:
URL: https://github.com/apache/hive/pull/1995#discussion_r579468586



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##########
@@ -99,31 +100,61 @@ public void onMatch(RelOptRuleCall call) {
     final RexBuilder rexBuilder =
         agg.getCluster().getRexBuilder();
     // 1) First branch is query, second branch is MV
-    final RelNode joinLeftInput = union.getInput(1);
+    RelNode joinLeftInput = union.getInput(1);
     final RelNode joinRightInput = union.getInput(0);
-    // 2) Build conditions for join and filter and start adding
+
+    // 2) Introduce a Project on top of MV scan having all columns from the view plus a boolean literal which indicates
+    // whether the row with the key values coming from the joinRightInput exists in the view:
+    // - true means exist
+    // - null means not exists
+    // Project also needed to encapsulate the view scan by a subquery -> this is required by
+    // CalcitePlanner.fixUpASTAggregateIncrementalRebuild
+    List<RexNode> mvCols = new ArrayList<>(joinLeftInput.getRowType().getFieldCount());
+    for (int i = 0; i < joinLeftInput.getRowType().getFieldCount(); ++i) {
+      mvCols.add(rexBuilder.makeInputRef(
+              joinLeftInput.getRowType().getFieldList().get(i).getType(), i));
+    }
+    mvCols.add(rexBuilder.makeLiteral(true));
+
+    joinLeftInput = call.builder()
+            .push(joinLeftInput)
+            .project(mvCols)
+            .build();
+
+    // 3) Build conditions for join and start adding
     // expressions for project operator
     List<RexNode> projExprs = new ArrayList<>();
     List<RexNode> joinConjs = new ArrayList<>();
-    List<RexNode> filterConjs = new ArrayList<>();
     int groupCount = agg.getGroupCount();
     int totalCount = agg.getGroupCount() + agg.getAggCallList().size();
-    for (int leftPos = 0, rightPos = totalCount;
+    for (int leftPos = 0, rightPos = totalCount + 1;
          leftPos < groupCount; leftPos++, rightPos++) {
       RexNode leftRef = rexBuilder.makeInputRef(
           joinLeftInput.getRowType().getFieldList().get(leftPos).getType(), leftPos);
       RexNode rightRef = rexBuilder.makeInputRef(
           joinRightInput.getRowType().getFieldList().get(leftPos).getType(), rightPos);
       projExprs.add(rightRef);
-      joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
-          ImmutableList.of(leftRef, rightRef)));
-      filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,
-          ImmutableList.of(leftRef)));
+
+      RexNode nsEqExpr = rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM,
+              ImmutableList.of(leftRef, rightRef));
+      joinConjs.add(nsEqExpr);
     }
-    // 3) Add the expressions that correspond to the aggregation
+
+    // 4) Create join node
+    RexNode joinCond = RexUtil.composeConjunction(rexBuilder, joinConjs);
+    RelNode join = call.builder()
+            .push(joinLeftInput)
+            .push(joinRightInput)
+            .join(JoinRelType.RIGHT, joinCond)
+            .build();
+
+    int flagIndex = joinLeftInput.getRowType().getFieldCount() - 1;
+    RexNode flagNode = rexBuilder.makeInputRef(
+            join.getRowType().getFieldList().get(flagIndex).getType(), flagIndex);
+
+    // 5) Add the expressions that correspond to the aggregation
     // functions
-    RexNode caseFilterCond = RexUtil.composeConjunction(rexBuilder, filterConjs);
-    for (int i = 0, leftPos = groupCount, rightPos = totalCount + groupCount;
+    for (int i = 0, leftPos = groupCount, rightPos = totalCount + 1 + groupCount;
          leftPos < totalCount; i++, leftPos++, rightPos++) {
       // case when mv2.deptno IS NULL AND mv2.deptname IS NULL then s else source.s + mv2.s end

Review comment:
       This case statement should rely on the flag too?
   ```
   // case mv2.flag IS NULL then s else source.s + mv2.s end
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1995: HIVE-24775: Incorrect null handling when rebuilding Materialized view incrementally

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1995:
URL: https://github.com/apache/hive/pull/1995#discussion_r579690303



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##########
@@ -157,17 +188,27 @@ public void onMatch(RelOptRuleCall call) {
         throw new AssertionError("Found an aggregation that could not be"
             + " recognized: " + aggCall);
       }
+      // According to SQL standard (and Hive) Aggregate functions eliminates null values however operators used in
+      // elseReturn expressions returns null if one of their operands is null
+      // hence we need a null check of both operands
+      RexNode leftNull = rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, leftRef);
+      RexNode rightNull = rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, rightRef);
       projExprs.add(rexBuilder.makeCall(SqlStdOperatorTable.CASE,
-          ImmutableList.of(caseFilterCond, rightRef, elseReturn)));
+              rexBuilder.makeCall(SqlStdOperatorTable.AND, leftNull, rightNull), rexBuilder.makeNullLiteral(leftRef.getType()),

Review comment:
       I think this branch can be removed? If both are null, you will fall into next branch, which will return rightRef (null).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz merged pull request #1995: HIVE-24775: Incorrect null handling when rebuilding Materialized view incrementally

Posted by GitBox <gi...@apache.org>.
kasakrisz merged pull request #1995:
URL: https://github.com/apache/hive/pull/1995


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1995: HIVE-24775: Incorrect null handling when rebuilding Materialized view incrementally

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1995:
URL: https://github.com/apache/hive/pull/1995#discussion_r579678432



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##########
@@ -99,31 +100,61 @@ public void onMatch(RelOptRuleCall call) {
     final RexBuilder rexBuilder =
         agg.getCluster().getRexBuilder();
     // 1) First branch is query, second branch is MV
-    final RelNode joinLeftInput = union.getInput(1);
+    RelNode joinLeftInput = union.getInput(1);
     final RelNode joinRightInput = union.getInput(0);
-    // 2) Build conditions for join and filter and start adding
+
+    // 2) Introduce a Project on top of MV scan having all columns from the view plus a boolean literal which indicates
+    // whether the row with the key values coming from the joinRightInput exists in the view:
+    // - true means exist
+    // - null means not exists
+    // Project also needed to encapsulate the view scan by a subquery -> this is required by
+    // CalcitePlanner.fixUpASTAggregateIncrementalRebuild
+    List<RexNode> mvCols = new ArrayList<>(joinLeftInput.getRowType().getFieldCount());
+    for (int i = 0; i < joinLeftInput.getRowType().getFieldCount(); ++i) {
+      mvCols.add(rexBuilder.makeInputRef(
+              joinLeftInput.getRowType().getFieldList().get(i).getType(), i));
+    }
+    mvCols.add(rexBuilder.makeLiteral(true));
+
+    joinLeftInput = call.builder()
+            .push(joinLeftInput)
+            .project(mvCols)
+            .build();
+
+    // 3) Build conditions for join and start adding
     // expressions for project operator
     List<RexNode> projExprs = new ArrayList<>();
     List<RexNode> joinConjs = new ArrayList<>();
-    List<RexNode> filterConjs = new ArrayList<>();
     int groupCount = agg.getGroupCount();
     int totalCount = agg.getGroupCount() + agg.getAggCallList().size();
-    for (int leftPos = 0, rightPos = totalCount;
+    for (int leftPos = 0, rightPos = totalCount + 1;
          leftPos < groupCount; leftPos++, rightPos++) {
       RexNode leftRef = rexBuilder.makeInputRef(
           joinLeftInput.getRowType().getFieldList().get(leftPos).getType(), leftPos);
       RexNode rightRef = rexBuilder.makeInputRef(
           joinRightInput.getRowType().getFieldList().get(leftPos).getType(), rightPos);
       projExprs.add(rightRef);
-      joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
-          ImmutableList.of(leftRef, rightRef)));
-      filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,
-          ImmutableList.of(leftRef)));
+
+      RexNode nsEqExpr = rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM,
+              ImmutableList.of(leftRef, rightRef));
+      joinConjs.add(nsEqExpr);
     }
-    // 3) Add the expressions that correspond to the aggregation
+
+    // 4) Create join node
+    RexNode joinCond = RexUtil.composeConjunction(rexBuilder, joinConjs);
+    RelNode join = call.builder()
+            .push(joinLeftInput)
+            .push(joinRightInput)
+            .join(JoinRelType.RIGHT, joinCond)
+            .build();
+
+    int flagIndex = joinLeftInput.getRowType().getFieldCount() - 1;
+    RexNode flagNode = rexBuilder.makeInputRef(
+            join.getRowType().getFieldList().get(flagIndex).getType(), flagIndex);
+
+    // 5) Add the expressions that correspond to the aggregation
     // functions
-    RexNode caseFilterCond = RexUtil.composeConjunction(rexBuilder, filterConjs);
-    for (int i = 0, leftPos = groupCount, rightPos = totalCount + groupCount;
+    for (int i = 0, leftPos = groupCount, rightPos = totalCount + 1 + groupCount;
          leftPos < totalCount; i++, leftPos++, rightPos++) {
       // case when mv2.deptno IS NULL AND mv2.deptname IS NULL then s else source.s + mv2.s end

Review comment:
       Unfortunately relying on the flag only is not enough.
   Aggregate functions eliminates null values:
   ```
   insert into t1(a,b) values 
   (1, NULL),
   (1, 10);
   select a, sum(b) from t1 group by a
   ```
   results
   ```
   1, 10
   ```
   However
   ```
   select 10 + NULL
   ```
   is `NULL`.
   
   The comment is not valid any more. I will update it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1995: HIVE-24775: Incorrect null handling when rebuilding Materialized view incrementally

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1995:
URL: https://github.com/apache/hive/pull/1995#discussion_r579690173



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##########
@@ -99,31 +100,61 @@ public void onMatch(RelOptRuleCall call) {
     final RexBuilder rexBuilder =
         agg.getCluster().getRexBuilder();
     // 1) First branch is query, second branch is MV
-    final RelNode joinLeftInput = union.getInput(1);
+    RelNode joinLeftInput = union.getInput(1);
     final RelNode joinRightInput = union.getInput(0);
-    // 2) Build conditions for join and filter and start adding
+
+    // 2) Introduce a Project on top of MV scan having all columns from the view plus a boolean literal which indicates
+    // whether the row with the key values coming from the joinRightInput exists in the view:
+    // - true means exist
+    // - null means not exists
+    // Project also needed to encapsulate the view scan by a subquery -> this is required by
+    // CalcitePlanner.fixUpASTAggregateIncrementalRebuild
+    List<RexNode> mvCols = new ArrayList<>(joinLeftInput.getRowType().getFieldCount());
+    for (int i = 0; i < joinLeftInput.getRowType().getFieldCount(); ++i) {
+      mvCols.add(rexBuilder.makeInputRef(
+              joinLeftInput.getRowType().getFieldList().get(i).getType(), i));
+    }
+    mvCols.add(rexBuilder.makeLiteral(true));
+
+    joinLeftInput = call.builder()
+            .push(joinLeftInput)
+            .project(mvCols)
+            .build();
+
+    // 3) Build conditions for join and start adding
     // expressions for project operator
     List<RexNode> projExprs = new ArrayList<>();
     List<RexNode> joinConjs = new ArrayList<>();
-    List<RexNode> filterConjs = new ArrayList<>();
     int groupCount = agg.getGroupCount();
     int totalCount = agg.getGroupCount() + agg.getAggCallList().size();
-    for (int leftPos = 0, rightPos = totalCount;
+    for (int leftPos = 0, rightPos = totalCount + 1;
          leftPos < groupCount; leftPos++, rightPos++) {
       RexNode leftRef = rexBuilder.makeInputRef(
           joinLeftInput.getRowType().getFieldList().get(leftPos).getType(), leftPos);
       RexNode rightRef = rexBuilder.makeInputRef(
           joinRightInput.getRowType().getFieldList().get(leftPos).getType(), rightPos);
       projExprs.add(rightRef);
-      joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
-          ImmutableList.of(leftRef, rightRef)));
-      filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL,
-          ImmutableList.of(leftRef)));
+
+      RexNode nsEqExpr = rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM,
+              ImmutableList.of(leftRef, rightRef));
+      joinConjs.add(nsEqExpr);
     }
-    // 3) Add the expressions that correspond to the aggregation
+
+    // 4) Create join node
+    RexNode joinCond = RexUtil.composeConjunction(rexBuilder, joinConjs);
+    RelNode join = call.builder()
+            .push(joinLeftInput)
+            .push(joinRightInput)
+            .join(JoinRelType.RIGHT, joinCond)
+            .build();
+
+    int flagIndex = joinLeftInput.getRowType().getFieldCount() - 1;
+    RexNode flagNode = rexBuilder.makeInputRef(
+            join.getRowType().getFieldList().get(flagIndex).getType(), flagIndex);
+
+    // 5) Add the expressions that correspond to the aggregation
     // functions
-    RexNode caseFilterCond = RexUtil.composeConjunction(rexBuilder, filterConjs);
-    for (int i = 0, leftPos = groupCount, rightPos = totalCount + groupCount;
+    for (int i = 0, leftPos = groupCount, rightPos = totalCount + 1 + groupCount;
          leftPos < totalCount; i++, leftPos++, rightPos++) {
       // case when mv2.deptno IS NULL AND mv2.deptname IS NULL then s else source.s + mv2.s end

Review comment:
       Got it! 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1995: HIVE-24775: Incorrect null handling when rebuilding Materialized view incrementally

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1995:
URL: https://github.com/apache/hive/pull/1995#discussion_r580218284



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java
##########
@@ -157,17 +188,27 @@ public void onMatch(RelOptRuleCall call) {
         throw new AssertionError("Found an aggregation that could not be"
             + " recognized: " + aggCall);
       }
+      // According to SQL standard (and Hive) Aggregate functions eliminates null values however operators used in
+      // elseReturn expressions returns null if one of their operands is null
+      // hence we need a null check of both operands
+      RexNode leftNull = rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, leftRef);
+      RexNode rightNull = rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, rightRef);
       projExprs.add(rexBuilder.makeCall(SqlStdOperatorTable.CASE,
-          ImmutableList.of(caseFilterCond, rightRef, elseReturn)));
+              rexBuilder.makeCall(SqlStdOperatorTable.AND, leftNull, rightNull), rexBuilder.makeNullLiteral(leftRef.getType()),

Review comment:
       Removed branch




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org