You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/09/22 12:37:37 UTC

[GitHub] [hive] kasakrisz opened a new pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

kasakrisz opened a new pull request #2663:
URL: https://github.com/apache/hive/pull/2663


   ### What changes were proposed in this pull request?
   1. Add Calcite rule to turn on populating writeId in insert only TableScans.
   2. Apply the rule in case of incremental materialized view rebuild
   * when there were no delete operations since the last MV rebuild
   * or only partition based incremental rebuild can be applied.
   
   ### Why are the changes needed?
   Materialized views with insert only source tables can not be rebuilt incrementally.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_create_rewrite_8.q,materialized_view_create_rewrite_9.q,materialized_view_partitioned_create_rewrite_agg.q,materialized_view_partitioned_create_rewrite_agg_2.q,materialized_view_partitioned_create_rewrite_agg_3.q -pl itests/qtest -Pitests
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719269143



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+    super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+    return !tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    RelNode newTableScan = call.builder()
+            .push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
       WriteId is populated into the `ROW_ID` struct which contains three fields: `writeId, bucketId, rowId`. In case of insert-only tables `rowId` is not available so `FetchInsertOnlyBucketIds` is an internal feature.
   The initial logical plan for MV rebuild is a full rebuild and it does not required writeIds.
   Later when all check passed which is need for incremental rebuild the plan is transformed to an incremental rebuild plan and that also required `writeId` filtering.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719314095



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+    super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+    return !tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    RelNode newTableScan = call.builder()
+            .push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
       I assume that at some point (later in the compilation phase),  you check the `HiveTableScan` operator to obtain/use the `FetchInsertOnlyBucketIds` trait. Unless I am missing something you have all the information already inside the `HiveTableScan` (`!tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD)`) to derive it on the fly when needed. I don't understand very well why you need a rule to set it explicitly. 
   
   Apologies if my reasoning is off but this is my happy break from escalations :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719291208



##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
       Yes. Originally there were no `EXPLAIN CBO` in other tests. Since I added it here the `EXPLAIN` is not necessary.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719415132



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+    super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+    return !tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    RelNode newTableScan = call.builder()
+            .push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
       Sure. Added these details to the javadocs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719236721



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+    super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+    return !tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    RelNode newTableScan = call.builder()
+            .push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
       I suppose we have the information that a table is insert only before constructing the initial logical plan. Why do we need a rule to set this and we don't do it when constructing the initial `HiveTableScan`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719385811



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+    super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+    return !tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    RelNode newTableScan = call.builder()
+            .push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
       Please add a small comment in the Class javadoc cause I am sure my future self seeing this will not remember the discussion and may try to remove the rule or put it in another place :).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719283577



##########
File path: ql/src/test/queries/clientpositive/materialized_view_parquet.q
##########
@@ -189,8 +189,10 @@ alter materialized view mv1_parquet_n2 rebuild;
 alter materialized view mv1_parquet_n2 rebuild;
 
 explain cbo
-select name from emps_parquet_n3 group by name;
+select name, sum(empid) from emps_parquet_n3 group by name;

Review comment:
       The original test was actually hiding that projecting `ROW__ID.writeId` in case of insert-only tables always return `NULL`. However incremental MV rebuild was used and because of the `NULL` writeIds it did not inserted any records so the view data remained in an outdated state but the view was marked up to date by the system.
   By adding the aggregation `sum(empid)` and the extra query after dropping the view the difference in the results showed up.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719290528



##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;

Review comment:
       Removed these.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] asolimando commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
asolimando commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719160162



##########
File path: ql/src/test/queries/clientpositive/materialized_view_partitioned_create_rewrite_agg_3.q
##########
@@ -0,0 +1,46 @@
+-- Test partition bases MV rebuild when source table is insert only

Review comment:
       Typo: partition bases -> partition based




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719275880



##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;

Review comment:
       Incremental MV rebuild plan is created in two teps:
   1. CBO rewrites the query part of the plan
   2. The generated AST from the CBO plan is transformed from a union based `insert-overwrite` to an `insert` plan




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719290288



##########
File path: ql/src/test/queries/clientpositive/materialized_view_partitioned_create_rewrite_agg_3.q
##########
@@ -0,0 +1,46 @@
+-- Test partition bases MV rebuild when source table is insert only

Review comment:
       fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719322625



##########
File path: ql/src/test/queries/clientpositive/materialized_view_parquet.q
##########
@@ -189,8 +189,10 @@ alter materialized view mv1_parquet_n2 rebuild;
 alter materialized view mv1_parquet_n2 rebuild;
 
 explain cbo
-select name from emps_parquet_n3 group by name;
+select name, sum(empid) from emps_parquet_n3 group by name;

Review comment:
       Thanks for clarifying




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719371779



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+    super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+    return !tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    RelNode newTableScan = call.builder()
+            .push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
       I don't want to enable fetching writeId automatically for any non MV insert only table.
   If someone execute the query like 
   ```
   SELECT t1.ROW__ID, t1.ROW__ID.writeId, a, b FROM t1;
   ```
   where `t1` is an insert only table 
   the result will contain `0` for ROW__ID.rowId for all records but will contain valid values for bucketId and writeId but only for not compacted records. This is confusing for users.
   
   But this feature can still be used in case of incremental MV rebuild since compaction excludes the use of incremental rebuild.
   
   So the goal was enable writeId fetch just when it is necessary. To control it in the AST level the `insertonly.fetch.bucketid` is used. It is a similar property like the `acid.fetch.deleted.rows`.
   See #2549
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719279447



##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- NOW IT CAN BE USED AGAIN
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- NOW AN UPDATE
+UPDATE cmv_basetable_n6 SET a=2 WHERE a=1;
+
+-- INCREMENTAL REBUILD CAN BE TRIGGERED
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- MV CAN BE USED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+drop materialized view cmv_mat_view_n6;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
       The intention was to check if we got the same results in case the view is used and not used.
   Personally this is really useful for me. :slightly_smiling_face:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz merged pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz merged pull request #2663:
URL: https://github.com/apache/hive/pull/2663


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719291208



##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
       Yes. Originally there were no `EXPLAIN CBO` in other tests. Since I added it the `EXPLAIN` is not necessary.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719383292



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+    super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+    return !tableMD.isMaterializedView() && AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveTableScan tableScan = call.rel(0);
+    RelNode newTableScan = call.builder()
+            .push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
       OK, I finally got it :) Thanks for explaining.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a change in pull request #2663: HIVE-25546: Enable incremental rebuild of Materialized views with insert only source tables

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719238961



##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;

Review comment:
       Why do the properties below matter for the incremental rebuild?
   ```
   hive.vectorized.execution.enabled=false
   hive.strict.checks.cartesian.product=false
   ```

##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;

Review comment:
       Is it necessary?

##########
File path: ql/src/test/queries/clientpositive/materialized_view_parquet.q
##########
@@ -189,8 +189,10 @@ alter materialized view mv1_parquet_n2 rebuild;
 alter materialized view mv1_parquet_n2 rebuild;
 
 explain cbo
-select name from emps_parquet_n3 group by name;
+select name, sum(empid) from emps_parquet_n3 group by name;

Review comment:
       Why are we changing existing queries? If it is useless then it's fine, if we want to test additional stuff then we should add new queries instead of modifying existing ones.

##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
       I had the impression that view based rewritting takes place in CBO. Isn't `EXPLAIN CBO` sufficient?

##########
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##########
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- NOW IT CAN BE USED AGAIN
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- NOW AN UPDATE
+UPDATE cmv_basetable_n6 SET a=2 WHERE a=1;
+
+-- INCREMENTAL REBUILD CAN BE TRIGGERED
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- MV CAN BE USED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+drop materialized view cmv_mat_view_n6;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
       If the intention is to test that the view is not used after dropping it then we should have `EXPLAIN CBO` here. Ensuring that results are correct when the views are not used is not really necessary in my opinion.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org