You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by "amansinha100 (via GitHub)" <gi...@apache.org> on 2023/03/09 17:20:07 UTC

[GitHub] [hive] amansinha100 commented on a diff in pull request #4079: HIVE-27101: Support incremental materialized view rebuild when Iceberg source tables have insert operation only.

amansinha100 commented on code in PR #4079:
URL: https://github.com/apache/hive/pull/4079#discussion_r1131321197


##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -2101,6 +2102,55 @@ public boolean validateMaterializedViewsFromRegistry(List<Table> cachedMateriali
     }
   }
 
+  private Materialization getMaterializationInvalidationInfo(MaterializedViewMetadata metadata)
+      throws TException, HiveException {
+    Optional<SourceTable> first = metadata.getSourceTables().stream().findFirst();
+    if (!first.isPresent()) {
+      // This is unexpected: all MV must have at least one source
+      Materialization materialization = new Materialization();
+      materialization.setSourceTablesCompacted(true);
+      materialization.setSourceTablesUpdateDeleteModified(true);
+      return new Materialization();
+    } else {
+      Table table = getTable(first.get().getTable().getDbName(), first.get().getTable().getTableName());
+      if (!(table.isNonNative() && table.getStorageHandler().areSnapshotsSupported())) {
+        // Mixing native and non-native acid source tables are not supported. If the first source is native acid
+        // the rest is expected to be native acid
+        return getMSC().getMaterializationInvalidationInfo(
+                metadata.creationMetadata, conf.get(ValidTxnList.VALID_TXNS_KEY));
+      }
+    }
+
+    MaterializationSnapshot mvSnapshot = MaterializationSnapshot.fromJson(metadata.creationMetadata.getValidTxnList());
+
+    boolean hasDelete = false;
+    for (SourceTable sourceTable : metadata.getSourceTables()) {
+      Table table = getTable(sourceTable.getTable().getDbName(), sourceTable.getTable().getTableName());
+      HiveStorageHandler storageHandler = table.getStorageHandler();
+      if (storageHandler == null) {
+        Materialization materialization = new Materialization();
+        materialization.setSourceTablesCompacted(true);
+        return materialization;
+      }
+      Boolean b = storageHandler.hasDeletes(

Review Comment:
   Does hasDeletes() api cover all types of CRUD operations that are not insert operations ? e.g updates,  truncate table, drop partition etc. 



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java:
##########
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexTableInputRef;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.hadoop.hive.ql.metadata.VirtualColumn;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+
+import java.util.Set;
+
+/**
+ * Calcite rule to push down predicates contains {@link VirtualColumn#SNAPSHOT_ID} reference to TableScan.
+ * <p>
+ * This rule traverse the logical expression in {@link HiveFilter} operators and search for
+ * predicates like
+ * <p>
+ * <code>
+ *   snapshotId &lt;= 12345677899
+ * </code>
+ * <p>
+ * The literal is set in the {@link RelOptHiveTable#getHiveTableMD()} object wrapped by
+ * {@link org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan}
+ * and the original predicate in the {@link HiveFilter} is replaced with literal true.

Review Comment:
   nit: At another place I saw a mention that the HiveFilter would be dropped.  If that is done by a different rule, you may want to mention that here for clarification. 



##########
iceberg/iceberg-handler/src/test/queries/positive/mv_iceberg_orc4.q:
##########
@@ -0,0 +1,42 @@
+-- MV source tables are iceberg and MV has aggregate
+-- SORT_QUERY_RESULTS
+--! qt:replace:/(.*fromVersion=\[)\S+(\].*)/$1#Masked#$2/
+
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+
+drop table if exists tbl_ice;
+
+create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');
+create external table tbl_ice_v2(d int, e string, f int) stored by iceberg stored as orc tblproperties ('format-version'='2');
+
+insert into tbl_ice values (1, 'one', 50), (4, 'four', 53), (5, 'five', 54);
+insert into tbl_ice_v2 values (1, 'one v2', 50), (4, 'four v2', 53), (5, 'five v2', 54);
+
+create materialized view mat1 as
+select tbl_ice.b, tbl_ice.c, sum(tbl_ice_v2.f)
+from tbl_ice
+join tbl_ice_v2 on tbl_ice.a=tbl_ice_v2.d where tbl_ice.c > 52
+group by tbl_ice.b, tbl_ice.c;
+
+create materialized view mat2  as
+select tbl_ice.b, tbl_ice.c, sum(tbl_ice_v2.f), count(tbl_ice_v2.f), avg(tbl_ice_v2.f)
+from tbl_ice
+join tbl_ice_v2 on tbl_ice.a=tbl_ice_v2.d where tbl_ice.c > 52
+group by tbl_ice.b, tbl_ice.c;
+
+-- insert some new values to one of the source tables
+insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);
+insert into tbl_ice_v2 values (1, 'one v2', 50), (4, 'four v2', 53), (5, 'five v2', 54);
+

Review Comment:
   Suggest adding a sanity test for DELETE .. in that case the new logic should skip MV incremental maintenance. 



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java:
##########
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexTableInputRef;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.hadoop.hive.ql.metadata.VirtualColumn;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+
+import java.util.Set;
+
+/**
+ * Calcite rule to push down predicates contains {@link VirtualColumn#SNAPSHOT_ID} reference to TableScan.
+ * <p>
+ * This rule traverse the logical expression in {@link HiveFilter} operators and search for
+ * predicates like
+ * <p>
+ * <code>
+ *   snapshotId &lt;= 12345677899
+ * </code>
+ * <p>
+ * The literal is set in the {@link RelOptHiveTable#getHiveTableMD()} object wrapped by
+ * {@link org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan}
+ * and the original predicate in the {@link HiveFilter} is replaced with literal true.
+ */
+public class HivePushdownSnapshotFilterRule extends RelRule<HivePushdownSnapshotFilterRule.Config> {
+
+  public static final RelOptRule INSTANCE =
+          RelRule.Config.EMPTY.as(HivePushdownSnapshotFilterRule.Config.class)
+            .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+            .withOperandSupplier(operandBuilder -> operandBuilder.operand(HiveFilter.class).anyInputs())
+            .withDescription("HivePushdownSnapshotFilterRule")
+            .toRule();
+
+  public interface Config extends RelRule.Config {
+    @Override
+    default HivePushdownSnapshotFilterRule toRule() {
+      return new HivePushdownSnapshotFilterRule(this);
+    }
+  }
+
+  private HivePushdownSnapshotFilterRule(Config config) {
+    super(config);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveFilter filter = call.rel(0);

Review Comment:
   Should this rule do an early exit if the filter does not contain a predicate on snapshotId ? 



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAugmentSnapshotMaterializationRule.java:
##########
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.util.ImmutableBeans;
+import org.apache.hadoop.hive.common.type.SnapshotContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.metadata.VirtualColumn;
+import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * This rule will rewrite the materialized view with information about
+ * its invalidation data. In particular, if any of the tables used by the
+ * materialization has been updated since the materialization was created,
+ * it will introduce a filter operator on top of that table in the materialization
+ * definition, making explicit the data contained in it so the rewriting
+ * algorithm can use this information to rewrite the query as a combination of the
+ * outdated materialization data and the new original data in the source tables.
+ * If the data in the source table matches the current data in the snapshot,
+ * no filter is created.
+ * In case of tables supports snapshots the filtering should be performed in the
+ * TableScan operator to read records only from the relevant snapshots.
+ * However, the union rewrite algorithm needs a so-called compensation predicate in
+ * a Filter operator to build the union branch produces the delta records.
+ * After union rewrite algorithm is executed the predicates on SnapshotIds
+ * are pushed down to the corresponding TableScan operator and removed from the Filter
+ * operator. So the reference to the {@link VirtualColumn#SNAPSHOT_ID} is temporal in the

Review Comment:
   nit: temporary instead of temporal



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveMaterializedViewUtils.java:
##########
@@ -277,6 +276,20 @@ public void visit(RelNode node, int ordinal, RelNode parent) {
             materialization.getAst());
   }
 
+  private static HiveRelOptMaterialization augmentMaterializationWithTimeInformation(

Review Comment:
   nit: pls add a brief comment for this important method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org