You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/10/28 08:38:30 UTC

[GitHub] [hive] kasakrisz opened a new pull request #2756: [draft] HIVE-25656: Get materialized view state based on number of affected rows by transactions

kasakrisz opened a new pull request #2756:
URL: https://github.com/apache/hive/pull/2756


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r748218177



##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1101,6 +1118,7 @@ struct CommitTxnRequest {
     5: optional CommitTxnKeyValue keyValue,
     6: optional bool exclWriteEnabled = true,
     7: optional TxnType txn_type,
+    8: optional set<AffectedRowCount> rowsAffected,

Review comment:
       if an older client connect - may it really leave this info out; and everything will work correctly? 

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -235,6 +223,46 @@ private void updateStats(StatsAggregator statsAggregator, Map<String, String> pa
 
   }
 
+  private static class TransactionalStatsProcessor {
+    private final HiveTxnManager txnManager;
+    private final Partish partish;
+
+    private TransactionalStatsProcessor(HiveTxnManager txnManager, Partish partish) {
+      this.txnManager = txnManager;
+      this.partish = partish;
+    }
+
+    private long toLong(String value) {
+      if (value == null || value.isEmpty()) {
+        return 0;
+      }
+
+      return Long.parseLong(value);
+    }
+
+    public void process(StatsAggregator statsAggregator) throws HiveException, MetaException {
+      if (statsAggregator == null) {
+        return;
+      }
+
+      if (partish.isTransactionalTable()) {
+        String prefix = getAggregationPrefix(partish.getTable(), partish.getPartition());
+        long insertCount = toLong(statsAggregator.aggregateStats(prefix, INSERT_COUNT));
+        long updateCount = toLong(statsAggregator.aggregateStats(prefix, UPDATE_COUNT));
+        long deleteCount = toLong(statsAggregator.aggregateStats(prefix, DELETE_COUNT));
+
+        if (insertCount > 0 || updateCount > 0 || deleteCount > 0) {
+          AffectedRowCount affectedRowCount = new AffectedRowCount();
+          affectedRowCount.setTableId(partish.getTable().getTTable().getId());
+          affectedRowCount.setInsertCount(insertCount);
+          affectedRowCount.setUpdatedCount(updateCount);
+          affectedRowCount.setDeletedCount(deleteCount);
+
+          txnManager.addAffectedRowCount(affectedRowCount);

Review comment:
       I don't really understand why are we dragging this thru the the txnManager - right here in this class we know that for table `X` we made N,M,K inserts/updates/deletes - didn't we lose the context ; will it be harder to track it down?
   
   we already making some metastore calls from this class - wouldn't using those would be simpler? (I'm still looking around in this patch so I might be wrong)

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12193,13 +12194,14 @@ private void walkASTMarkTABREF(TableMask tableMask, ASTNode ast, Set<String> cte
         if (table.isMaterializedView()) {
           // When we are querying a materialized view directly, we check whether the source tables
           // do not apply any policies.
-          for (String qName : table.getCreationMetadata().getTablesUsed()) {
+          for (SourceTable sourceTable : table.getCreationMetadata().getTablesUsed()) {
+            String qualifiedTableName = TableName.getDbTable(sourceTable.getDbName(), sourceTable.getTableName());

Review comment:
       can't we use TableName objects instead of going back to strings?

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -457,11 +457,21 @@ struct StorageDescriptor {
   12: optional bool   storedAsSubDirectories       // stored as subdirectories or not
 }
 
+struct SourceTable {
+    1: required string dbName,
+    2: required string tableName,
+    3: required i64 tableId,
+    4: required bool insertOnly,
+    5: required i64 insertedCount,
+    6: required i64 updatedCount,
+    7: required i64 deletedCount
+}
+
 struct CreationMetadata {
     1: required string catName
     2: required string dbName,
     3: required string tblName,
-    4: required set<string> tablesUsed,
+    4: required set<SourceTable> tablesUsed,

Review comment:
       this is an incompatible change of the thrift api - if an older client will ask for an rpc call with this data it will surely get into trouble....

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -2537,6 +2544,20 @@ private CreationMetadata convertToCreationMetadata(MCreationMetadata s) {
     return r;
   }
 
+  private SourceTable convertToSourceTable(MMVSource mmvSource) {
+    SourceTable sourceTable = new SourceTable();
+    MTable mTable = mmvSource.getTable();
+    sourceTable.setTableId(mTable.getId());
+    sourceTable.setDbName(mTable.getDatabase().getName());
+    sourceTable.setTableName(mTable.getTableName());
+    String transactionalProp = mTable.getParameters().get(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
+    sourceTable.setInsertOnly("insert_only".equalsIgnoreCase(transactionalProp));

Review comment:
       this `equalIgnoreCase` is a bit out of scope here: maybe there is a method which could be used for this?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -278,6 +306,9 @@ private int aggregateStats(Hive db) {
         }
         db.alterTable(tableFullName, res, environmentContext, true);
 
+        TransactionalStatsProcessor transactionalStatsProcessor = new TransactionalStatsProcessor(txnManager, p);
+        transactionalStatsProcessor.process(statsAggregator);

Review comment:
       this call is after the metastore call; meanwhile for partiioned the order is different

##########
File path: standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql
##########
@@ -176,6 +176,10 @@ ALTER TABLE "APP"."DC_PRIVS" ADD CONSTRAINT "DC_PRIVS_PK" PRIMARY KEY ("DC_GRANT
 ALTER TABLE "APP"."DC_PRIVS" ADD CONSTRAINT "DC_PRIVS_FK1" FOREIGN KEY ("NAME") REFERENCES "APP"."DATACONNECTORS" ("NAME") ON DELETE NO ACTION ON UPDATE NO ACTION;
 CREATE UNIQUE INDEX "APP"."DCPRIVILEGEINDEX" ON "APP"."DC_PRIVS" ("AUTHORIZER", "NAME", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", "DC_PRIV", "GRANTOR", "GRANTOR_TYPE");
 
+-- HIVE-25656
+ALTER TABLE "APP"."MV_TABLES_USED" ADD COLUMN "INSERTED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "APP"."MV_TABLES_USED" ADD COLUMN "UPDATED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "APP"."MV_TABLES_USED" ADD COLUMN "DELETED_COUNT" BIGINT NOT NULL DEFAULT 0;

Review comment:
       primary key seems to be missing

##########
File path: standalone-metastore/metastore-server/src/main/sql/mssql/hive-schema-4.0.0.mssql.sql
##########
@@ -424,7 +424,10 @@ CREATE INDEX MV_UNIQUE_TABLE ON MV_CREATION_METADATA (TBL_NAME,DB_NAME);
 CREATE TABLE MV_TABLES_USED
 (
     MV_CREATION_METADATA_ID bigint NOT NULL,
-    TBL_ID bigint NOT NULL
+    TBL_ID bigint NOT NULL,
+    INSERTED_COUNT bigint NOT NULL DEFAULT 0,
+    UPDATED_COUNT bigint NOT NULL DEFAULT 0,
+    DELETED_COUNT bigint NOT NULL DEFAULT 0

Review comment:
       primary key seems to be missing

##########
File path: standalone-metastore/metastore-server/src/main/resources/package.jdo
##########
@@ -253,18 +253,33 @@
       <field name="tblName">
         <column name="TBL_NAME" length="256" jdbc-type="VARCHAR"/>
       </field>
-      <field name="tables" table="MV_TABLES_USED">
-        <collection element-type="MTable"/>
-        <join>
-          <column name="MV_CREATION_METADATA_ID"/>
-        </join>
-        <element column="TBL_ID"/>
+      <field name="tables" mapped-by="creationMetadata" dependent-element="true">
+        <collection element-type="MMVSource" dependent-element="true" />
+        <foreign-key name="MV_TABLES_USED_FK1" delete-action="cascade"/>
       </field>
       <field name="txnList">
         <column name="TXN_LIST" length="32672" jdbc-type="VARCHAR" allows-null="true"/>
       </field>
     </class>
 
+    <class name="MMVSource" identity-type="application" table="MV_TABLES_USED" detachable="true" objectid-class="MMVSource$PK">
+      <field name="creationMetadata" primary-key="true">
+        <column name="MV_CREATION_METADATA_ID"/>
+      </field>
+      <field name="table" primary-key="true">
+        <column name="TBL_ID"/>
+      </field>

Review comment:
       I'm wondering about the order in which JDO will create the primary key; I guess it will be `METADATA_ID,TBL_ID` or the other way around?
   but in any case we should declare it in our upgrade sqls
   

##########
File path: standalone-metastore/metastore-server/src/main/sql/oracle/upgrade-3.2.0-to-4.0.0.oracle.sql
##########
@@ -201,6 +201,11 @@ ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_ID varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_VERSION varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_WORKER_VERSION varchar(128);
 
+-- HIVE-25656
+ALTER TABLE TABLES_USED ADD INSERTED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD UPDATED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD DELETED_COUNT NUMBER DEFAULT 0 NOT NULl;

Review comment:
       primary key seems to be missing
   
   

##########
File path: standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.2.0-to-4.0.0.mssql.sql
##########
@@ -227,6 +227,11 @@ ALTER TABLE DC_PRIVS ADD CONSTRAINT DC_PRIVS_FK1 FOREIGN KEY (NAME) REFERENCES D
 CREATE UNIQUE INDEX DCPRIVILEGEINDEX ON DC_PRIVS (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);
 CREATE INDEX DC_PRIVS_N49 ON DC_PRIVS (NAME);
 
+-- HIVE-25656
+ALTER TABLE "MV_TABLES_USED" ADD "INSERTED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "MV_TABLES_USED" ADD "UPDATED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "MV_TABLES_USED" ADD "DELETED_COUNT" BIGINT NOT NULL DEFAULT 0;

Review comment:
       primary key seems to be missing

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -2454,7 +2472,7 @@ service ThriftHiveMetastore extends fb303.FacebookService
   GetTableResult get_table_req(1:GetTableRequest req) throws (1:MetaException o1, 2:NoSuchObjectException o2)
   GetTablesResult get_table_objects_by_name_req(1:GetTablesRequest req)
 				   throws (1:MetaException o1, 2:InvalidOperationException o2, 3:UnknownDBException o3)
-  Materialization get_materialization_invalidation_info(1:CreationMetadata creation_metadata, 2:string validTxnList)
+  Materialization get_materialization_invalidation_info(1:CreationMetadata creation_metadata)

Review comment:
       one more incompatible change...not sure what we want to do with this

##########
File path: standalone-metastore/metastore-server/src/main/sql/oracle/upgrade-3.2.0-to-4.0.0.oracle.sql
##########
@@ -201,6 +201,11 @@ ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_ID varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_VERSION varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_WORKER_VERSION varchar(128);
 
+-- HIVE-25656
+ALTER TABLE TABLES_USED ADD INSERTED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD UPDATED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD DELETED_COUNT NUMBER DEFAULT 0 NOT NULl;

Review comment:
       have you tested oracle? unfortunately its not covered in precommit at all

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MCreationMetadata.java
##########
@@ -47,12 +75,13 @@ public MCreationMetadata(String catName, String dbName, String tblName,
     this.materializationTime = materializationTime;
   }
 
-  public Set<MTable> getTables() {
+  public Set<MMVSource> getTables() {
     return tables;
   }
 
-  public void setTables(Set<MTable> tables) {
-    this.tables = tables;
+  public void setTables(Set<MMVSource> tables) {
+    this.tables.clear();

Review comment:
       this change aims to make this class independent from the passed collection; I guess there were issues arising from it; but the constructor is still keeping the externally passed reference - if it caused problem maybe an  `ImmutableSet` would be better|?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749404289



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12193,13 +12194,14 @@ private void walkASTMarkTABREF(TableMask tableMask, ASTNode ast, Set<String> cte
         if (table.isMaterializedView()) {
           // When we are querying a materialized view directly, we check whether the source tables
           // do not apply any policies.
-          for (String qName : table.getCreationMetadata().getTablesUsed()) {
+          for (SourceTable sourceTable : table.getCreationMetadata().getTablesUsed()) {
+            String qualifiedTableName = TableName.getDbTable(sourceTable.getDbName(), sourceTable.getTableName());

Review comment:
       Wrapped the Table object in SourceTable.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749110287



##########
File path: standalone-metastore/metastore-server/src/main/sql/oracle/upgrade-3.2.0-to-4.0.0.oracle.sql
##########
@@ -201,6 +201,11 @@ ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_ID varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_VERSION varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_WORKER_VERSION varchar(128);
 
+-- HIVE-25656
+ALTER TABLE TABLES_USED ADD INSERTED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD UPDATED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD DELETED_COUNT NUMBER DEFAULT 0 NOT NULl;

Review comment:
       Isn't this test suppose to do that?
   https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/ITestOracle.java




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749412040



##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1101,6 +1118,7 @@ struct CommitTxnRequest {
     5: optional CommitTxnKeyValue keyValue,
     6: optional bool exclWriteEnabled = true,
     7: optional TxnType txn_type,
+    8: optional set<AffectedRowCount> rowsAffected,

Review comment:
       reverted this 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] scarlin-cloudera commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
scarlin-cloudera commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r748443946



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12847,13 +12849,21 @@ protected void saveViewDefinition() throws SemanticException {
     createVwDesc.setViewExpandedText(expandedText);
   }
 
-  private Set<String> getTablesUsed(ParseContext parseCtx) {
-    Set<String> tablesUsed = new HashSet<>();
+  private Set<SourceTable> getTablesUsed(ParseContext parseCtx) {
+    Set<SourceTable> tablesUsed = new HashSet<>();
     for (TableScanOperator topOp : parseCtx.getTopOps().values()) {
       Table table = topOp.getConf().getTableMetadata();
       if (!table.isMaterializedTable() && !table.isView()) {
         // Add to signature
-        tablesUsed.add(table.getFullyQualifiedName());
+        SourceTable sourceTable = new SourceTable();

Review comment:
       So I know the SourceTable file is a generated thrift file but it seems like we have soooo much code in SemanticAnalyzer and this code seems like it belongs in a different method somewhere.  Some kind of creator method that takes a Table as a parameter.   Maybe somewhere under standalone-metastore in a Factory class?  If we put it in there, we'd also have to pass in a boolean because standalone-metastore should not rely on AcidUtils




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r755099360



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java
##########
@@ -67,7 +67,7 @@ public void initialize(QueryState queryState, QueryPlan queryPlan, TaskQueue tas
     super.initialize(queryState, queryPlan, taskQueue, context);
 
     if (work.getBasicStatsWork() != null) {
-      BasicStatsTask task = new BasicStatsTask(conf, work.getBasicStatsWork());
+      BasicStatsTask task = new BasicStatsTask(conf, work.getBasicStatsWork(), getTxnMgr());

Review comment:
       do we still need this argument?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749112764



##########
File path: standalone-metastore/metastore-server/src/main/resources/package.jdo
##########
@@ -253,18 +253,33 @@
       <field name="tblName">
         <column name="TBL_NAME" length="256" jdbc-type="VARCHAR"/>
       </field>
-      <field name="tables" table="MV_TABLES_USED">
-        <collection element-type="MTable"/>
-        <join>
-          <column name="MV_CREATION_METADATA_ID"/>
-        </join>
-        <element column="TBL_ID"/>
+      <field name="tables" mapped-by="creationMetadata" dependent-element="true">
+        <collection element-type="MMVSource" dependent-element="true" />
+        <foreign-key name="MV_TABLES_USED_FK1" delete-action="cascade"/>
       </field>
       <field name="txnList">
         <column name="TXN_LIST" length="32672" jdbc-type="VARCHAR" allows-null="true"/>
       </field>
     </class>
 
+    <class name="MMVSource" identity-type="application" table="MV_TABLES_USED" detachable="true" objectid-class="MMVSource$PK">
+      <field name="creationMetadata" primary-key="true">
+        <column name="MV_CREATION_METADATA_ID"/>
+      </field>
+      <field name="table" primary-key="true">
+        <column name="TBL_ID"/>
+      </field>

Review comment:
       Added pk definitions to schema and upgrade sqls.

##########
File path: standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql
##########
@@ -176,6 +176,10 @@ ALTER TABLE "APP"."DC_PRIVS" ADD CONSTRAINT "DC_PRIVS_PK" PRIMARY KEY ("DC_GRANT
 ALTER TABLE "APP"."DC_PRIVS" ADD CONSTRAINT "DC_PRIVS_FK1" FOREIGN KEY ("NAME") REFERENCES "APP"."DATACONNECTORS" ("NAME") ON DELETE NO ACTION ON UPDATE NO ACTION;
 CREATE UNIQUE INDEX "APP"."DCPRIVILEGEINDEX" ON "APP"."DC_PRIVS" ("AUTHORIZER", "NAME", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", "DC_PRIV", "GRANTOR", "GRANTOR_TYPE");
 
+-- HIVE-25656
+ALTER TABLE "APP"."MV_TABLES_USED" ADD COLUMN "INSERTED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "APP"."MV_TABLES_USED" ADD COLUMN "UPDATED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "APP"."MV_TABLES_USED" ADD COLUMN "DELETED_COUNT" BIGINT NOT NULL DEFAULT 0;

Review comment:
       Added pk definitions to schema and upgrade sqls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r751148742



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12847,13 +12849,21 @@ protected void saveViewDefinition() throws SemanticException {
     createVwDesc.setViewExpandedText(expandedText);
   }
 
-  private Set<String> getTablesUsed(ParseContext parseCtx) {
-    Set<String> tablesUsed = new HashSet<>();
+  private Set<SourceTable> getTablesUsed(ParseContext parseCtx) {
+    Set<SourceTable> tablesUsed = new HashSet<>();
     for (TableScanOperator topOp : parseCtx.getTopOps().values()) {
       Table table = topOp.getConf().getTableMetadata();
       if (!table.isMaterializedTable() && !table.isView()) {
         // Add to signature
-        tablesUsed.add(table.getFullyQualifiedName());
+        SourceTable sourceTable = new SourceTable();

Review comment:
       Refactored how `SourceTable` instances are created and in some cases like the one you mentioned `TableName` is a better choice because the stats contained by `SourceTable` is not needed here.
   Added factory methods to `ql.metadata.Table`
   * `SourceTable createSourceTable()` - to create a `SourceTable` instance in an initial state referencing this table instance.
   * `TableName getFullTableName()` - to create a `TableName` using this table instance.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749340621



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -235,6 +223,46 @@ private void updateStats(StatsAggregator statsAggregator, Map<String, String> pa
 
   }
 
+  private static class TransactionalStatsProcessor {
+    private final HiveTxnManager txnManager;
+    private final Partish partish;
+
+    private TransactionalStatsProcessor(HiveTxnManager txnManager, Partish partish) {
+      this.txnManager = txnManager;
+      this.partish = partish;
+    }
+
+    private long toLong(String value) {
+      if (value == null || value.isEmpty()) {
+        return 0;
+      }
+
+      return Long.parseLong(value);
+    }
+
+    public void process(StatsAggregator statsAggregator) throws HiveException, MetaException {
+      if (statsAggregator == null) {
+        return;
+      }
+
+      if (partish.isTransactionalTable()) {
+        String prefix = getAggregationPrefix(partish.getTable(), partish.getPartition());
+        long insertCount = toLong(statsAggregator.aggregateStats(prefix, INSERT_COUNT));
+        long updateCount = toLong(statsAggregator.aggregateStats(prefix, UPDATE_COUNT));
+        long deleteCount = toLong(statsAggregator.aggregateStats(prefix, DELETE_COUNT));
+
+        if (insertCount > 0 || updateCount > 0 || deleteCount > 0) {
+          AffectedRowCount affectedRowCount = new AffectedRowCount();
+          affectedRowCount.setTableId(partish.getTable().getTTable().getId());
+          affectedRowCount.setInsertCount(insertCount);
+          affectedRowCount.setUpdatedCount(updateCount);
+          affectedRowCount.setDeletedCount(deleteCount);
+
+          txnManager.addAffectedRowCount(affectedRowCount);

Review comment:
       I changed this part to save the stats from `BasicStatsTask.java` by introducing a new thrift api method: 
   ```
   void update_transaction_statistics(1:UpdateTransactionalStatsRequest req)
   ```
   However instead of passing all the delta in one call to HMS and updating the stats in the backend db by one call per table it will cause several HMS and backend DB calls. Example: a merge statement performs 3+3 calls.
   Maybe we should optimize this later.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz merged pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz merged pull request #2756:
URL: https://github.com/apache/hive/pull/2756


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749070911



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -2537,6 +2544,20 @@ private CreationMetadata convertToCreationMetadata(MCreationMetadata s) {
     return r;
   }
 
+  private SourceTable convertToSourceTable(MMVSource mmvSource) {
+    SourceTable sourceTable = new SourceTable();
+    MTable mTable = mmvSource.getTable();
+    sourceTable.setTableId(mTable.getId());
+    sourceTable.setDbName(mTable.getDatabase().getName());
+    sourceTable.setTableName(mTable.getTableName());
+    String transactionalProp = mTable.getParameters().get(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
+    sourceTable.setInsertOnly("insert_only".equalsIgnoreCase(transactionalProp));

Review comment:
       There is `org.apache.hadoop.hive.ql.io.AcidUtils.isInsertOnlyTable` but that method is not accessible here because `ql` is not referenced from `metastore-server`.
   Should it be moved to the `common` module?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r755189042



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java
##########
@@ -67,7 +67,7 @@ public void initialize(QueryState queryState, QueryPlan queryPlan, TaskQueue tas
     super.initialize(queryState, queryPlan, taskQueue, context);
 
     if (work.getBasicStatsWork() != null) {
-      BasicStatsTask task = new BasicStatsTask(conf, work.getBasicStatsWork());
+      BasicStatsTask task = new BasicStatsTask(conf, work.getBasicStatsWork(), getTxnMgr());

Review comment:
       No. Removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r751191780



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12847,13 +12848,12 @@ protected void saveViewDefinition() throws SemanticException {
     createVwDesc.setViewExpandedText(expandedText);
   }
 
-  private Set<String> getTablesUsed(ParseContext parseCtx) {
-    Set<String> tablesUsed = new HashSet<>();
+  private Set<TableName> getTablesUsed(ParseContext parseCtx) {

Review comment:
       Try to move this method to `ParseContext`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749112966



##########
File path: standalone-metastore/metastore-server/src/main/sql/mssql/hive-schema-4.0.0.mssql.sql
##########
@@ -424,7 +424,10 @@ CREATE INDEX MV_UNIQUE_TABLE ON MV_CREATION_METADATA (TBL_NAME,DB_NAME);
 CREATE TABLE MV_TABLES_USED
 (
     MV_CREATION_METADATA_ID bigint NOT NULL,
-    TBL_ID bigint NOT NULL
+    TBL_ID bigint NOT NULL,
+    INSERTED_COUNT bigint NOT NULL DEFAULT 0,
+    UPDATED_COUNT bigint NOT NULL DEFAULT 0,
+    DELETED_COUNT bigint NOT NULL DEFAULT 0

Review comment:
       Added pk definitions to schema and upgrade sqls.

##########
File path: standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.2.0-to-4.0.0.mssql.sql
##########
@@ -227,6 +227,11 @@ ALTER TABLE DC_PRIVS ADD CONSTRAINT DC_PRIVS_FK1 FOREIGN KEY (NAME) REFERENCES D
 CREATE UNIQUE INDEX DCPRIVILEGEINDEX ON DC_PRIVS (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);
 CREATE INDEX DC_PRIVS_N49 ON DC_PRIVS (NAME);
 
+-- HIVE-25656
+ALTER TABLE "MV_TABLES_USED" ADD "INSERTED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "MV_TABLES_USED" ADD "UPDATED_COUNT" BIGINT NOT NULL DEFAULT 0;
+ALTER TABLE "MV_TABLES_USED" ADD "DELETED_COUNT" BIGINT NOT NULL DEFAULT 0;

Review comment:
       Added pk definitions to schema and upgrade sqls.

##########
File path: standalone-metastore/metastore-server/src/main/sql/oracle/upgrade-3.2.0-to-4.0.0.oracle.sql
##########
@@ -201,6 +201,11 @@ ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_ID varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_INITIATOR_VERSION varchar(128);
 ALTER TABLE COMPLETED_COMPACTIONS ADD CC_WORKER_VERSION varchar(128);
 
+-- HIVE-25656
+ALTER TABLE TABLES_USED ADD INSERTED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD UPDATED_COUNT NUMBER DEFAULT 0 NOT NULL;
+ALTER TABLE TABLES_USED ADD DELETED_COUNT NUMBER DEFAULT 0 NOT NULl;

Review comment:
       Added pk definitions to schema and upgrade sqls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #2756: HIVE-25656: Get materialized view state based on number of affected rows of transactions

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #2756:
URL: https://github.com/apache/hive/pull/2756#discussion_r749111641



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -278,6 +306,9 @@ private int aggregateStats(Hive db) {
         }
         db.alterTable(tableFullName, res, environmentContext, true);
 
+        TransactionalStatsProcessor transactionalStatsProcessor = new TransactionalStatsProcessor(txnManager, p);
+        transactionalStatsProcessor.process(statsAggregator);

Review comment:
       moved the partitioned one




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org