Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/12/10 10:05:00 UTC

[jira] [Work logged] (HIVE-24776) Reduce HMS DB calls during stats updates

     [ https://issues.apache.org/jira/browse/HIVE-24776?focusedWorklogId=693823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-693823 ]

ASF GitHub Bot logged work on HIVE-24776:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Dec/21 10:04
            Start Date: 10/Dec/21 10:04
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #2636:
URL: https://github.com/apache/hive/pull/2636#discussion_r766527736



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
##########
@@ -346,11 +346,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
             }
           }
           Deadline.checkTimeout();
+          Table table = msdb.getTable(catName, newDbName, newTblName);
           for (Entry<Partition, ColumnStatistics> partColStats : columnStatsNeedUpdated.entries()) {
             ColumnStatistics newPartColStats = partColStats.getValue();
             newPartColStats.getStatsDesc().setDbName(newDbName);
             newPartColStats.getStatsDesc().setTableName(newTblName);
-            msdb.updatePartitionColumnStatistics(newPartColStats, partColStats.getKey().getValues(),
+            msdb.updatePartitionColumnStatistics(table, newPartColStats, partColStats.getKey().getValues(),

Review comment:
       Looking at the code above, I'm wondering why we need this at all. I believe `alterPartitions` clears the stats data, although I haven't seen it done explicitly, and the logic added here writes it back afterwards.
   
   * Are the old values really removed?
   * Can't we simply retain the old stats values? At the end of the day that is what happens here, unless I've missed something. Doing so would reduce the number of calls drastically, since we would simply keep what is already there.
   
   It also seems that `alterPartitions` / `alterTable` kills the basic stats state; after a rename I don't think we have to do that.
   

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -9687,31 +9687,35 @@ private void writeMPartitionColumnStatistics(Table table, Partition partition,
       List<ColumnStatisticsObj> statsObjs = colStats.getStatsObj();
       ColumnStatisticsDesc statsDesc = colStats.getStatsDesc();
       String catName = statsDesc.isSetCatName() ? statsDesc.getCatName() : getDefaultCatalog(conf);
-      MTable mTable = ensureGetMTable(catName, statsDesc.getDbName(), statsDesc.getTableName());
-      Table table = convertToTable(mTable);
-      Partition partition = convertToPart(getMPartition(
-          catName, statsDesc.getDbName(), statsDesc.getTableName(), partVals, mTable), false);
-      List<String> colNames = new ArrayList<>();
+      MTable mTable = null;
+      if(table == null) {

Review comment:
       Let's not play with `null`s and alternate code paths.
   You decided to change the method signature and add `table`; either fill it in everywhere or remove the parameter.
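
   As an illustration of the "fill it in everywhere" option, here is a minimal sketch. `TableRef` and `StatsWriter` are hypothetical stand-ins, not the real `ObjectStore` types; the only point is that the table is a required argument and there is no second lookup path inside the writer.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a resolved table; not the real metastore Table class.
final class TableRef {
    final String dbName;
    final String tableName;

    TableRef(String dbName, String tableName) {
        this.dbName = dbName;
        this.tableName = tableName;
    }
}

// Hypothetical writer with a single code path.
final class StatsWriter {
    // The caller must resolve the table; a null table is rejected
    // instead of triggering a fallback lookup.
    static String writePartitionStats(TableRef table, List<String> partVals) {
        Objects.requireNonNull(table, "table must be resolved by the caller");
        Objects.requireNonNull(partVals, "partVals");
        return table.dbName + "." + table.tableName + "/" + String.join("/", partVals);
    }
}
```

   With this shape every caller has to pass a resolved table, so a missing one fails fast at the call site rather than silently taking a slower branch.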
   





Issue Time Tracking
-------------------

    Worklog Id:     (was: 693823)
    Time Spent: 40m  (was: 0.5h)

> Reduce HMS DB calls during stats updates
> ----------------------------------------
>
>                 Key: HIVE-24776
>                 URL: https://issues.apache.org/jira/browse/HIVE-24776
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Harshit Gupta
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When adding a large number of partitions (100s/1000s) to a table, HMS ends up making many unnecessary getTable calls.
> Line numbers in the stack trace below may differ slightly from apache-master.
> {noformat}
> 	at org.datanucleus.api.jdo.JDOPersistenceManager.jdoRetrieve(JDOPersistenceManager.java:620)
> 	at org.datanucleus.api.jdo.JDOPersistenceManager.retrieve(JDOPersistenceManager.java:637)
> 	at org.datanucleus.api.jdo.JDOPersistenceManager.retrieve(JDOPersistenceManager.java:646)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2112)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2150)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.ensureGetMTable(ObjectStore.java:4578)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.ensureGetTable(ObjectStore.java:4588)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:9264)
> 	at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
> 	at com.sun.proxy.$Proxy27.updatePartitionColumnStatistics(Unknown Source)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartitonColStatsInternal(HiveMetaStore.java:6679)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:8655)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:8592)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
> 	at com.sun.proxy.$Proxy28.set_aggr_stats_for(Unknown Source)
> 	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:19060)
> 	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:19044)
> 	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> 	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>  {noformat}
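
The pattern the ticket describes can be illustrated with a small, self-contained sketch. `CountingStore` and both update methods are hypothetical stand-ins, not the metastore API; they only show that resolving the table once before the loop replaces one lookup per partition with a single lookup.

```java
import java.util.List;

// Hypothetical store that counts table lookups; not the real RawStore.
final class CountingStore {
    int getTableCalls = 0;

    String getTable(String db, String tbl) {
        getTableCalls++;
        return db + "." + tbl;
    }

    // Old shape: every per-partition update resolves the table again.
    void updateStatsLookingUpTable(String db, String tbl, String partition) {
        String table = getTable(db, tbl);
        // ... stats for `partition` would be written against `table` here ...
    }

    // New shape: the caller passes the already-resolved table.
    void updateStats(String table, String partition) {
        // ... stats for `partition` would be written against `table` here ...
    }
}

final class HoistLookupDemo {
    // One lookup per partition.
    static int perPartitionLookups(List<String> partitions) {
        CountingStore store = new CountingStore();
        for (String p : partitions) {
            store.updateStatsLookingUpTable("db", "t", p);
        }
        return store.getTableCalls;
    }

    // A single lookup, hoisted out of the loop.
    static int hoistedLookup(List<String> partitions) {
        CountingStore store = new CountingStore();
        String table = store.getTable("db", "t");
        for (String p : partitions) {
            store.updateStats(table, p);
        }
        return store.getTableCalls;
    }
}
```

In this sketch the lookup count grows with the number of partitions in the first method and stays constant in the second.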


