Posted to issues@hive.apache.org by "Anishek Agarwal (Jira)" <ji...@apache.org> on 2021/01/18 04:31:00 UTC

[jira] [Commented] (HIVE-24649) Optimise Hive::addWriteNotificationLog for large data inserts

    [ https://issues.apache.org/jira/browse/HIVE-24649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267006#comment-17267006 ] 

Anishek Agarwal commented on HIVE-24649:
----------------------------------------

The partition objects currently pick up additional data, such as the catalog name, in the HMS call, and that data is not present on the HS2 side. Other things may be added internally later, which is difficult to predict. At most, I think we can probably prevent reloading of the table object, though this also crosses the HS2/HMS boundary. A better option would be to enable metadata caching on HMS, so that the round trip to the RDBMS is small. Another possible way is to return the list of partitions from {{addPartitionsToMetastore}}, but that means a lot of network round trips: from HMS to HS2, and then back to HMS in addWriteNotificationLog.

cc [~aasha]/[~pkumarsinha]
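
As a rough illustration of the "prevent reloading of the table object" idea above, here is a minimal sketch. It is not Hive code: {{TableInfo}}, {{Notification}}, {{CountingLoader}} and both {{notify...}} methods are hypothetical stand-ins, and the loader only counts how often the (assumed expensive) table lookup is invoked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types are Hive or HMS classes.
public class WriteNotificationSketch {

    /** Stand-in for a table object fetched from the metastore. */
    static final class TableInfo {
        final String name;
        TableInfo(String name) { this.name = name; }
    }

    /** Stand-in for one write-notification entry. */
    static final class Notification {
        final TableInfo table;
        final String partition;
        Notification(TableInfo table, String partition) {
            this.table = table;
            this.partition = partition;
        }
    }

    /** Counts how many times the table lookup is invoked. */
    static final class CountingLoader implements Function<String, TableInfo> {
        int calls = 0;
        @Override
        public TableInfo apply(String tableName) {
            calls++;
            return new TableInfo(tableName);
        }
    }

    /** Per-partition lookup: one table load for every partition. */
    static List<Notification> notifyPerPartition(
            String tableName, List<String> partitions, Function<String, TableInfo> loader) {
        List<Notification> out = new ArrayList<>();
        for (String p : partitions) {
            out.add(new Notification(loader.apply(tableName), p));
        }
        return out;
    }

    /** Cached lookup: the table is loaded once and reused for every partition. */
    static List<Notification> notifyWithCachedTable(
            String tableName, List<String> partitions, Function<String, TableInfo> loader) {
        Map<String, TableInfo> cache = new HashMap<>();
        List<Notification> out = new ArrayList<>();
        for (String p : partitions) {
            // The loader runs only on the first miss for this table name.
            TableInfo table = cache.computeIfAbsent(tableName, loader);
            out.add(new Notification(table, p));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("ds=2021-01-16", "ds=2021-01-17", "ds=2021-01-18");
        CountingLoader naive = new CountingLoader();
        notifyPerPartition("sales", partitions, naive);
        CountingLoader cached = new CountingLoader();
        notifyWithCachedTable("sales", partitions, cached);
        System.out.println("per-partition loads: " + naive.calls);
        System.out.println("cached loads: " + cached.calls);
    }
}
```

With three partitions of the same table, the per-partition variant invokes the loader three times and the cached variant once. Whether a similar reuse is safe in the real code path depends on the extra fields HMS adds to the objects, as noted above.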

> Optimise Hive::addWriteNotificationLog for large data inserts
> -------------------------------------------------------------
>
>                 Key: HIVE-24649
>                 URL: https://issues.apache.org/jira/browse/HIVE-24649
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: performance
>
> When loading dynamic partitions with a large dataset, Hive spends a lot of time in "Hive::loadDynamicPartitions --> addWriteNotificationLog".
> Though it is for the same table, it ends up loading table and partition details for every partition and writing to the notification log.
> Also, "Partition" details may already be present in the {{PartitionDetails}} object in {{Hive::loadDynamicPartitions}}. They are unnecessarily recomputed in {{HiveMetaStore::add_write_notification_log}}.
>  
> Lines of interest:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3028
> https://github.com/apache/hive/blob/89073a94354f0cc14ec4ae0a43e05aae29276b4d/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L8500
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)