Posted to issues@hive.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/12/29 02:05:00 UTC

[jira] [Updated] (HIVE-26893) Extend batch partitions APIs to ignore partition schemas

     [ https://issues.apache.org/jira/browse/HIVE-26893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated HIVE-26893:
----------------------------------
    Summary: Extend batch partitions APIs to ignore partition schemas  (was: Extend get partitions APIs to ignore partition schemas)

> Extend batch partitions APIs to ignore partition schemas
> --------------------------------------------------------
>
>                 Key: HIVE-26893
>                 URL: https://issues.apache.org/jira/browse/HIVE-26893
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Quanlong Huang
>            Priority: Major
>
> There are several HMS APIs that return a list of partitions, e.g. get_partitions_ps(), get_partitions_by_names(), add_partitions_req() with needResult=true, etc. Each returned partition instance carries its own list of FieldSchemas as the partition schema:
> {code:java}
> org.apache.hadoop.hive.metastore.api.Partition
> -> org.apache.hadoop.hive.metastore.api.StorageDescriptor
>    ->  cols: list<org.apache.hadoop.hive.metastore.api.FieldSchema> {code}
> This can take up a lot of memory for wide tables (e.g. with 2k cols). See the heap histogram in IMPALA-11812 for an example.
> Some engines, like Impala, don't actually use or respect the partition-level schema, so transmitting it wastes network and serde resources. It would be nice if these APIs provided an optional boolean flag for ignoring partition schemas, so that HMS clients (e.g. Impala) don't need to clear them later to save memory.
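
To make the cost concrete, here is a minimal, self-contained sketch of the situation described above. It does not use the real metastore client: FieldSchema, StorageDescriptor and Partition below are hypothetical stand-ins that only mirror the Partition -> StorageDescriptor -> cols nesting, and fetchPartitions(), countFieldSchemas() and clearPartitionSchemas() are illustrative helper names, not HMS APIs. It shows how the number of FieldSchema objects scales with partitions times columns, and the kind of client-side clearing the proposed flag would make unnecessary.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSchemaSketch {
    // Hypothetical stand-ins for the Thrift-generated HMS classes,
    // reduced to the fields that matter for this illustration.
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    static class StorageDescriptor {
        List<FieldSchema> cols;
    }

    static class Partition {
        List<String> values;
        StorageDescriptor sd;
    }

    // Simulates a batch partitions call: every partition gets its own
    // freshly allocated column list, as a deserialized response would.
    static List<Partition> fetchPartitions(int numPartitions, int numCols) {
        List<Partition> result = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            Partition part = new Partition();
            part.values = List.of("p" + p);
            part.sd = new StorageDescriptor();
            part.sd.cols = new ArrayList<>(numCols);
            for (int c = 0; c < numCols; c++) {
                part.sd.cols.add(new FieldSchema("col" + c, "string"));
            }
            result.add(part);
        }
        return result;
    }

    // Counts the FieldSchema objects reachable from the partitions.
    static long countFieldSchemas(List<Partition> parts) {
        long total = 0;
        for (Partition part : parts) {
            if (part.sd != null && part.sd.cols != null) {
                total += part.sd.cols.size();
            }
        }
        return total;
    }

    // Client-side workaround: drop the per-partition schemas after the
    // response has already been transmitted and deserialized.
    static void clearPartitionSchemas(List<Partition> parts) {
        for (Partition part : parts) {
            if (part.sd != null) {
                part.sd.cols = null;
            }
        }
    }

    public static void main(String[] args) {
        // 100 partitions of a 2k-column table.
        List<Partition> parts = fetchPartitions(100, 2000);
        System.out.println("FieldSchemas held: " + countFieldSchemas(parts));
        clearPartitionSchemas(parts);
        System.out.println("After clearing: " + countFieldSchemas(parts));
    }
}
```

Clearing on the client only frees memory after the schemas have been serialized, sent, and deserialized; a server-side flag as requested in this issue would avoid that work altogether.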



--
This message was sent by Atlassian Jira
(v8.20.10#820010)