Posted to dev@hive.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2018/05/25 18:36:00 UTC

[jira] [Created] (HIVE-19715) Consolidated and flexible API for fetching partition metadata from HMS

Todd Lipcon created HIVE-19715:
----------------------------------

             Summary: Consolidated and flexible API for fetching partition metadata from HMS
                 Key: HIVE-19715
                 URL: https://issues.apache.org/jira/browse/HIVE-19715
             Project: Hive
          Issue Type: New Feature
          Components: Standalone Metastore
            Reporter: Todd Lipcon


Currently, the HMS Thrift API exposes 17 different calls for fetching partition-related information. This is something of a combinatorial explosion: each call has variants with and without "auth" info, by pspecs vs. names, by filters, by exprs, etc. Keeping all of these separate APIs long term is a maintenance burden, and the number of near-duplicate choices is confusing for consumers.

Additionally, even with all of these APIs, there is a lack of granularity in fetching only the information needed for a particular use case. For example, in some use cases it may be beneficial to only fetch the partition locations without wasting effort fetching statistics, etc.
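
To make the granularity point concrete, here is a minimal Python sketch of field projection: the caller names the fields it needs and nothing else is returned. The `Partition` record and the `project` helper are hypothetical stand-ins for illustration, not part of the existing HMS Thrift API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class Partition:
    # Hypothetical in-memory stand-in for an HMS partition record.
    name: str
    location: str
    parameters: Dict[str, str] = field(default_factory=dict)
    column_stats: Dict[str, Any] = field(default_factory=dict)


def project(part: Partition, fields: Iterable[str]) -> Dict[str, Any]:
    """Return only the requested fields of a partition."""
    allowed = {"name", "location", "parameters", "column_stats"}
    requested = set(fields)
    unknown = requested - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {f: getattr(part, f) for f in sorted(requested)}


part = Partition(
    name="ds=2018-05-25",
    location="hdfs://nn/warehouse/t/ds=2018-05-25",
    parameters={"numRows": "10"},
    column_stats={"c1": {"ndv": 42}},
)

# A caller that only needs locations never receives the statistics.
locations_only = project(part, ["name", "location"])
```

In a real server the projection would ideally be pushed down to the backing store so that unrequested fields are never read at all; the sketch only shows the shape of the contract.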

This JIRA proposes that we add a new "one API to rule them all" for fetching partition info. The request and response would be encapsulated in structs. Some desirable properties:
- the request should be able to specify which pieces of information are required (eg location, properties, etc)
- in the case of partition parameters, the request should be able to do either whitelisting or blacklisting (eg to exclude the large incremental column stats HLL data that Impala stores there)
- the request should optionally specify auth info (to encompass the "with_auth" variants)
- the request should be able to designate the set of partitions to access through one of several different methods (eg "all", list<name>, expr, part_vals, etc)
- the struct should be easily evolvable so that new pieces of info can be added
- the response should be designed in such a way as to avoid transferring redundant information for common cases (eg simple "dictionary coding" of strings like parameter names, etc)
- the API should support some form of pagination for tables with large partition counts
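
As a rough illustration of how several of these properties could fit together, the Python sketch below models a single request struct with parameter whitelisting/blacklisting, offset-based pagination, and a response that dictionary-codes parameter names. Every name in it (`GetPartitionsRequest`, `get_partitions`, `page_token`, the `big_stats_blob` key, and so on) is an assumption made for illustration; this JIRA does not define a schema, and offset-based paging is only one of several possible pagination schemes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class GetPartitionsRequest:
    # Hypothetical request struct; optional fields keep it easy to evolve.
    db_name: str
    table_name: str
    part_names: Optional[List[str]] = None     # None means "all partitions"
    include_params: Optional[Set[str]] = None  # whitelist of parameter keys
    exclude_params: Optional[Set[str]] = None  # blacklist of parameter keys
    page_token: int = 0                        # offset of the first partition
    page_size: int = 100


def filter_params(params: Dict[str, str], req: GetPartitionsRequest) -> Dict[str, str]:
    """Apply the whitelist and/or blacklist to one partition's parameters."""
    out = {}
    for key, value in params.items():
        if req.include_params is not None and key not in req.include_params:
            continue
        if req.exclude_params is not None and key in req.exclude_params:
            continue
        out[key] = value
    return out


def dictionary_encode(rows: List[Dict[str, str]]) -> Tuple[List[str], List[Dict[int, str]]]:
    """Replace repeated parameter names with indexes into a shared key list."""
    keys: List[str] = []
    index: Dict[str, int] = {}
    encoded = []
    for row in rows:
        enc = {}
        for key, value in row.items():
            if key not in index:
                index[key] = len(keys)
                keys.append(key)
            enc[index[key]] = value
        encoded.append(enc)
    return keys, encoded


def get_partitions(store: Dict[str, Dict[str, str]], req: GetPartitionsRequest) -> dict:
    """Serve one page of partitions from an in-memory {name: params} store."""
    if req.part_names is None:
        names = sorted(store)
    else:
        names = [n for n in req.part_names if n in store]
    page = names[req.page_token:req.page_token + req.page_size]
    next_token = req.page_token + len(page)
    keys, encoded = dictionary_encode([filter_params(store[n], req) for n in page])
    return {
        "names": page,
        "param_keys": keys,
        "params": encoded,
        "next_page_token": next_token if next_token < len(names) else None,
    }


store = {
    "ds=1": {"numRows": "10", "big_stats_blob": "..."},
    "ds=2": {"numRows": "20", "big_stats_blob": "..."},
    "ds=3": {"numRows": "30"},
}
req = GetPartitionsRequest("db", "t", exclude_params={"big_stats_blob"}, page_size=2)
first = get_partitions(store, req)
second = get_partitions(
    store,
    GetPartitionsRequest("db", "t", exclude_params={"big_stats_blob"},
                         page_token=first["next_page_token"], page_size=2),
)
```

Here the first page carries the string "numRows" once in `param_keys` rather than once per partition, and a `next_page_token` of `None` signals the last page. Auth info and expression-based selection are left out to keep the sketch short.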




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)