You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/04/08 12:02:00 UTC

[jira] [Commented] (IMPALA-10613) Expose table and partition metadata over HMS API

    [ https://issues.apache.org/jira/browse/IMPALA-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317127#comment-17317127 ] 

ASF subversion and git services commented on IMPALA-10613:
----------------------------------------------------------

Commit a7eae471b84f05816780093938bba50f4d78aef1 in impala's branch refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a7eae47 ]

IMPALA-10613: Standup HMS thrift server in Catalog

This change adds the basic infrastructure to start the HMS server in
Catalog. It introduces a new configuration (--start_hms_server) along with a
config for the port and starts a HMS thrift server in the CatalogServiceCatalog
instance. Currently, all the HMS APIs are "pass-through" to the backing HMS
service. Except for the following 3 HMS APIs which can be used to request
a table and its partitions.

Additionally, there is another flag (--enable_catalogd_hms_cache) which
can be used to disable the usage of catalogd for providing the table
and partition metadata. This contribution was done by Kishen Das.

1. get_table_req
2. get_partitions_by_expr
3. get_partitions_by_names

In case of get_partitions_by_expr we need the hive-exec jar to be
present in the classpath since it needs to load the PartitionExpressionProxy
to push down the partition predicates to the HMS database. In case of
get_table_req if column statistics are requested, we return the
table level statistics.

Additionally, this patch adds a new configuration
fallback_to_hms_on_errors for the catalog which is used to determine
if the Catalog falls back to HMS service in case of errors while
executing the API. This is useful for testing purposes.

In order to expose the file-metadata for the tables and partitions,
HMS API changes were made to add the filemetadata fields to table
and partitions. In case of transactional tables, the file-metadata
which is returned is consistent with the provided ValidWriteIdList
in the API call.

There are a few TODOs which will be done in follow up tasks:
1. Add support for SASL support.
2. Pin the hive_metastore.thrift in the code so that any changes to HMS APIs
in the hive branch doesn't break Catalog's HMS service.

Testing:
1. Added a new end-to-end test which starts the HMS service in Catalog and runs
some basic HMS APIs against it.
2. Ran a modification of TestRemoteHiveMetastore in the Hive code base and
confirmed most tests are working. There were some test failures but they are
unrelated since the test assumes an empty warehouse whereas we run against the
actual HMS service running in the mini-cluster.

Change-Id: I1b306f91d63cb5137c178e8e72b6e8b578a907b5
Reviewed-on: http://gerrit.cloudera.org:8080/17244
Reviewed-by: Quanlong Huang <hu...@gmail.com>
Tested-by: Vihang Karajgaonkar <vi...@cloudera.com>


> Expose table and partition metadata over HMS API
> ------------------------------------------------
>
>                 Key: IMPALA-10613
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10613
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>
> Catalogd caches the table and partition metadata. If an external FE needs to be supported to query using the Impala, it would need to get this metadata from catalogd to compile the query and generate the plan. While a subset of the metadata which is cached in catalogd, is sourced from Hive metastore, it also caches file metadata which is needed by the Impala backend to create the Impala plan. It would be good to expose the table and partition metadata cached in catalogd over HMS API so that any Hive metastore client (e.g spark, hive) can potentially use this metadata to create a plan. This JIRA tracks the work needed to expose this information over catalogd.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org