Posted to issues@spark.apache.org by "Juliusz Sompolski (JIRA)" <ji...@apache.org> on 2019/05/31 14:42:00 UTC

[jira] [Created] (SPARK-27899) Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

Juliusz Sompolski created SPARK-27899:
-----------------------------------------

             Summary: Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API
                 Key: SPARK-27899
                 URL: https://issues.apache.org/jira/browse/SPARK-27899
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Juliusz Sompolski


The new Spark ThriftServer SparkGetTablesOperation implemented in https://github.com/apache/spark/pull/22794 issues a separate catalog.getTableMetadata request for every table it lists. This can get very slow for large schemas (~50ms per table with an external Hive metastore).
Hive's own ThriftServer GetTablesOperation uses HiveMetastoreClient.getTableObjectsByName to fetch table information in bulk, but Spark does not expose that call through the layered catalog APIs: Hive -> HiveClientImpl (HiveClient) -> HiveExternalCatalog (ExternalCatalog) -> SessionCatalog.
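For illustration, the per-table pattern versus Hive's bulk call looks roughly like this. This is a simplified sketch, not the actual SparkGetTablesOperation code; the names tableNames, db, catalog and metastoreClient are placeholders assumed for the example.

    // Sketch only -- not the actual SparkGetTablesOperation code.
    // Today: one metastore round trip per table (~50ms each against an
    // external Hive metastore).
    val metadata: Seq[CatalogTable] = tableNames.map { name =>
      catalog.getTableMetadata(TableIdentifier(name, Some(db)))
    }

    // Hive's own GetTablesOperation instead asks the metastore once:
    //   metastoreClient.getTableObjectsByName(db, tableNames.asJava)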

If we added and exposed getTableObjectsByName through our catalog APIs, SparkGetTablesOperation could fetch the metadata for all tables in a single bulk call instead of one call per table, resolving that performance problem.
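One way the bulk lookup could be threaded through the layers is sketched below. The method name getTablesByName, the signatures, and the fallback to "default" for an unqualified identifier are assumptions for illustration, not a committed API.

    import org.apache.spark.sql.catalyst.TableIdentifier
    import org.apache.spark.sql.catalyst.catalog.CatalogTable

    // Hypothetical additions at each layer; names and signatures are a sketch.

    trait HiveClient {
      // Would wrap HiveMetastoreClient.getTableObjectsByName in HiveClientImpl.
      def getTablesByName(dbName: String, tableNames: Seq[String]): Seq[CatalogTable]
    }

    trait ExternalCatalog {
      // HiveExternalCatalog would delegate straight to HiveClient.getTablesByName.
      def getTablesByName(db: String, tables: Seq[String]): Seq[CatalogTable]
    }

    class SessionCatalog(externalCatalog: ExternalCatalog) {
      // One metastore call per database instead of one call per table.
      def getTablesByName(names: Seq[TableIdentifier]): Seq[CatalogTable] = {
        names.groupBy(_.database.getOrElse("default")).toSeq.flatMap {
          case (db, idents) => externalCatalog.getTablesByName(db, idents.map(_.table))
        }
      }
    }

With something along these lines, SparkGetTablesOperation could call the SessionCatalog-level bulk method once per listed database rather than getTableMetadata once per table.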



