Posted to issues@spark.apache.org by "Saif Addin (JIRA)" <ji...@apache.org> on 2017/06/23 22:59:00 UTC

[jira] [Created] (SPARK-21198) SparkSession catalog is terribly slow

Saif Addin created SPARK-21198:
----------------------------------

             Summary: SparkSession catalog is terribly slow
                 Key: SPARK-21198
                 URL: https://issues.apache.org/jira/browse/SPARK-21198
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Saif Addin


We have a considerably large Hive metastore and a Spark program that checks data availability across Hive.

In Spark 1.x, we were using sqlContext.tableNames or sqlContext.sql() to go through Hive.
Once we migrated to Spark 2.x, we switched over to SparkSession.catalog instead, but it turns out that both listDatabases() and listTables() take between 5 and 20 minutes to return results, depending on the database, using operations such as the following one:

spark.catalog.listTables(db).filter(_.isTemporary).map(_.name).collect

which makes the program unbearably slow just to return a list of tables.

I know we can still use spark.sqlContext.tableNames as a workaround, but is it going to be deprecated anytime soon?
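To put numbers on the gap, the catalog call and the sqlContext.tableNames workaround can be timed side by side. The following is a minimal sketch, assuming Scala 2.11 or later; ListingTimer, timed and compareListings are hypothetical helpers, not part of Spark, and each timing is a single wall-clock measurement rather than a proper benchmark.

```scala
// Hypothetical helpers (not part of Spark) for timing two ways of listing
// table names and checking whether they return the same set of names.
object ListingTimer {

  /** Evaluates `body` once and returns its result together with the
    * elapsed wall-clock time in milliseconds. */
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  /** Runs both listings once each (they are by-name parameters, so nothing
    * is evaluated before the timer starts), prints the table count and
    * elapsed time for each, and returns true if both list the same names. */
  def compareListings(
      label1: String, listing1: => Seq[String],
      label2: String, listing2: => Seq[String]): Boolean = {
    val (names1, ms1) = timed(listing1)
    val (names2, ms2) = timed(listing2)
    println(s"$label1: ${names1.size} tables in $ms1 ms")
    println(s"$label2: ${names2.size} tables in $ms2 ms")
    names1.toSet == names2.toSet
  }
}
```

For example, in spark-shell with db bound to one of the affected database names (the isTemporary filter from above is left out so the two listings can be compared directly):

ListingTimer.compareListings("catalog.listTables", spark.catalog.listTables(db).collect.map(_.name).toSeq, "sqlContext.tableNames", spark.sqlContext.tableNames(db).toSeq)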



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)