Posted to issues@spark.apache.org by "Saif Addin (JIRA)" <ji...@apache.org> on 2017/07/02 19:19:04 UTC
[jira] [Commented] (SPARK-21198) SparkSession catalog is terribly slow
[ https://issues.apache.org/jira/browse/SPARK-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071759#comment-16071759 ]
Saif Addin commented on SPARK-21198:
------------------------------------
[~viirya] any chance you could take a look?
> SparkSession catalog is terribly slow
> -------------------------------------
>
> Key: SPARK-21198
> URL: https://issues.apache.org/jira/browse/SPARK-21198
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Saif Addin
>
> We have a considerably large Hive metastore and a Spark program that goes through Hive data availability.
> In Spark 1.x, we were using sqlContext.tableNames, sqlContext.sql() and sqlContext.isCached() to go through Hive metastore information.
> Once migrated to Spark 2.x we switched over to SparkSession.catalog instead, but it turns out that both listDatabases() and listTables() take between 5 and 20 minutes, depending on the database, to return results, using operations such as the following one:
> spark.catalog.listTables(db).filter(_.isTemporary).map(_.name).collect
> This made the program unbearably slow at returning a list of tables.
> I know we still have spark.sqlContext.tableNames as a workaround, but I am assuming this is going to be deprecated at some point soon?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org