Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/07/28 07:38:05 UTC
[jira] [Resolved] (SPARK-6984) Operations on tables with many partitions _very_slow
[ https://issues.apache.org/jira/browse/SPARK-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust resolved SPARK-6984.
-------------------------------------
Resolution: Duplicate
Fix Version/s: 1.5.0
This should be fixed by [SPARK-6910].
> Operations on tables with many partitions _very_slow
> ----------------------------------------------------
>
> Key: SPARK-6984
> URL: https://issues.apache.org/jira/browse/SPARK-6984
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.1
> Environment: External Hive metastore, table with 30K partitions
> Reporter: Yana Kadiyska
> Fix For: 1.5.0
>
> Attachments: 7282_partitions_stack.png
>
>
> I have a table with _many_ partitions (30K). Users cannot query all of them, but they are all in the metastore. Querying this table is extremely slow even if we're asking for a single partition.
> "describe sometable" also performs _very_ poorly
> {quote}
> Spark produces the following times:
> Query 1 of 1, Rows read: 50, Elapsed time (seconds) - Total: 73.02, SQL query: 72.831, Reading results: 0.189
> Whereas Hive over the same metastore shows:
> Query 1 of 1, Rows read: 47, Elapsed time (seconds) - Total: 0.44, SQL query: 0.204, Reading results: 0.236
> {quote}
> I attempted to debug this and noticed that HiveMetastoreCatalog constructs an object for each partition, which is puzzling to me (attaching screenshot). Should this value be lazy? IMO, describe table should be purely a metastore op (i.e. query postgres, return types).
> The issue is a blocker to me but leaving with default priority until someone can confirm it is a bug. "describe table" is not so interesting but I think this affects all query paths -- I sent an inquiry earlier here: https://www.mail-archive.com/user@spark.apache.org/msg26242.html
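
To illustrate the reporter's suggestion that the per-partition value be lazy, here is a minimal sketch in Python. It is not Spark's code and not the change made in SPARK-6910; `FakeMetastore`, `LazyTable`, and all method names are hypothetical stand-ins. It only shows the general idea: a "describe" call reads the schema without ever materializing the 30K partition objects, which are fetched once on first access.

```python
from functools import cached_property


class FakeMetastore:
    """Hypothetical stand-in for an external metastore; counts partition fetches."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partition_fetches = 0

    def get_schema(self, table):
        # Cheap call: returns column names and types only.
        return [("id", "int"), ("dt", "string")]

    def get_partitions(self, table):
        # Expensive call: builds one entry per partition.
        self.partition_fetches += 1
        return [f"dt={i}" for i in range(self.num_partitions)]


class LazyTable:
    """Hypothetical table wrapper that defers partition loading."""

    def __init__(self, metastore, name):
        self.metastore = metastore
        self.name = name

    @cached_property
    def partitions(self):
        # Evaluated on first access only, then cached on the instance.
        return self.metastore.get_partitions(self.name)

    def describe(self):
        # Touches only the schema, never the partition list.
        return self.metastore.get_schema(self.name)


metastore = FakeMetastore(num_partitions=30_000)
table = LazyTable(metastore, "sometable")

schema = table.describe()
fetches_after_describe = metastore.partition_fetches

partition_count = len(table.partitions)
_ = table.partitions  # second access reuses the cached list
fetches_after_access = metastore.partition_fetches
```

In this sketch, describing the table triggers no partition fetch at all, and repeated access to `partitions` hits the metastore only once. Whether the same deferral is safe inside HiveMetastoreCatalog depends on details not shown in this report.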