Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/07/28 07:38:05 UTC

[jira] [Resolved] (SPARK-6984) Operations on tables with many partitions _very_slow

     [ https://issues.apache.org/jira/browse/SPARK-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-6984.
-------------------------------------
       Resolution: Duplicate
    Fix Version/s: 1.5.0

This should be fixed by [SPARK-6910].

> Operations on tables with many partitions _very_slow
> ----------------------------------------------------
>
>                 Key: SPARK-6984
>                 URL: https://issues.apache.org/jira/browse/SPARK-6984
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.1
>         Environment: External Hive metastore, table with 30K partitions
>            Reporter: Yana Kadiyska
>             Fix For: 1.5.0
>
>         Attachments: 7282_partitions_stack.png
>
>
> I have a table with _many_ partitions (30K). Users cannot query all of them, but they are all in the metastore. Querying this table is extremely slow even when we ask for a single partition.
> "describe sometable" also performs _very_ poorly.
> {quote}
> Spark produces the following times:
> Query 1 of 1, Rows read: 50, Elapsed time (seconds) - Total: 73.02, SQL query: 72.831, Reading results: 0.189
> Whereas Hive over the same metastore shows:
> Query 1 of 1, Rows read: 47, Elapsed time (seconds) - Total: 0.44, SQL query: 0.204, Reading results: 0.236
> {quote}
> I attempted to debug this and noticed that HiveMetastoreCatalog constructs an object for each partition, which is puzzling to me (screenshot attached). Should this value be lazy? IMO "describe table" should be purely a metastore op (i.e. query Postgres, return types).
> The issue is a blocker for me, but I am leaving it at the default priority until someone can confirm it is a bug. "describe table" is not so interesting in itself, but I think this affects all query paths. I sent an inquiry earlier here: https://www.mail-archive.com/user@spark.apache.org/msg26242.html
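The reporter's question about making the per-partition value lazy can be illustrated with a small sketch. This is not Spark's code and makes no claim about how HiveMetastoreCatalog or the SPARK-6910 fix works. It is written in Python rather than Spark's Scala, and every name in it (PartitionMeta, EagerTable, LazyTable, describe) is hypothetical. It only shows the general idea: if partition objects are built on first access instead of at construction time, a schema-only operation such as a describe never pays for them. Scala's `lazy val` plays the role that `cached_property` plays here.

```python
from functools import cached_property


class PartitionMeta:
    """Hypothetical stand-in for an expensive per-partition object."""

    built = 0  # counts how many instances have been constructed

    def __init__(self, name):
        PartitionMeta.built += 1
        self.name = name


class EagerTable:
    """Builds every partition object up front, even for a schema lookup."""

    def __init__(self, schema, partition_names):
        self.schema = schema
        self.partitions = [PartitionMeta(n) for n in partition_names]


class LazyTable:
    """Defers partition construction until partitions are first read."""

    def __init__(self, schema, partition_names):
        self.schema = schema
        self._partition_names = partition_names

    @cached_property
    def partitions(self):
        # Runs once, on first access; the result is cached on the instance.
        return [PartitionMeta(n) for n in self._partition_names]


def describe(table):
    # A describe only needs column names and types, never the partitions.
    return list(table.schema.items())
```

With 30K partition names, constructing an EagerTable builds 30K PartitionMeta objects before describe can return, while a LazyTable builds none until something reads `partitions`.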



