Posted to issues@spark.apache.org by "qian, chen (JIRA)" <ji...@apache.org> on 2015/10/14 11:04:05 UTC

[jira] [Comment Edited] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

    [ https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956508#comment-14956508 ] 

qian, chen edited comment on SPARK-6910 at 10/14/15 9:03 AM:
-------------------------------------------------------------

I'm using spark-sql (Spark 1.5.1, Hadoop 2.4.0) and found something interesting.
In the spark-sql shell, I first ran this query, which took about 3 minutes:
select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
Time taken: 164.502 seconds

Then I ran this query, which took only about 10 seconds. date, hour and name are partition columns of this Hive table, and the table has more than 4000 partitions:
select * from table1 where date='20151010' and hour='13' limit 5;
Time taken: 10.881 seconds
Is the first query slow because all partition information has to be downloaded from the Hive metastore the first time? And is the second query faster because all partitions are now cached in memory?
Any suggestions for speeding up the first query?
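
One experiment that might help test this hypothesis, sketched under assumptions rather than verified on this cluster: the work tracked in this issue appears to be controlled by a setting named spark.sql.hive.metastorePartitionPruning, which I believe is off by default in 1.5.x. Assuming that property exists in the running build, enabling it in a fresh spark-sql session and rerunning the first query would show whether the initial partition listing is the bottleneck.

```sql
-- Assumption: this property is available in the running Spark build
-- and is disabled unless set explicitly.
SET spark.sql.hive.metastorePartitionPruning=true;

-- Same first query as above, run in a new session so that any
-- previously cached partition metadata does not skew the timing.
select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
```

If the first-run time drops sharply with the setting enabled, that would be consistent with the full partition listing being the slow step; if it does not, the cause likely lies elsewhere.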


> Support for pushing predicates down to metastore for partition pruning
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6910
>                 URL: https://issues.apache.org/jira/browse/SPARK-6910
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Cheolsoo Park
>            Priority: Critical
>             Fix For: 1.5.0
>
>



