You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/06/11 23:17:00 UTC

[jira] [Updated] (SPARK-8287) Filters not pushed with substitution through aggregation

     [ https://issues.apache.org/jira/browse/SPARK-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-8287:
------------------------------------
    Summary: Filters not pushed with substitution through aggregation  (was: Filter not push down through Subquery or View)

> Filters not pushed with substitution through aggregation
> --------------------------------------------------------
>
>                 Key: SPARK-8287
>                 URL: https://issues.apache.org/jira/browse/SPARK-8287
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Li Sheng
>             Fix For: 1.4.0
>
>   Original Estimate: 40h
>  Remaining Estimate: 40h
>
> Filter not push down through Subquery or View. Assume we have two big  partitioned table join inner a Subquery or a View and filter not push down, this will cause a full partition join and will cause performance issues.
> Let me give and example that can reproduce the problem:
> {code:sql}
> create table src(key int, value string);
> -- Creates partitioned table and imports data
> CREATE TABLE src_partitioned1 (key int, value STRING) PARTITIONED BY (ds STRING);
> insert overwrite table src_partitioned1 PARTITION (ds='1') select key, value from src;
> insert overwrite table src_partitioned1 PARTITION (ds='2') select key, value from src;
> CREATE TABLE src_partitioned2 (key int, value STRING) PARTITIONED BY (ds STRING);
> insert overwrite table src_partitioned2 PARTITION (ds='1') select key, value from src;
> insert overwrite table src_partitioned2 PARTITION (ds='2') select key, value from src;
> -- Creates views
> create view src_view as select sum(a.key) s1, sum(b.key) s2, b.ds ds from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds
> create view src_view_1 as select sum(a.key) s1, sum(b.key) s2, b.ds my_ds from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds
> -- QueryExecution
> select * from dw.src_view where ds='2'
> {code}
> {noformat}
> sql("select * from dw.src_view where ds='2' ").queryExecution
> == Parsed Logical Plan ==
> 'Project [*]
>  'Filter ('ds = 2)
>   'UnresolvedRelation [dw,src_view], None
> == Analyzed Logical Plan ==
> Project [s1#60L,s2#61L,ds#62]
>  Filter (ds#62 = 2)
>   Subquery src_view
>    Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, LongType)) AS s2#61L,ds#66 AS ds#62]
>     Join Inner, Some((ds#63 = ds#66))
>      MetastoreRelation dw, src_partitioned1, Some(a)
>      MetastoreRelation dw, src_partitioned2, Some(b)
> == Optimized Logical Plan ==
> Filter (ds#62 = 2)
>  Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, LongType)) AS s2#61L,ds#66 AS ds#62]
>   Project [ds#66,key#64,key#67]
>    Join Inner, Some((ds#63 = ds#66))
>     Project [key#64,ds#63]
>      MetastoreRelation dw, s...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org