You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Li Sheng (JIRA)" <ji...@apache.org> on 2015/06/10 10:16:00 UTC

[jira] [Created] (SPARK-8287) Filter not push down through Subquery or View

Li Sheng created SPARK-8287:
-------------------------------

             Summary: Filter not push down through Subquery or View
                 Key: SPARK-8287
                 URL: https://issues.apache.org/jira/browse/SPARK-8287
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.1
            Reporter: Li Sheng
             Fix For: 1.4.0


Filter not push down through Subquery or View. Assume we have two big  partitioned table join inner a Subquery or a View and filter not push down, this will cause a full partition join and will cause performance issues.

Let me give and example that can reproduce the problem:

create table src(key int, value string);

--创建分区表并且导入数据
CREATE TABLE src_partitioned1 (key int, value STRING) PARTITIONED BY (ds STRING);

insert overwrite table src_partitioned1 PARTITION (ds='1') select key, value from src;
insert overwrite table src_partitioned1 PARTITION (ds='2') select key, value from src;

CREATE TABLE src_partitioned2 (key int, value STRING) PARTITIONED BY (ds STRING);

insert overwrite table src_partitioned2 PARTITION (ds='1') select key, value from src;
insert overwrite table src_partitioned2 PARTITION (ds='2') select key, value from src;

--创建视图
create view src_view as select sum(a.key) s1, sum(b.key) s2, b.ds ds from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds


create view src_view_1 as select sum(a.key) s1, sum(b.key) s2, b.ds my_ds from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds


--QueryExecution
select * from dw.src_view where ds='2'


sql("select * from dw.src_view where ds='2' ").queryExecution

== Parsed Logical Plan ==
'Project [*]
 'Filter ('ds = 2)
  'UnresolvedRelation [dw,src_view], None

== Analyzed Logical Plan ==
Project [s1#60L,s2#61L,ds#62]
 Filter (ds#62 = 2)
  Subquery src_view
   Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, LongType)) AS s2#61L,ds#66 AS ds#62]
    Join Inner, Some((ds#63 = ds#66))
     MetastoreRelation dw, src_partitioned1, Some(a)
     MetastoreRelation dw, src_partitioned2, Some(b)

== Optimized Logical Plan ==
Filter (ds#62 = 2)
 Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, LongType)) AS s2#61L,ds#66 AS ds#62]
  Project [ds#66,key#64,key#67]
   Join Inner, Some((ds#63 = ds#66))
    Project [key#64,ds#63]
     MetastoreRelation dw, s...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org