You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Li Sheng (JIRA)" <ji...@apache.org> on 2015/06/10 10:16:00 UTC
[jira] [Created] (SPARK-8287) Filter not push down through Subquery
or View
Li Sheng created SPARK-8287:
-------------------------------
Summary: Filter not push down through Subquery or View
Key: SPARK-8287
URL: https://issues.apache.org/jira/browse/SPARK-8287
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.1
Reporter: Li Sheng
Fix For: 1.4.0
Filter not push down through Subquery or View. Assume we have two big partitioned table join inner a Subquery or a View and filter not push down, this will cause a full partition join and will cause performance issues.
Let me give and example that can reproduce the problem:
create table src(key int, value string);
--创建分区表并且导入数据
CREATE TABLE src_partitioned1 (key int, value STRING) PARTITIONED BY (ds STRING);
insert overwrite table src_partitioned1 PARTITION (ds='1') select key, value from src;
insert overwrite table src_partitioned1 PARTITION (ds='2') select key, value from src;
CREATE TABLE src_partitioned2 (key int, value STRING) PARTITIONED BY (ds STRING);
insert overwrite table src_partitioned2 PARTITION (ds='1') select key, value from src;
insert overwrite table src_partitioned2 PARTITION (ds='2') select key, value from src;
--创建视图
create view src_view as select sum(a.key) s1, sum(b.key) s2, b.ds ds from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds
create view src_view_1 as select sum(a.key) s1, sum(b.key) s2, b.ds my_ds from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds
--QueryExecution
select * from dw.src_view where ds='2'
sql("select * from dw.src_view where ds='2' ").queryExecution
== Parsed Logical Plan ==
'Project [*]
'Filter ('ds = 2)
'UnresolvedRelation [dw,src_view], None
== Analyzed Logical Plan ==
Project [s1#60L,s2#61L,ds#62]
Filter (ds#62 = 2)
Subquery src_view
Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, LongType)) AS s2#61L,ds#66 AS ds#62]
Join Inner, Some((ds#63 = ds#66))
MetastoreRelation dw, src_partitioned1, Some(a)
MetastoreRelation dw, src_partitioned2, Some(b)
== Optimized Logical Plan ==
Filter (ds#62 = 2)
Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, LongType)) AS s2#61L,ds#66 AS ds#62]
Project [ds#66,key#64,key#67]
Join Inner, Some((ds#63 = ds#66))
Project [key#64,ds#63]
MetastoreRelation dw, s...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org