Posted to user@spark.apache.org by victor sheng <vi...@gmail.com> on 2014/07/14 12:42:59 UTC
spark1.0.1 catalyst transform filter not push down
Hi, I encountered a weird problem in Spark SQL.
I use sbt/sbt hive/console to get into the shell.
I'm testing filter push-down using Catalyst.
scala> val queryPlan = sql("select value from (select key,value from src)a
where a.key=86 ")
scala> queryPlan.baseLogicalPlan
res0: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project ['value]
 Filter ('a.key = 86)
  Subquery a
   Project ['key,'value]
    UnresolvedRelation None, src, None
I want to achieve the "Filter Push Down".
So I run:
scala> var newQuery = queryPlan.baseLogicalPlan transform {
| case f @ Filter(_, p @ Project(_,grandChild))
| if (f.references subsetOf grandChild.output) =>
| p.copy(child = f.copy(child = grandChild))
| }
<console>:42: error: type mismatch;
 found   : Seq[org.apache.spark.sql.catalyst.expressions.Attribute]
 required: scala.collection.GenSet[org.apache.spark.sql.catalyst.expressions.Attribute]
       if (f.references subsetOf grandChild.output) =>
                                 ^
It throws the error above, and I don't know what's wrong.
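A possible fix, sketched here as an illustration rather than verified code for this exact build: the message suggests f.references is a Set[Attribute] while grandChild.output is a Seq[Attribute], and Set.subsetOf expects a set on the right-hand side, so converting the output with .toSet should let the guard compile:

scala> var newQuery = queryPlan.baseLogicalPlan transform {
     |   case f @ Filter(_, p @ Project(_, grandChild))
     |     if f.references subsetOf grandChild.output.toSet =>   // .toSet: Seq[Attribute] -> Set[Attribute]
     |       p.copy(child = f.copy(child = grandChild))
     | }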
If I run:
scala> var newQuery = queryPlan.baseLogicalPlan transform {
| case f @ Filter(_, p @ Project(_,grandChild))
| if true =>
| p.copy(child = f.copy(child = grandChild))
| }
newQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project ['value]
 Filter ('a.key = 86)
  Subquery a
   Project ['key,'value]
    UnresolvedRelation None, src, None
It seems the Filter is still in the same position; the order did not switch.
Can anyone guide me on this?
Re: spark1.0.1 catalyst transform filter not push down
Posted by Victor Sheng <vi...@gmail.com>.
I used queryPlan.queryExecution.analyzed to get the logical plan, and it works.
What you explained to me is very useful.
Thank you very much.
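For reference, a minimal sketch of that approach (an illustration only, assuming the .toSet fix for the guard; note that in this query the subselect leaves a Subquery node between the Filter and the inner Project, so this simple pattern may not fire until that wrapper is out of the way, e.g. in the optimized plan):

scala> val analyzed = queryPlan.queryExecution.analyzed
scala> val pushedDown = analyzed transform {
     |   case f @ Filter(_, p @ Project(_, grandChild))
     |     if f.references subsetOf grandChild.output.toSet =>
     |       p.copy(child = f.copy(child = grandChild))   // swap so the Filter sits below the Project
     | }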
Re: spark1.0.1 catalyst transform filter not push down
Posted by Yin Huai <hu...@gmail.com>.
Hi,
queryPlan.baseLogicalPlan is not the plan used for execution. Actually, the
baseLogicalPlan of a SchemaRDD (queryPlan in your case) is just the parsed plan
(the parsed plan will be analyzed, then optimized, and finally a physical plan
will be created). The plan that shows up after you execute "val queryPlan =
sql("select value from (select key,value from src)a where a.key=86 ")" is the
physical plan. Alternatively, you can use queryPlan.queryExecution to see the
Logical Plan, Optimized Logical Plan, and Physical Plan. You can see that the
physical plan is
== Physical Plan ==
Project [value#3:0]
 Filter (key#2:1 = 86)
  HiveTableScan [value#3,key#2], (MetastoreRelation default, src, None), None
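For example, something like this shows each stage from the shell (a sketch only; the queryExecution field names such as optimizedPlan and executedPlan are as in Spark 1.0.x and may differ in other builds):

scala> val qe = queryPlan.queryExecution
scala> println(qe.analyzed)        // resolved logical plan
scala> println(qe.optimizedPlan)   // logical plan after optimizer rules (including predicate pushdown)
scala> println(qe.executedPlan)    // physical plan that actually runs
scala> println(qe)                 // prints all of the stages together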
Thanks,
Yin