Posted to user@spark.apache.org by victor sheng <vi...@gmail.com> on 2014/07/14 12:42:59 UTC

spark1.0.1 catalyst transform filter not push down

Hi, I encountered a weird problem in Spark SQL.
I use sbt/sbt hive/console to get into the shell.

I am testing filter push-down with Catalyst.

scala>  val queryPlan = sql("select value from (select key,value from src)a
where a.key=86 ")
scala> queryPlan.baseLogicalPlan
res0: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = 
Project ['value]
 Filter ('a.key = 86)
  Subquery a
   Project ['key,'value]
    UnresolvedRelation None, src, None

I want to perform the filter push-down manually.

So I run:
scala> var newQuery = queryPlan.baseLogicalPlan transform {
     |     case f @ Filter(_, p @ Project(_,grandChild)) 
     |     if (f.references subsetOf grandChild.output) => 
     |     p.copy(child = f.copy(child = grandChild))
     | }
<console>:42: error: type mismatch;
 found   : Seq[org.apache.spark.sql.catalyst.expressions.Attribute]
 required:
scala.collection.GenSet[org.apache.spark.sql.catalyst.expressions.Attribute]
           if (f.references subsetOf grandChild.output) => 
                                                ^
It fails with the type mismatch above (a compile error in the REPL, not a
runtime exception). I don't know what's wrong.
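This looks like a plain Scala collections mismatch rather than a Catalyst
issue: f.references is a Set[Attribute] while grandChild.output is a
Seq[Attribute], and Set.subsetOf expects another set. A sketch of a guard
that should compile, converting the output to a set first (not verified
against 1.0.1):

scala> var newQuery = queryPlan.baseLogicalPlan transform {
     |     case f @ Filter(_, p @ Project(_, grandChild))
     |     // grandChild.output is a Seq[Attribute]; subsetOf needs a set
     |     if (f.references subsetOf grandChild.output.toSet) =>
     |     p.copy(child = f.copy(child = grandChild))
     | }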

If I loosen the guard to true and run:
scala> var newQuery = queryPlan.baseLogicalPlan transform {
     |     case f @ Filter(_, p @ Project(_,grandChild)) 
     |     if true => 
     |     p.copy(child = f.copy(child = grandChild))
     | }
newQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = 
Project ['value]
 Filter ('a.key = 86)
  Subquery a
   Project ['key,'value]
    UnresolvedRelation None, src, None

It seems the Filter stays in the same position; the order is not switched.
Can anyone guide me on this?
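One likely reason the rule never fires, even with the guard loosened to
true: in the parsed plan printed above, the Filter's direct child is the
Subquery node, not the Project, so the pattern Filter(_, Project(_,
grandChild)) matches nothing and transform returns the plan unchanged. A
hypothetical variant that matches through the intervening Subquery might
look like this (a sketch only, untested):

var newQuery = queryPlan.baseLogicalPlan transform {
    // the parsed plan is Filter -> Subquery -> Project, so match the
    // Subquery explicitly and rebuild it with the Filter pushed inside
    case f @ Filter(_, Subquery(alias, p @ Project(_, grandChild))) =>
    Subquery(alias, p.copy(child = f.copy(child = grandChild)))
}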




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-catalyst-transform-filter-not-push-down-tp9599.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: spark1.0.1 catalyst transform filter not push down

Posted by Victor Sheng <vi...@gmail.com>.
I used queryPlan.queryExecution.analyzed to get the logical plan, and it
works.
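For reference, the other stages hang off queryExecution as well (field
names assumed from the 1.0.x QueryExecution; a sketch):

scala> val analyzed = queryPlan.queryExecution.analyzed  // resolved logical plan, safe to transform
scala> queryPlan.queryExecution.optimizedPlan            // the built-in optimizer has already pushed the filter down here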

And what you explained to me is very useful.

Thank you very much.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-catalyst-transform-filter-not-push-down-tp9599p9689.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: spark1.0.1 catalyst transform filter not push down

Posted by Yin Huai <hu...@gmail.com>.
Hi,

queryPlan.baseLogicalPlan is not the plan used for execution. The
baseLogicalPlan of a SchemaRDD (queryPlan in your case) is just the parsed
plan; the parsed plan is then analyzed and optimized, and finally a
physical plan is created from it. The plan that shows up after you execute
"val queryPlan = sql("select value from (select key,value from src)a where
a.key=86 ")" is the physical plan. Alternatively, you can use
queryPlan.queryExecution to see the Logical Plan, Optimized Logical Plan,
and Physical Plan. You will find that the physical plan is

== Physical Plan ==
Project [value#3:0]
 Filter (key#2:1 = 86)
  HiveTableScan [value#3,key#2], (MetastoreRelation default, src, None), None
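For example, printing the whole QueryExecution dumps every stage at once (a
sketch; the section headers match the "== Physical Plan ==" block above):

scala> println(queryPlan.queryExecution)  // Logical Plan, Optimized Logical Plan, and Physical Plan in one dump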

Thanks,

Yin


