Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/09/29 05:06:20 UTC

[jira] [Resolved] (KUDU-1659) Spark does not remove pushed predicates from Spark-side query plan

     [ https://issues.apache.org/jira/browse/KUDU-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1659.
-------------------------------
       Resolution: Invalid
    Fix Version/s: n/a

It turns out I was using Spark 1.5, which didn't support the necessary APIs. After I upgraded to Spark 1.6, the plans looked right.

> Spark does not remove pushed predicates from Spark-side query plan
> ------------------------------------------------------------------
>
>                 Key: KUDU-1659
>                 URL: https://issues.apache.org/jira/browse/KUDU-1659
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, spark
>    Affects Versions: 1.0.0
>            Reporter: Todd Lipcon
>             Fix For: n/a
>
>
> I ran the following Spark SQL query:
> {code}
> select count(*) from metrics where host = "foo.com"
> {code}
> I verified that this resulted in the predicate being pushed to Kudu. Once the predicate is pushed, Spark does not need to evaluate it again, and in fact does not need to select the column at all. However, Spark still appears to select the column and re-evaluate the same filter:
> {code}
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#71L])
>  TungstenExchange SinglePartition
>   TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#74L])
>    Project
>     Filter (host#0 = foo.com)
>      Scan org.apache.kudu.spark.kudu.KuduRelation@1d18e5ad[host#0]
> {code}
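
For readers unfamiliar with the mechanism, here is a minimal toy sketch in plain Python of what "removing pushed predicates from the plan" means. It does not use Spark or Kudu: `Filter`, `ToyRelation`, and `plan_count` are hypothetical names invented for illustration. The idea loosely mirrors the `unhandledFilters` hook that Spark 1.6 added to `BaseRelation`; that this is the API the resolution comment refers to is an assumption, since the message does not name it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Filter:
    """Hypothetical predicate of the form `column <op> value`."""
    column: str
    op: str
    value: str

    def __str__(self) -> str:
        return f"{self.column} {self.op} {self.value}"


class ToyRelation:
    """Hypothetical data source that can evaluate equality predicates itself."""

    def unhandled_filters(self, filters: List[Filter]) -> List[Filter]:
        # Report back only the filters the source cannot evaluate on its own.
        return [f for f in filters if f.op != "="]


def plan_count(relation: ToyRelation, filters: List[Filter],
               trust_source: bool) -> List[str]:
    """Build a toy plan for `select count(*) ... where <filters>`, outermost first."""
    unhandled = relation.unhandled_filters(filters)
    pushed = [f for f in filters if f not in unhandled]
    # Without the hook the planner re-applies every filter; with it, only
    # the filters the source reported as unhandled stay in the plan.
    residual = unhandled if trust_source else list(filters)
    # A column that is only referenced by pushed filters need not be scanned.
    columns = sorted({f.column for f in residual})
    lines = ["Aggregate count(1)"]
    lines += [f"Filter ({f})" for f in residual]
    pushed_text = ", ".join(str(f) for f in pushed)
    lines.append(f"Scan ToyRelation[{','.join(columns)}] pushed=[{pushed_text}]")
    return lines


filters = [Filter("host", "=", "foo.com")]
# Planner that always re-evaluates filters (the shape of the plan reported above).
print("\n".join(plan_count(ToyRelation(), filters, trust_source=False)))
# Planner that trusts the source's answer about which filters it handled.
print("\n".join(plan_count(ToyRelation(), filters, trust_source=True)))
```

In the first plan the `Filter` node and the `host` column are still present; in the second both are gone, which is the behavior the report was asking for.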


