Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/09/28 20:56:20 UTC

[jira] [Created] (KUDU-1659) Spark does not remove pushed predicates from Spark-side query plan

Todd Lipcon created KUDU-1659:
---------------------------------

             Summary: Spark does not remove pushed predicates from Spark-side query plan
                 Key: KUDU-1659
                 URL: https://issues.apache.org/jira/browse/KUDU-1659
             Project: Kudu
          Issue Type: Improvement
          Components: perf, spark
    Affects Versions: 1.0.0
            Reporter: Todd Lipcon


I ran the following Spark SQL query:
{code}
select count(*) from metrics where host = "foo.com"
{code}

I verified that this resulted in the predicate being pushed down to Kudu. Once the predicate has been pushed, Spark does not need to evaluate it again. In fact, because the query only counts rows, Spark does not need to select the host column at all. However, the physical plan shows that Spark still selects the column and re-evaluates the same filter:

{code}
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#71L])
 TungstenExchange SinglePartition
  TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#74L])
   Project
    Filter (host#0 = foo.com)
     Scan org.apache.kudu.spark.kudu.KuduRelation@1d18e5ad[host#0]
{code}
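The reasoning above can be illustrated with a small, simplified model of scan planning. This is a hypothetical sketch, not Spark's planner or the Kudu connector's code: the names EqualTo and plan_scan are invented for illustration. For context, in Spark 1.6 and later a data source relation can tell Spark which filters it cannot fully evaluate by overriding BaseRelation.unhandledFilters; the default implementation returns every filter, so Spark keeps re-evaluating all of them. The model shows that when the source handles the host filter, no residual Filter is needed and the scan needs no columns for count(*); when it does not, the plan degenerates to the Filter plus Scan [host#0] shape seen above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class EqualTo:
    """Hypothetical stand-in for a simple `column = literal` predicate."""
    column: str
    value: object


def plan_scan(
    output_columns: FrozenSet[str],
    filters: List[EqualTo],
    source_handles: Callable[[EqualTo], bool],
) -> Tuple[List[EqualTo], List[EqualTo], FrozenSet[str]]:
    """Simplified model: split filters and compute the columns the scan must return.

    Returns (handled, residual, scan_columns):
      handled      - filters the source evaluates itself (no Spark-side re-check)
      residual     - filters Spark must still evaluate after the scan
      scan_columns - columns needed by the query output or by residual filters
    """
    handled = [f for f in filters if source_handles(f)]
    residual = [f for f in filters if not source_handles(f)]
    # Only residual filters force extra columns into the scan.
    scan_columns = output_columns | frozenset(f.column for f in residual)
    return handled, residual, scan_columns


if __name__ == "__main__":
    host_filter = EqualTo("host", "foo.com")
    # count(*) needs no column values, so the output column set is empty.
    _, residual, cols = plan_scan(frozenset(), [host_filter], lambda f: True)
    print(residual, sorted(cols))  # prints: [] []
    # If the source reports no handled filters, the filter and column remain.
    _, residual, cols = plan_scan(frozenset(), [host_filter], lambda f: False)
    print(residual, sorted(cols))  # prints: [EqualTo(column='host', value='foo.com')] ['host']
```

The model deliberately ignores everything but the filter/projection interaction; a real planner also has to handle partially supported predicates, which is why a source should only report a filter as handled when it evaluates it exactly.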

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)