Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/09/28 20:56:20 UTC
[jira] [Created] (KUDU-1659) Spark does not remove pushed predicates from Spark-side query plan
Todd Lipcon created KUDU-1659:
---------------------------------
Summary: Spark does not remove pushed predicates from Spark-side query plan
Key: KUDU-1659
URL: https://issues.apache.org/jira/browse/KUDU-1659
Project: Kudu
Issue Type: Improvement
Components: perf, spark
Affects Versions: 1.0.0
Reporter: Todd Lipcon
I ran the following Spark SQL query:
{code}
select count(*) from metrics where host = "foo.com"
{code}
I verified that this resulted in the predicate being pushed to Kudu. Once the predicate is pushed, it isn't necessary to evaluate it again on the Spark side, and in fact Spark doesn't need to select the column at all. However, the physical plan shows Spark still selecting the column and re-evaluating the same filter:
{code}
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#71L])
TungstenExchange SinglePartition
TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#74L])
Project
Filter (host#0 = foo.com)
Scan org.apache.kudu.spark.kudu.KuduRelation@1d18e5ad[host#0]
{code}
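To illustrate the desired behavior, here is a toy model in plain Python. It is not Spark or Kudu code, and every name in it (ToyRelation, unhandled_filters, plan_count) is hypothetical. The idea is that the source reports which filters it cannot fully evaluate, and the engine keeps only those filters and the columns they reference. Spark's data source API has a BaseRelation.unhandledFilters hook (since Spark 1.6) that looks intended for this; whether it is the right fix for KuduRelation is an assumption on my part.

```python
from dataclasses import dataclass

# Toy model only: none of these names exist in Spark or Kudu.


@dataclass(frozen=True)
class EqualTo:
    column: str
    value: object


class ToyRelation:
    """Stands in for a data source that can evaluate some filters itself."""

    def __init__(self, rows, pushable_columns):
        self.rows = rows
        self.pushable_columns = set(pushable_columns)

    def unhandled_filters(self, filters):
        # Filters the source cannot fully evaluate; the engine must re-check these.
        return [f for f in filters if f.column not in self.pushable_columns]

    def scan(self, required_columns, filters):
        # Apply the pushable filters inside the source, then project.
        pushed = [f for f in filters if f.column in self.pushable_columns]
        for row in self.rows:
            if all(row[f.column] == f.value for f in pushed):
                yield {c: row[c] for c in required_columns}


def plan_count(relation, filters):
    """Count matching rows, keeping only the engine-side work still needed."""
    residual = relation.unhandled_filters(filters)
    # Only columns referenced by residual filters need to leave the source.
    required = sorted({f.column for f in residual})
    count = sum(
        1
        for row in relation.scan(required, filters)
        if all(row[f.column] == f.value for f in residual)
    )
    return count, required, residual


rows = [
    {"host": "foo.com", "value": 1},
    {"host": "bar.com", "value": 2},
    {"host": "foo.com", "value": 3},
]
metrics = ToyRelation(rows, pushable_columns=["host"])
count, required, residual = plan_count(metrics, [EqualTo("host", "foo.com")])
print(count, required, residual)
```

In this model the filter on host is fully handled by the source, so the engine-side plan has no residual filter and requests no columns, which is the outcome I'd expect for the count(*) query above.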