Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/07/16 01:01:04 UTC

[jira] [Created] (SPARK-9082) Filter using non-deterministic expressions should not be pushed down

Yin Huai created SPARK-9082:
-------------------------------

             Summary: Filter using non-deterministic expressions should not be pushed down
                 Key: SPARK-9082
                 URL: https://issues.apache.org/jira/browse/SPARK-9082
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Yin Huai


For example,
{code}
val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
df.as("a").join(df.filter($"r" < 0.5).as("b"), $"a.id" === $"b.id").explain(true)
{code}
The plan is:
{code}
== Physical Plan ==
ShuffledHashJoin [id#55323L], [id#55327L], BuildRight
 Exchange (HashPartitioning 200)
  Project [id#55323L,Rand 0 AS r#55324]
   PhysicalRDD [id#55323L], MapPartitionsRDD[42268] at range at <console>:37
 Exchange (HashPartitioning 200)
  Project [id#55327L,Rand 0 AS r#55325]
   Filter (LessThan)
    PhysicalRDD [id#55327L], MapPartitionsRDD[42268] at range at <console>:37
{code}
On the right side of the join, Rand 0 gets evaluated twice instead of once: once in the pushed-down Filter and again in the Project above it, so the rows that pass the filter may not agree with the r values that are produced.

This is caused by predicate pushdown: when we push a predicate through a Project, we replace each attribute reference in the predicate with the expression it was aliased to, which duplicates non-deterministic expressions such as Rand.
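A minimal sketch of the kind of guard that could prevent this (a simplified, hypothetical rule, not the actual Catalyst optimizer code; names like {{PushPredicateThroughProject}} are modeled on Catalyst's rules but the body is illustrative only):
{code}
// Hypothetical guard: only push a Filter through a Project when the
// substituted condition is still deterministic. If inlining an alias
// would duplicate a non-deterministic expression (e.g. Rand), keep the
// Filter above the Project so the expression is evaluated exactly once.
object PushPredicateThroughProject extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case filter @ Filter(condition, project @ Project(fields, grandChild)) =>
      val aliasMap = AttributeMap(fields.collect {
        case a: Alias => (a.toAttribute, a.child)
      })
      val substituted = condition transform {
        case a: Attribute => aliasMap.getOrElse(a, a)
      }
      if (substituted.deterministic) {
        project.copy(child = Filter(substituted, grandChild))
      } else {
        filter  // do not push down: substitution introduced non-determinism
      }
  }
}
{code}
With such a check, the example above would keep the Filter on top of Project [id, Rand 0 AS r], evaluating Rand 0 once per row.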



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
