Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/07/16 01:08:04 UTC
[jira] [Comment Edited] (SPARK-9082) Filter using non-deterministic expressions should not be pushed down
[ https://issues.apache.org/jira/browse/SPARK-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628912#comment-14628912 ]
Yin Huai edited comment on SPARK-9082 at 7/15/15 11:07 PM:
-----------------------------------------------------------
cc [~marmbrus]
Let me know if I missed anything.
was (Author: yhuai):
cc [~marmbrus]
> Filter using non-deterministic expressions should not be pushed down
> --------------------------------------------------------------------
>
> Key: SPARK-9082
> URL: https://issues.apache.org/jira/browse/SPARK-9082
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Wenchen Fan
>
> For example,
> {code}
> val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
> df.as("a").join(df.filter($"r" < 0.5).as("b"), $"a.id" === $"b.id").explain(true)
> {code}
> The plan is
> {code}
> == Physical Plan ==
> ShuffledHashJoin [id#55323L], [id#55327L], BuildRight
> Exchange (HashPartitioning 200)
> Project [id#55323L,Rand 0 AS r#55324]
> PhysicalRDD [id#55323L], MapPartitionsRDD[42268] at range at <console>:37
> Exchange (HashPartitioning 200)
> Project [id#55327L,Rand 0 AS r#55325]
> Filter (LessThan)
> PhysicalRDD [id#55327L], MapPartitionsRDD[42268] at range at <console>:37
> {code}
> The rand expression gets evaluated twice instead of once.
> This is caused by the fact that when we push down predicates, we replace attribute references in the predicate with the actual expressions they refer to.
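A minimal sketch of why this double evaluation is a correctness problem, not just a performance one. This is plain Scala, not Spark: `Random(0)` stands in for `Rand 0`, and lists of `(id, r)` tuples stand in for rows. In the pushed-down plan, `Filter` evaluates its own copy of the rand expression below the `Project`, so the `Project` then re-evaluates rand on only the surviving rows, producing r values the filter never saw:

```scala
import scala.util.Random

// Hypothetical stand-in for the DataFrame: ids 1 until 10, as in the snippet above.
val ids = (1 until 10).toList

// Original plan: Project(id, rand(0) as r) first, then Filter(r < 0.5).
// rand(0) is evaluated exactly once per row.
val original = {
  val rng = new Random(0) // stand-in for Rand 0
  ids.map(id => (id, rng.nextDouble()))
}.filter { case (_, r) => r < 0.5 }

// After pushdown: Filter(rand(0) < 0.5) runs below the Project with its own
// copy of the expression, so the Project re-evaluates rand(0) on the
// surviving rows with a fresh random stream.
val pushedDown = {
  val filterRng = new Random(0)
  val survivors = ids.filter(_ => filterRng.nextDouble() < 0.5)
  val projectRng = new Random(0)
  survivors.map(id => (id, projectRng.nextDouble()))
}

// The same ids survive in this toy (same seed, same draw order in the
// filter), but the output r values no longer satisfy r < 0.5, because they
// come from a second evaluation of the non-deterministic expression.
println(original)
println(pushedDown)
```

In the pushed-down version, some output rows carry r values at or above 0.5 even though the predicate was `r < 0.5`, which is exactly why a filter over a non-deterministic expression must stay above the projection that defines it.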
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org