Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/09/22 02:32:04 UTC

[jira] [Assigned] (SPARK-10740) handle nondeterministic expressions correctly for set operations

     [ https://issues.apache.org/jira/browse/SPARK-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-10740:
------------------------------------

    Assignee: Apache Spark

> handle nondeterministic expressions correctly for set operations
> ----------------------------------------------------------------
>
>                 Key: SPARK-10740
>                 URL: https://issues.apache.org/jira/browse/SPARK-10740
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Apache Spark
>
> We should only push down deterministic filter conditions through set operators.
> For Union, suppose we apply a non-deterministic filter to 1...5 union 1...5. The filter might keep 1,3 from the left side and 2,4 from the right side, so the result should be 1,3,2,4. If we push this filter down, we get 1,3 for both sides (because we create a new random object with the same seed on each side), and the result would be 1,3,1,3.
> For Intersect, suppose a non-deterministic condition accepts a row with probability 0.5, and some row is present on both sides of an Intersect. Once we push this condition down, the row must be accepted on both sides to survive, so the probability of accepting it becomes 0.5 * 0.5 = 0.25.
> For Except, suppose a row is present on both sides of an Except. This row should not appear in the final output. However, if we push down a non-deterministic condition, the row may be rejected on the right side while being kept on the left side, and we then output a row that should not be part of the result.
> Likewise, we should only push down deterministic projections through Union.
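To make the three failure modes concrete, here is a small Python simulation. It is a sketch, not Spark code: all function names are hypothetical, `random.Random(seed)` stands in for the seeded random object each side creates, and the Intersect/Except numbers assume the two sides make independent accept/reject decisions, which is what the 0.25 figure in the description implies.

```python
import random
from fractions import Fraction
from itertools import product

def filter_above_union(left, right, p, seed):
    # Correct plan: a single filter over the Union's output, so one RNG
    # stream is consumed across the rows of both children.
    rng = random.Random(seed)
    return [x for x in left + right if rng.random() < p]

def filter_pushed_into_union(left, right, p, seed):
    # Pushed-down plan: each child gets its own copy of the filter, and each
    # copy creates a new RNG with the same seed, so both sides draw the same
    # random sequence.
    out = []
    for child in (left, right):
        rng = random.Random(seed)
        out.extend(x for x in child if rng.random() < p)
    return out

def pushed_down_survival(p, survives):
    # Exact probability that a row present on both sides reaches the output
    # after pushdown, assuming each side keeps it independently with probability p.
    total = Fraction(0)
    for left_kept, right_kept in product((True, False), repeat=2):
        weight = (p if left_kept else 1 - p) * (p if right_kept else 1 - p)
        if survives(left_kept, right_kept):
            total += weight
    return total

p = Fraction(1, 2)
rows = [1, 2, 3, 4, 5]
print(filter_above_union(rows, rows, 0.5, seed=7))
print(filter_pushed_into_union(rows, rows, 0.5, seed=7))

# Intersect: after pushdown the row survives only if it is kept on both sides.
intersect_correct = p  # filter above Intersect: the row is emitted, then kept with p
intersect_pushed = pushed_down_survival(p, lambda left_kept, right_kept: left_kept and right_kept)
# Except: after pushdown the row survives if kept on the left but dropped on the right.
except_correct = Fraction(0)  # filter above Except: Except already removed the row
except_pushed = pushed_down_survival(p, lambda left_kept, right_kept: left_kept and not right_kept)
print(intersect_correct, intersect_pushed)  # 1/2 1/4
print(except_correct, except_pushed)        # 0 1/4
```

With identical children and a shared seed, the pushed-down Union always returns the same kept rows twice, as in the 1,3,1,3 example, while the exact probabilities show Intersect dropping from 1/2 to 1/4 and Except rising from 0 to 1/4.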
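The rule can be sketched as splitting a conjunctive filter condition by determinism: deterministic conjuncts are copied into every child of the set operator, and non-deterministic ones stay in a filter above it. This is a toy Python model under that assumption; `Pred` and `split_for_set_op_pushdown` are hypothetical names, not Catalyst's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pred:
    # Hypothetical stand-in for one conjunct of a filter condition.
    sql: str
    deterministic: bool

def split_for_set_op_pushdown(conjuncts):
    # Deterministic conjuncts may be copied into every child of the set
    # operator; non-deterministic ones must stay in a filter above it.
    push_down = [c for c in conjuncts if c.deterministic]
    keep_above = [c for c in conjuncts if not c.deterministic]
    return push_down, keep_above

condition = [
    Pred("a > 1", True),
    Pred("rand(42) < 0.5", False),
    Pred("b = 'x'", True),
]
down, above = split_for_set_op_pushdown(condition)
print([c.sql for c in down])   # ['a > 1', "b = 'x'"]
print([c.sql for c in above])  # ['rand(42) < 0.5']
```

Splitting by conjuncts lets the deterministic part still prune rows early, while the non-deterministic part is evaluated once, over the set operator's actual output.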



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
