You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:13:54 UTC

[jira] [Resolved] (SPARK-21497) Pull non-deterministic joining keys from Join operator

     [ https://issues.apache.org/jira/browse/SPARK-21497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-21497.
----------------------------------
    Resolution: Incomplete

> Pull non-deterministic joining keys from Join operator
> ------------------------------------------------------
>
>                 Key: SPARK-21497
>                 URL: https://issues.apache.org/jira/browse/SPARK-21497
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Liang-Chi Hsieh
>            Priority: Major
>              Labels: bulk-closed
>
> Currently SparkSQL doesn't support non-deterministic joining conditions in Join. This kind of joining conditions can be useful in some cases, e.g., http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-Syntax-quot-case-when-quot-doesn-t-be-supported-in-JOIN-tc21953.html#a21973.
> To pull non-deterministic joining conditions from Join operator, there seems no standard behavior. Based on the discussion on https://github.com/apache/spark/pull/18652#issuecomment-316344905, https://github.com/apache/spark/pull/18652#issuecomment-316391759 and https://github.com/apache/spark/pull/18652#issuecomment-316665649, Hive doesn't have special consideration for non-deterministic join conditions and simply pushes down it or uses it as joining keys.
> In this attempt, we initially allow non-deterministic equi join keys in Join operators. Because based on SparkSQL's join implementations the equi join keys are evaluated once on joining tables, pulling equi join keys from Join operators won't change the number of calls on non-deterministic expressions. It is more safer than other kinds of joining conditions, e.g. rand(10) > a && rand(20) < b where pulling it and pushing down it will possibly change the number of calls of rand().
> We also add a SQL conf to control this new behavior. It is disabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org