Posted to issues@spark.apache.org by "Stephen Link (JIRA)" <ji...@apache.org> on 2015/10/05 23:32:26 UTC

[jira] [Created] (SPARK-10933) Spark SQL Joins should have option to fail query when row multiplication is encountered

Stephen Link created SPARK-10933:
------------------------------------

             Summary: Spark SQL Joins should have option to fail query when row multiplication is encountered
                 Key: SPARK-10933
                 URL: https://issues.apache.org/jira/browse/SPARK-10933
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Stephen Link
            Priority: Minor


When constructing Spark SQL queries, we commonly run into scenarios where users have inadvertently caused a cartesian product or row expansion. It is sometimes possible to detect this in advance with separate queries, but it would be preferable to have a setting that fails the query when a join key shows up multiple times on both sides of a join operation.
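
To illustrate the kind of up-front check mentioned above, here is a minimal sketch in plain Python (not Spark code). It counts key occurrences on each side and reports the keys that repeat on both, which are the keys that would multiply rows in a join. The function name and the sample data are hypothetical.

```python
from collections import Counter


def keys_duplicated_on_both_sides(left_keys, right_keys):
    """Return, in sorted order, the join keys that occur more than once
    on BOTH inputs. Only these keys cause many-to-many row expansion."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return sorted(
        key
        for key, count in left_counts.items()
        if count > 1 and right_counts.get(key, 0) > 1
    )


# Hypothetical sample data: key 1 repeats on both sides, key 2 only on the right.
left = [1, 1, 2, 3]
right = [1, 1, 2, 2, 4]
print(keys_duplicated_on_both_sides(left, right))  # prints [1]
```

In Spark itself the equivalent check would be a separate grouped count per input, which is the extra query the proposed setting is meant to avoid.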

This setting would belong in SQLConf. The functionality could likely be implemented by forcing a sorted shuffle, then checking for duplication on the streamed results.
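
The idea of sorting both inputs and checking for duplication while streaming can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Spark's join implementation: inputs are small in-memory lists of (key, value) pairs, and the names `RowMultiplicationError` and `checked_sort_merge_join` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


class RowMultiplicationError(Exception):
    """Raised when a join key repeats on both sides (hypothetical name)."""


def _grouped_by_key(rows):
    # Sort by key, then collect the values for each run of equal keys.
    ordered = sorted(rows, key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(ordered, key=itemgetter(0))]


def checked_sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, failing on many-to-many keys."""
    left_groups = _grouped_by_key(left)
    right_groups = _grouped_by_key(right)
    i = j = 0
    joined = []
    while i < len(left_groups) and j < len(right_groups):
        left_key, left_vals = left_groups[i]
        right_key, right_vals = right_groups[j]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Because both sides are sorted, duplicates are adjacent and
            # can be detected as the matching groups stream past.
            if len(left_vals) > 1 and len(right_vals) > 1:
                raise RowMultiplicationError(
                    f"join key {left_key!r} appears multiple times on both sides"
                )
            joined.extend((left_key, lv, rv) for lv in left_vals for rv in right_vals)
            i += 1
            j += 1
    return joined


# One-to-many is allowed: key 1 repeats only on the right side.
print(checked_sort_merge_join([(1, "a"), (2, "b")], [(1, "x"), (1, "y"), (3, "z")]))
```

A join where key 1 appeared twice on both sides would raise `RowMultiplicationError` instead of returning four rows.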



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
