Posted to issues@spark.apache.org by "Stephen Link (JIRA)" <ji...@apache.org> on 2015/10/05 23:32:26 UTC
[jira] [Created] (SPARK-10933) Spark SQL Joins should have option to fail query when row multiplication is encountered
Stephen Link created SPARK-10933:
------------------------------------
Summary: Spark SQL Joins should have option to fail query when row multiplication is encountered
Key: SPARK-10933
URL: https://issues.apache.org/jira/browse/SPARK-10933
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Stephen Link
Priority: Minor
When constructing Spark SQL queries, we commonly run into scenarios where users have inadvertently caused a cartesian product/row expansion. It is sometimes possible to detect this in advance with separate queries, but it would be far better to have a setting that disallows join keys appearing multiple times on both sides of a join operation.
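As an illustration of the kind of separate pre-check mentioned above, here is a minimal Python sketch. It is not Spark code, and the function names are hypothetical. An inner equi-join emits n_left(k) * n_right(k) rows for each key k, so a key that repeats on both sides multiplies rows. The sketch counts key occurrences per side to predict the output size and to list the offending keys.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def join_output_size(left_keys: Iterable[Hashable], right_keys: Iterable[Hashable]) -> int:
    """Rows an inner equi-join would produce: sum over keys of n_left(k) * n_right(k)."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # A missing key in a Counter reads as 0, so unmatched keys contribute nothing.
    return sum(n * right_counts[k] for k, n in left_counts.items())


def keys_duplicated_on_both_sides(left_keys: Iterable[Hashable], right_keys: Iterable[Hashable]) -> Set[Hashable]:
    """Keys occurring more than once on BOTH sides (the many-to-many case the proposed setting would reject)."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return {k for k, n in left_counts.items() if n > 1 and right_counts[k] > 1}
```

For example, with left keys ["a", "a", "b"] and right keys ["a", "a", "a", "c"], key "a" alone yields 2 * 3 = 6 output rows, and it is the only key duplicated on both sides.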
This setting would belong in SQLConf. The functionality could likely be implemented by forcing a sorted shuffle, then checking the streamed, sorted rows for duplicate join keys.
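The following is a minimal sketch of that approach in plain Python rather than against Spark internals. Sorting each input by join key stands in for the sorted shuffle, and the merge raises as soon as a key's run of equal values is longer than one row on both sides. `strict_sort_merge_join` and `RowMultiplicationError` are hypothetical names, not part of Spark or SQLConf.

```python
from typing import Any, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, Any]  # (join_key, payload); keys must be mutually comparable


class RowMultiplicationError(Exception):
    """Hypothetical error raised when a join key repeats on both sides."""


def _key_run_end(rows: Sequence[Row], start: int) -> int:
    """Return the exclusive end index of the run of equal keys beginning at `start`."""
    end = start + 1
    while end < len(rows) and rows[end][0] == rows[start][0]:
        end += 1
    return end


def strict_sort_merge_join(left: Sequence[Row], right: Sequence[Row]) -> List[Tuple[Hashable, Any, Any]]:
    """Inner sort-merge join that fails instead of emitting a many-to-many expansion."""
    # Stand-in for the sorted shuffle: sort each side by join key.
    left_sorted = sorted(left, key=lambda r: r[0])
    right_sorted = sorted(right, key=lambda r: r[0])
    out = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk, rk = left_sorted[i][0], right_sorted[j][0]
        if lk < rk:
            i = _key_run_end(left_sorted, i)  # skip unmatched left keys
        elif lk > rk:
            j = _key_run_end(right_sorted, j)  # skip unmatched right keys
        else:
            i_end = _key_run_end(left_sorted, i)
            j_end = _key_run_end(right_sorted, j)
            # Duplicates on both sides would multiply rows, so reject the query.
            if i_end - i > 1 and j_end - j > 1:
                raise RowMultiplicationError(
                    f"join key {lk!r} appears {i_end - i} times on the left "
                    f"and {j_end - j} times on the right"
                )
            for _, lv in left_sorted[i:i_end]:
                for _, rv in right_sorted[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out
```

One-to-many matches still succeed (left [(1, "x"), (2, "y"), (2, "z")] joined with right [(2, "p"), (3, "q")] gives two rows for key 2), while left [(1, "a"), (1, "b")] joined with right [(1, "c"), (1, "d")] raises. Checking run lengths during the merge keeps the test within a single pass over data that is already sorted.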
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)