Posted to issues@spark.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2017/07/11 20:14:00 UTC
[jira] [Issue Comment Deleted] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL
[ https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruslan Dautkhanov updated SPARK-21274:
--------------------------------------
Comment: was deleted
(was: [~rxin], I wish I could. We only use the PySpark and SQL APIs to Spark. Hopefully this will be useful to someone versed in Scala. Thank you for considering this.)
> Implement EXCEPT ALL and INTERSECT ALL
> --------------------------------------
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
> Issue Type: New Feature
> Components: Optimizer, SQL
> Affects Versions: 2.0.0, 2.1.0, 2.2.0
> Reporter: Ruslan Dautkhanov
> Labels: set, sql
>
> 1) *EXCEPT ALL* / MINUS ALL:
> {code}
> SELECT a,b,c FROM tab1
> EXCEPT ALL
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as the following outer join:
> {code}
> SELECT a,b,c
> FROM tab1 t1
>   LEFT OUTER JOIN tab2 t2
>     ON (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
> WHERE COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register this second query as a temporary view named "*t1_except_t2_df*"; it is also used for the INTERSECT ALL rewrite below, and in the sketch that follows).
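> For illustration only, here is a hedged PySpark sketch of the same rewrite, assuming tables *tab1* and *tab2* with columns a, b, c already exist; the left anti join is an equivalent formulation of the LEFT OUTER JOIN plus IS NULL filter above, not part of the proposal itself:
> {code}
> # Sketch only: the proposed EXCEPT ALL rewrite expressed with the DataFrame API.
> # A left anti join keeps the rows of tab1 that have no match in tab2.
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
> tab1 = spark.table("tab1")
> tab2 = spark.table("tab2")
>
> t1_except_t2_df = tab1.join(tab2, on=["a", "b", "c"], how="left_anti")
>
> # Register it as a temporary view so the INTERSECT ALL rewrite can reuse it.
> t1_except_t2_df.createOrReplaceTempView("t1_except_t2_df")
> {code}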
> 2) *INTERSECT ALL*:
> {code}
> SELECT a,b,c FROM tab1
> INTERSECT ALL
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as the following anti-join using the *t1_except_t2_df* view defined above:
> {code}
> SELECT a,b,c
> FROM tab1 t1
> WHERE NOT EXISTS (
>   SELECT 1
>   FROM t1_except_t2_df e
>   WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
> )
> {code}
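> Continuing the sketch above, the same anti-join can be expressed in PySpark against the registered *t1_except_t2_df* view (again an illustration, not part of the proposal):
> {code}
> # Sketch only: INTERSECT ALL as an anti-join against the EXCEPT ALL result,
> # i.e. the rows of tab1 that do NOT appear in t1_except_t2_df.
> t1_intersect_t2_df = spark.table("tab1").join(
>     spark.table("t1_except_t2_df"), on=["a", "b", "c"], how="left_anti"
> )
> t1_intersect_t2_df.show()
> {code}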
> So the suggestion is simply to use the above query rewrites to implement both the EXCEPT ALL and INTERSECT ALL SQL set operations.