You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2018/07/22 07:27:00 UTC
[jira] [Commented] (SPARK-21274) Implement EXCEPT ALL and INTERSECT
ALL
[ https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551921#comment-16551921 ]
Takeshi Yamamuro commented on SPARK-21274:
------------------------------------------
Anybody still working on this?
> Implement EXCEPT ALL and INTERSECT ALL
> --------------------------------------
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.0.0, 2.1.0, 2.2.0
> Reporter: Ruslan Dautkhanov
> Priority: Major
>
> 1) *EXCEPT ALL* / MINUS ALL :
> {code}
> SELECT a,b,c FROM tab1
> EXCEPT ALL
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as following outer join:
> {code}
> SELECT a,b,c
> FROM tab1 t1
> LEFT OUTER JOIN
> tab2 t2
> ON (
> (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
> )
> WHERE
> COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register as a temp.view this second query under "*t1_except_t2_df*" name that can be also used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a,b,c FROM tab1
> INTERSECT ALL
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as following anti-join using t1_except_t2_df we defined above:
> {code}
> SELECT a,b,c
> FROM tab1 t1
> WHERE
> NOT EXISTS
> ( SELECT 1
> FROM t1_except_t2_df e
> WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
> )
> {code}
> So the suggestion is just to use above query rewrites to implement both EXCEPT ALL and INTERSECT ALL sql set operations.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org