You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/10 04:53:09 UTC

[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#discussion_r301885522
 
 

 ##########
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##########
 @@ -0,0 +1,164 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+    (0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+    (1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+    (1, 2), 
+    (1, 2),
+    (1, 3),
+    (2, 3),
+    (2, 2)
+    AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+    (1, 2), 
+    (2, 3),
+    (2, 2),
+    (2, 2),
+    (2, 20)
+    AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- EXCEPT ALL + INTERSECT
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+INTERSECT DISTINCT
+SELECT * FROM tab4;
+
+-- EXCEPT ALL + EXCEPT
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Mismatch on number of columns across both branches
+SELECT k FROM tab3
+EXCEPT ALL
+SELECT k, v FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Using MINUS ALL
+SELECT * FROM tab3
+MINUS ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+MINUS DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+EXCEPT DISTINCT
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Join under except all. Should produce empty resultset since both left and right sets 
+-- are same.
+SELECT * 
+FROM   (SELECT udf(tab3.k),
+               udf(tab4.v)
+        FROM   tab3 
+               JOIN tab4 
+                 ON tab3.k = tab4.k)
 
 Review comment:
   Cool. I will test it again once your changes are merged.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org