Posted to reviews@spark.apache.org by kevinyu98 <gi...@git.apache.org> on 2017/01/31 21:59:00 UTC
[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
GitHub user kevinyu98 opened a pull request:
https://github.com/apache/spark/pull/16759
[SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 2nd batch
## What changes were proposed in this pull request?
This is the second batch of test cases for IN/NOT IN subqueries. This PR adds the following test files:
`in-limit.sql`
`in-order-by.sql`
`not-in-group-by.sql`
For comparison, these are the queries and the results from running them on DB2:
[in-limit DB2 version](https://github.com/apache/spark/files/743267/in-limit.sql.db2.out.txt)
[in-order-by DB2 version](https://github.com/apache/spark/files/743269/in-order-by.sql.db2.txt)
[not-in-group-by DB2 version](https://github.com/apache/spark/files/743271/not-in-group-by.sql.db2.txt)
[output of in-limit.sql DB2](https://github.com/apache/spark/files/743276/in-limit.sql.db2.out.txt)
[output of in-order-by.sql DB2](https://github.com/apache/spark/files/743278/in-order-by.sql.db2.out.txt)
[output of not-in-group-by.sql DB2](https://github.com/apache/spark/files/743279/not-in-group-by.sql.db2.out.txt)
## How was this patch tested?
This PR adds new test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kevinyu98/spark spark-18871-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16759.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16759
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/16759
[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/16759#discussion_r98807968
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out ---
@@ -0,0 +1,147 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 8
+
+
+-- !query 0
+create temporary view t1 as select * from values
+ ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
+ ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'),
+ ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
+ ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'),
+ ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null),
+ ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null),
+ ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'),
+ ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'),
+ ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
+ ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'),
+ ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04')
+ as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+ ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
+ ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'),
+ ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null),
+ ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
+ ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
+ ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
+ ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'),
+ ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'),
+ ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'),
+ ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null)
+ as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view t3 as select * from values
+ ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'),
+ ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'),
+ ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'),
+ ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'),
+ ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'),
+ ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null),
+ ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null),
+ ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04')
+ as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 3
+SELECT *
+FROM t1
+WHERE t1a IN (SELECT t2a
+ FROM t2
+ WHERE t1d = t2d)
+LIMIT 2
+-- !query 3 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 3 output
+val1b 8 16 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+val1c 8 16 19 17.0 25.0 2600 2014-05-04 01:02:00.001 2014-05-05
+
+
+-- !query 4
+SELECT *
+FROM t1
+WHERE t1c IN (SELECT t2c
+ FROM t2
+ WHERE t2b >= 8
+ LIMIT 2)
+LIMIT 4
+-- !query 4 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 4 output
+val1a 16 12 10 15.0 20.0 2000 2014-07-04 01:01:00 2014-07-04
+val1a 16 12 21 15.0 20.0 2000 2014-06-04 01:02:00.001 2014-06-04
+val1b 8 16 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+val1c 8 16 19 17.0 25.0 2600 2014-05-04 01:02:00.001 2014-05-05
+
+
+-- !query 5
+SELECT Count(DISTINCT( t1a )),
+ t1b
+FROM t1
+WHERE t1d IN (SELECT t2d
+ FROM t2
+ ORDER BY t2c
+ LIMIT 2)
+GROUP BY t1b
+ORDER BY t1b DESC NULLS FIRST
+LIMIT 1
+-- !query 5 schema
+struct<count(DISTINCT t1a):bigint,t1b:smallint>
+-- !query 5 output
+1 NULL
+
+
+-- !query 6
+SELECT *
+FROM t1
+WHERE t1b NOT IN (SELECT t2b
+ FROM t2
+ WHERE t2b > 6
+ LIMIT 2)
+-- !query 6 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 6 output
+val1a 16 12 10 15.0 20.0 2000 2014-07-04 01:01:00 2014-07-04
+val1a 16 12 21 15.0 20.0 2000 2014-06-04 01:02:00.001 2014-06-04
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:00:00 2014-04-04
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:02:00.001 2014-04-04
+
+
+-- !query 7
+SELECT Count(DISTINCT( t1a )),
+ t1b
+FROM t1
+WHERE t1d NOT IN (SELECT t2d
+ FROM t2
+ ORDER BY t2b DESC nulls first
+ LIMIT 1)
+GROUP BY t1b
+ORDER BY t1b NULLS last
+LIMIT 1
+-- !query 7 schema
+struct<count(DISTINCT t1a):bigint,t1b:smallint>
+-- !query 7 output
+1 6
--- End diff --
I have compared the result set; it matches the result from DB2.
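For anyone checking query 6 in the diff above by hand: the recorded output is consistent with the `LIMIT 2` subquery having returned 8 and 10, and the two `val1d` rows whose `t1b` is NULL are absent because `NULL NOT IN (...)` evaluates to unknown rather than true. The sketch below reproduces only that filtering step. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in engine, not Spark or DB2. The helper name `not_in_survivors` is hypothetical, and the subquery is replaced by a literal list so that the result does not depend on which rows a `LIMIT` without `ORDER BY` happens to pick.

```python
import sqlite3

# t1b values copied from the t1 view in the diff above (None stands for SQL NULL).
T1B_VALUES = [6, 8, 16, 16, 8, None, None, 10, 10, 10, 6, 10]


def not_in_survivors(excluded):
    """Return the t1b values that pass `t1b NOT IN (excluded...)`, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (t1b INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(v,) for v in T1B_VALUES])
    # One "?" placeholder per excluded value; the list stands in for the subquery.
    placeholders = ", ".join("?" for _ in excluded)
    rows = conn.execute(
        f"SELECT t1b FROM t1 WHERE t1b NOT IN ({placeholders}) ORDER BY t1b",
        list(excluded),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


# Assumption: the subquery in query 6 produced the values 8 and 10.
survivors = not_in_survivors([8, 10])
print(survivors)  # [6, 6, 16, 16]
```

The four surviving values correspond to the four `val1a` rows in the recorded output; the NULL rows are filtered out even though NULL is not in the excluded list.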
[GitHub] spark issue #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/16759
LGTM - merging to master
[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/16759#discussion_r98807908
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-group-by.sql.out ---
@@ -0,0 +1,150 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 8
+
+
+-- !query 0
+create temporary view t1 as select * from values
+ ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
+ ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'),
+ ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
+ ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'),
+ ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null),
+ ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null),
+ ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'),
+ ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'),
+ ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
+ ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'),
+ ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04')
+ as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+ ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
+ ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'),
+ ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null),
+ ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
+ ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
+ ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
+ ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'),
+ ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'),
+ ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'),
+ ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null)
+ as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view t3 as select * from values
+ ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'),
+ ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'),
+ ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'),
+ ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'),
+ ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'),
+ ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null),
+ ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null),
+ ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04')
+ as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 3
+SELECT t1a,
+ Avg(t1b)
+FROM t1
+WHERE t1a NOT IN (SELECT t2a
+ FROM t2)
+GROUP BY t1a
+-- !query 3 schema
+struct<t1a:string,avg(t1b):double>
+-- !query 3 output
+val1a 11.0
+val1d 10.0
+
+
+-- !query 4
+SELECT t1a,
+ Sum(DISTINCT( t1b ))
+FROM t1
+WHERE t1d NOT IN (SELECT t2d
+ FROM t2
+ WHERE t1h < t2h)
+GROUP BY t1a
+-- !query 4 schema
+struct<t1a:string,sum(DISTINCT t1b):bigint>
+-- !query 4 output
+val1a 22
+val1d 10
+val1e 10
+
+
+-- !query 5
+SELECT Count(*)
+FROM (SELECT *
+ FROM t2
+ WHERE t2a NOT IN (SELECT t3a
+ FROM t3
+ WHERE t3h != t2h)) t2
+WHERE t2b NOT IN (SELECT Min(t2b)
+ FROM t2
+ WHERE t2b = t2b
+ GROUP BY t2c)
+-- !query 5 schema
+struct<count(1):bigint>
+-- !query 5 output
+4
+
+
+-- !query 6
+SELECT t1a,
+ max(t1b)
+FROM t1
+WHERE t1c NOT IN (SELECT Max(t2b)
+ FROM t2
+ WHERE t1a = t2a
+ GROUP BY t2a)
+GROUP BY t1a
+-- !query 6 schema
+struct<t1a:string,max(t1b):smallint>
+-- !query 6 output
+val1a 16
+val1b 8
+val1c 8
+val1d 10
+
+
+-- !query 7
+SELECT t1a,
+ t1b
+FROM t1
+WHERE t1c IN (SELECT t2b
+ FROM t2
+ WHERE t2a NOT IN (SELECT Min(t3a)
+ FROM t3
+ WHERE t3a = t2a
+ GROUP BY t3b) order by t2a)
+-- !query 7 schema
+struct<t1a:string,t1b:smallint>
+-- !query 7 output
+val1a 16
+val1a 16
+val1a 6
+val1a 6
--- End diff --
I have compared the result set; it matches the result from DB2.
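As a rough cross-check of query 3 in the diff above, the following sketch runs the same `NOT IN` plus `GROUP BY` shape against a reduced copy of the data. It assumes SQLite (Python's standard `sqlite3` module) rather than Spark or DB2, keeps only the `t1a`, `t1b`, and `t2a` columns, and adds an `ORDER BY` that is not in the original query so that the output order is fixed. The function name `run_query_3` is made up for this example.

```python
import sqlite3

# (t1a, t1b) pairs copied from the t1 view in the diff; None stands for NULL.
T1_ROWS = [
    ("val1a", 6), ("val1b", 8), ("val1a", 16), ("val1a", 16),
    ("val1c", 8), ("val1d", None), ("val1d", None), ("val1e", 10),
    ("val1e", 10), ("val1d", 10), ("val1a", 6), ("val1e", 10),
]
# Distinct t2a values from the t2 view in the diff (duplicates do not affect NOT IN).
T2A_VALUES = ["val2a", "val1b", "val1c", "val2e", "val1f", "val1e"]


def run_query_3():
    """Run a reduced version of query 3 and return (t1a, avg) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (t1a TEXT, t1b INTEGER)")
    conn.execute("CREATE TABLE t2 (t2a TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", T1_ROWS)
    conn.executemany("INSERT INTO t2 VALUES (?)", [(v,) for v in T2A_VALUES])
    rows = conn.execute(
        """
        SELECT t1a, AVG(t1b)
        FROM t1
        WHERE t1a NOT IN (SELECT t2a FROM t2)
        GROUP BY t1a
        ORDER BY t1a  -- added only to make the output order deterministic
        """
    ).fetchall()
    conn.close()
    return rows


result = run_query_3()
print(result)  # [('val1a', 11.0), ('val1d', 10.0)]
```

The `val1d` average is 10.0 because `AVG` skips the two NULL `t1b` values, which agrees with the recorded output above.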
[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/16759#discussion_r98807939
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-order-by.sql.out ---
@@ -0,0 +1,328 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 18
+
+
+-- !query 0
+create temporary view t1 as select * from values
+ ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
+ ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'),
+ ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
+ ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'),
+ ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null),
+ ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null),
+ ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'),
+ ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'),
+ ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
+ ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'),
+ ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04')
+ as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+ ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
+ ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'),
+ ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null),
+ ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
+ ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
+ ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
+ ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'),
+ ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'),
+ ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'),
+ ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null)
+ as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view t3 as select * from values
+ ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'),
+ ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'),
+ ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'),
+ ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'),
+ ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'),
+ ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null),
+ ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null),
+ ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
+ ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04')
+ as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 3
+SELECT *
+FROM t1
+WHERE t1a IN (SELECT t2a
+ FROM t2)
+ORDER BY t1a
+-- !query 3 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 3 output
+val1b 8 16 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+val1c 8 16 19 17.0 25.0 2600 2014-05-04 01:02:00.001 2014-05-05
+val1e 10 NULL 25 17.0 25.0 2600 2014-08-04 01:01:00 2014-08-04
+val1e 10 NULL 19 17.0 25.0 2600 2014-09-04 01:02:00.001 2014-09-04
+val1e 10 NULL 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+
+
+-- !query 4
+SELECT t1a
+FROM t1
+WHERE t1b IN (SELECT t2b
+ FROM t2
+ WHERE t1a = t2a)
+ORDER BY t1b DESC
+-- !query 4 schema
+struct<t1a:string>
+-- !query 4 output
+val1b
+
+
+-- !query 5
+SELECT t1a,
+ t1b
+FROM t1
+WHERE t1c IN (SELECT t2c
+ FROM t2
+ WHERE t1a = t2a)
+ORDER BY 2 DESC nulls last
+-- !query 5 schema
+struct<t1a:string,t1b:smallint>
+-- !query 5 output
+val1b 8
+val1c 8
+
+
+-- !query 6
+SELECT Count(DISTINCT( t1a ))
+FROM t1
+WHERE t1b IN (SELECT t2b
+ FROM t2
+ WHERE t1a = t2a)
+ORDER BY Count(DISTINCT( t1a ))
+-- !query 6 schema
+struct<count(DISTINCT t1a):bigint>
+-- !query 6 output
+1
+
+
+-- !query 7
+SELECT *
+FROM t1
+WHERE t1b IN (SELECT t2c
+ FROM t2
+ ORDER BY t2d)
+-- !query 7 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 7 output
+val1a 16 12 10 15.0 20.0 2000 2014-07-04 01:01:00 2014-07-04
+val1a 16 12 21 15.0 20.0 2000 2014-06-04 01:02:00.001 2014-06-04
+
+
+-- !query 8
+SELECT *
+FROM t1
+WHERE t1b IN (SELECT Min(t2b)
+ FROM t2
+ WHERE t1b = t2b
+ ORDER BY Min(t2b))
+ORDER BY t1c DESC nulls first
+-- !query 8 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 8 output
+val1e 10 NULL 25 17.0 25.0 2600 2014-08-04 01:01:00 2014-08-04
+val1e 10 NULL 19 17.0 25.0 2600 2014-09-04 01:02:00.001 2014-09-04
+val1d 10 NULL 12 17.0 25.0 2600 2015-05-04 01:01:00 2015-05-04
+val1e 10 NULL 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+val1b 8 16 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+val1c 8 16 19 17.0 25.0 2600 2014-05-04 01:02:00.001 2014-05-05
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:00:00 2014-04-04
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:02:00.001 2014-04-04
+
+
+-- !query 9
+SELECT t1a,
+ t1b,
+ t1h
+FROM t1
+WHERE t1c IN (SELECT t2c
+ FROM t2
+ WHERE t1a = t2a
+ ORDER BY t2b DESC nulls first)
+ OR t1h IN (SELECT t2h
+ FROM t2
+ WHERE t1h > t2h)
+ORDER BY t1h DESC nulls last
+-- !query 9 schema
+struct<t1a:string,t1b:smallint,t1h:timestamp>
+-- !query 9 output
+val1c 8 2014-05-04 01:02:00.001
+val1b 8 2014-05-04 01:01:00
+
+
+-- !query 10
+SELECT *
+FROM t1
+WHERE t1a NOT IN (SELECT t2a
+ FROM t2)
+ORDER BY t1a
+-- !query 10 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 10 output
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:00:00 2014-04-04
+val1a 16 12 21 15.0 20.0 2000 2014-06-04 01:02:00.001 2014-06-04
+val1a 16 12 10 15.0 20.0 2000 2014-07-04 01:01:00 2014-07-04
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:02:00.001 2014-04-04
+val1d NULL 16 22 17.0 25.0 2600 2014-06-04 01:01:00 NULL
+val1d NULL 16 19 17.0 25.0 2600 2014-07-04 01:02:00.001 NULL
+val1d 10 NULL 12 17.0 25.0 2600 2015-05-04 01:01:00 2015-05-04
+
+
+-- !query 11
+SELECT t1a,
+ t1b
+FROM t1
+WHERE t1a NOT IN (SELECT t2a
+ FROM t2
+ WHERE t1a = t2a)
+ORDER BY t1b DESC nulls last
+-- !query 11 schema
+struct<t1a:string,t1b:smallint>
+-- !query 11 output
+val1a 16
+val1a 16
+val1d 10
+val1a 6
+val1a 6
+val1d NULL
+val1d NULL
+
+
+-- !query 12
+SELECT *
+FROM t1
+WHERE t1a NOT IN (SELECT t2a
+ FROM t2
+ ORDER BY t2a DESC nulls first)
+ and t1c IN (SELECT t2c
+ FROM t2
+ ORDER BY t2b DESC nulls last)
+ORDER BY t1c DESC nulls last
+-- !query 12 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 12 output
+val1d NULL 16 22 17.0 25.0 2600 2014-06-04 01:01:00 NULL
+val1d NULL 16 19 17.0 25.0 2600 2014-07-04 01:02:00.001 NULL
+val1a 16 12 21 15.0 20.0 2000 2014-06-04 01:02:00.001 2014-06-04
+val1a 16 12 10 15.0 20.0 2000 2014-07-04 01:01:00 2014-07-04
+
+
+-- !query 13
+SELECT *
+FROM t1
+WHERE t1b IN (SELECT Min(t2b)
+ FROM t2
+ GROUP BY t2a
+ ORDER BY t2a DESC)
+-- !query 13 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 13 output
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:00:00 2014-04-04
+val1a 6 8 10 15.0 20.0 2000 2014-04-04 01:02:00.001 2014-04-04
+val1b 8 16 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+val1c 8 16 19 17.0 25.0 2600 2014-05-04 01:02:00.001 2014-05-05
+
+
+-- !query 14
+SELECT t1a,
+ Count(DISTINCT( t1b ))
+FROM t1
+WHERE t1b IN (SELECT Min(t2b)
+ FROM t2
+ WHERE t1a = t2a
+ GROUP BY t2a
+ ORDER BY t2a)
+GROUP BY t1a,
+ t1h
+ORDER BY t1a
+-- !query 14 schema
+struct<t1a:string,count(DISTINCT t1b):bigint>
+-- !query 14 output
+val1b 1
+
+
+-- !query 15
+SELECT *
+FROM t1
+WHERE t1b NOT IN (SELECT Min(t2b)
+ FROM t2
+ GROUP BY t2a
+ ORDER BY t2a)
+-- !query 15 schema
+struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
+-- !query 15 output
+val1a 16 12 10 15.0 20.0 2000 2014-07-04 01:01:00 2014-07-04
+val1a 16 12 21 15.0 20.0 2000 2014-06-04 01:02:00.001 2014-06-04
+val1d 10 NULL 12 17.0 25.0 2600 2015-05-04 01:01:00 2015-05-04
+val1e 10 NULL 19 17.0 25.0 2600 2014-05-04 01:01:00 2014-05-04
+val1e 10 NULL 19 17.0 25.0 2600 2014-09-04 01:02:00.001 2014-09-04
+val1e 10 NULL 25 17.0 25.0 2600 2014-08-04 01:01:00 2014-08-04
+
+
+-- !query 16
+SELECT t1a,
+ Sum(DISTINCT( t1b ))
+FROM t1
+WHERE t1b NOT IN (SELECT Min(t2b)
+ FROM t2
+ WHERE t1a = t2a
+ GROUP BY t2c
+ ORDER BY t2c DESC nulls last)
+GROUP BY t1a
+-- !query 16 schema
+struct<t1a:string,sum(DISTINCT t1b):bigint>
+-- !query 16 output
+val1a 22
+val1c 8
+val1d 10
+val1e 10
+
+
+-- !query 17
+SELECT Count(DISTINCT( t1a )),
+ t1b
+FROM t1
+WHERE t1h NOT IN (SELECT t2h
+ FROM t2
+ where t1a = t2a
+ order by t2d DESC nulls first
+ )
+GROUP BY t1a,
+ t1b
+ORDER BY t1b DESC nulls last
+-- !query 17 schema
+struct<count(DISTINCT t1a):bigint,t1b:smallint>
+-- !query 17 output
+1 16
+1 10
+1 10
+1 8
+1 6
+1 NULL
--- End diff --
I have compared the result set; it matches the result from DB2.
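One way such a comparison could be automated is sketched below. It is only an illustration, not how the comparison in this thread was actually done: the helpers `parse_rows` and `same_rows` are hypothetical, they assume both outputs use whitespace-separated columns and the same NULL spelling, and they compare rows as a multiset, so row order is ignored. The sample text is the recorded output of query 11 above. The second and third inputs are derived from it by hand and are not real DB2 output.

```python
from collections import Counter


def parse_rows(text):
    """Split whitespace-separated output lines into tuples of strings."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


def same_rows(left, right):
    """True if both outputs contain the same rows with the same multiplicity."""
    return Counter(parse_rows(left)) == Counter(parse_rows(right))


# Recorded output of query 11 from the diff above.
SPARK_OUT = """val1a 16
val1a 16
val1d 10
val1a 6
val1a 6
val1d NULL
val1d NULL"""

# Hand-made variants for illustration only (not real DB2 output).
REORDERED = "\n".join(reversed(SPARK_OUT.splitlines()))
MISSING_ONE = "\n".join(SPARK_OUT.splitlines()[1:])

print(same_rows(SPARK_OUT, REORDERED))    # True: same rows, different order
print(same_rows(SPARK_OUT, MISSING_ONE))  # False: one duplicate row is missing
```

Because order is ignored, a check like this suits queries without a top-level `ORDER BY`; for ordered queries the sort-key sequence would need a separate check.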
[GitHub] spark issue #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16759
Can one of the admins verify this patch?