Posted to reviews@spark.apache.org by kevinyu98 <gi...@git.apache.org> on 2017/01/31 21:59:00 UTC

[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

GitHub user kevinyu98 opened a pull request:

    https://github.com/apache/spark/pull/16759

    [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 2nd batch

    ## What changes were proposed in this pull request?
    
    This is the 2nd batch of test cases for IN/NOT IN subqueries. This PR adds the following test cases:
    `in-limit.sql`
    `in-order-by.sql`
    `not-in-group-by.sql`
    
    The queries and their results from running on DB2 are attached here:
    [in-limit DB2 version](https://github.com/apache/spark/files/743267/in-limit.sql.db2.out.txt)
    [in-order-by DB2 version](https://github.com/apache/spark/files/743269/in-order-by.sql.db2.txt)
    [not-in-group-by DB2 version](https://github.com/apache/spark/files/743271/not-in-group-by.sql.db2.txt)
    [output of in-limit.sql DB2](https://github.com/apache/spark/files/743276/in-limit.sql.db2.out.txt)
    [output of in-order-by.sql DB2](https://github.com/apache/spark/files/743278/in-order-by.sql.db2.out.txt)
    [output of not-in-group-by.sql DB2](https://github.com/apache/spark/files/743279/not-in-group-by.sql.db2.out.txt)
    
    ## How was this patch tested?
    
    This PR adds new test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark spark-18871-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16759
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16759




[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16759#discussion_r98807968
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-limit.sql.out ---
    @@ -0,0 +1,147 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 8
    +
    +
    +-- !query 0
    +create temporary view t1 as select * from values
    +  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
    +  ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'),
    +  ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
    +  ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'),
    +  ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null),
    +  ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null),
    +  ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'),
    +  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'),
    +  ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
    +  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'),
    +  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04')
    +  as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +create temporary view t2 as select * from values
    +  ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
    +  ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'),
    +  ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null),
    +  ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
    +  ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
    +  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
    +  ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'),
    +  ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'),
    +  ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'),
    +  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null)
    +  as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +create temporary view t3 as select * from values
    +  ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'),
    +  ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'),
    +  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'),
    +  ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'),
    +  ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'),
    +  ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null),
    +  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null),
    +  ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04')
    +  as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +SELECT *
    +FROM   t1
    +WHERE  t1a IN (SELECT t2a
    +               FROM   t2
    +               WHERE  t1d = t2d)
    +LIMIT  2
    +-- !query 3 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 3 output
    +val1b	8	16	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +val1c	8	16	19	17.0	25.0	2600	2014-05-04 01:02:00.001	2014-05-05
    +
    +
    +-- !query 4
    +SELECT *
    +FROM   t1
    +WHERE  t1c IN (SELECT t2c
    +               FROM   t2
    +               WHERE  t2b >= 8
    +               LIMIT  2)
    +LIMIT 4
    +-- !query 4 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 4 output
    +val1a	16	12	10	15.0	20.0	2000	2014-07-04 01:01:00	2014-07-04
    +val1a	16	12	21	15.0	20.0	2000	2014-06-04 01:02:00.001	2014-06-04
    +val1b	8	16	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +val1c	8	16	19	17.0	25.0	2600	2014-05-04 01:02:00.001	2014-05-05
    +
    +
    +-- !query 5
    +SELECT Count(DISTINCT( t1a )),
    +       t1b
    +FROM   t1
    +WHERE  t1d IN (SELECT t2d
    +               FROM   t2
    +               ORDER  BY t2c
    +               LIMIT 2)
    +GROUP  BY t1b
    +ORDER  BY t1b DESC NULLS FIRST
    +LIMIT  1
    +-- !query 5 schema
    +struct<count(DISTINCT t1a):bigint,t1b:smallint>
    +-- !query 5 output
    +1	NULL
    +
    +
    +-- !query 6
    +SELECT *
    +FROM   t1
    +WHERE  t1b NOT IN (SELECT t2b
    +                   FROM   t2
    +                   WHERE  t2b > 6
    +                   LIMIT  2)
    +-- !query 6 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 6 output
    +val1a	16	12	10	15.0	20.0	2000	2014-07-04 01:01:00	2014-07-04
    +val1a	16	12	21	15.0	20.0	2000	2014-06-04 01:02:00.001	2014-06-04
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:00:00	2014-04-04
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:02:00.001	2014-04-04
    +
    +
    +-- !query 7
    +SELECT Count(DISTINCT( t1a )),
    +       t1b
    +FROM   t1
    +WHERE  t1d NOT IN (SELECT t2d
    +                   FROM   t2
    +                   ORDER  BY t2b DESC nulls first
    +                   LIMIT 1)
    +GROUP  BY t1b
    +ORDER BY t1b NULLS last
    +LIMIT  1
    +-- !query 7 schema
    +struct<count(DISTINCT t1a):bigint,t1b:smallint>
    +-- !query 7 output
    +1	6
    --- End diff --
    
    I have compared this result set with the result from DB2, and they match.
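Query 6 above keeps only the four `val1a` rows. The minimal Python sketch below shows why, using SQL's three-valued `IN`/`NOT IN` logic with `None` standing for `NULL`. It is an illustration only, not how Spark evaluates the query, and several parts of it are assumptions:
- The helper names `sql_in` and `sql_not_in` are hypothetical.
- The row values are transcribed from views `t1` and `t2`.
- It assumes the unordered `LIMIT 2` returns the first two qualifying rows in insertion order. SQL does not guarantee which rows a `LIMIT` without `ORDER BY` returns.

```python
from typing import Iterable, List, Optional, Tuple


def sql_in(value: Optional[int], candidates: Iterable[Optional[int]]) -> Optional[bool]:
    """Three-valued `value IN (candidates)`: True, False, or None for UNKNOWN."""
    candidates = list(candidates)
    if not candidates:
        return False  # IN over an empty subquery is FALSE, even for a NULL probe
    if value is None:
        return None  # NULL compared with anything is UNKNOWN
    if any(c is not None and c == value for c in candidates):
        return True
    # No match: a NULL candidate turns the answer into UNKNOWN rather than FALSE
    return None if any(c is None for c in candidates) else False


def sql_not_in(value: Optional[int], candidates: Iterable[Optional[int]]) -> Optional[bool]:
    """NOT IN is the three-valued negation of IN; NOT UNKNOWN stays UNKNOWN."""
    result = sql_in(value, candidates)
    return None if result is None else not result


# (t1a, t1b) pairs from view t1, in insertion order; None stands for NULL
t1_rows: List[Tuple[str, Optional[int]]] = [
    ("val1a", 6), ("val1b", 8), ("val1a", 16), ("val1a", 16),
    ("val1c", 8), ("val1d", None), ("val1d", None), ("val1e", 10),
    ("val1e", 10), ("val1d", 10), ("val1a", 6), ("val1e", 10),
]
# t2b values from view t2, in insertion order
t2b_values: List[Optional[int]] = [6, 10, 8, 12, None, 8, 19, 10, 8, 12, 8, 19, None]

# SELECT t2b FROM t2 WHERE t2b > 6 LIMIT 2
# (NULL > 6 is UNKNOWN, so NULLs drop out; assumes LIMIT takes the first two rows)
subquery = [b for b in t2b_values if b is not None and b > 6][:2]

# WHERE keeps a row only when the predicate is TRUE (not FALSE, not UNKNOWN)
result = [row for row in t1_rows if sql_not_in(row[1], subquery) is True]
```

Here `subquery` is `[10, 8]`, so `result` keeps the four `val1a` rows. The two `val1d` rows drop out because `NULL NOT IN (10, 8)` evaluates to UNKNOWN, not TRUE.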




[GitHub] spark issue #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/16759
  
    LGTM - merging to master




[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16759#discussion_r98807908
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-group-by.sql.out ---
    @@ -0,0 +1,150 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 8
    +
    +
    +-- !query 0
    +create temporary view t1 as select * from values
    +  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
    +  ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'),
    +  ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
    +  ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'),
    +  ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null),
    +  ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null),
    +  ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'),
    +  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'),
    +  ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
    +  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'),
    +  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04')
    +  as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +create temporary view t2 as select * from values
    +  ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
    +  ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'),
    +  ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null),
    +  ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
    +  ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
    +  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
    +  ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'),
    +  ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'),
    +  ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'),
    +  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null)
    +  as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +create temporary view t3 as select * from values
    +  ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'),
    +  ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'),
    +  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'),
    +  ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'),
    +  ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'),
    +  ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null),
    +  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null),
    +  ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04')
    +  as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +SELECT t1a,
    +       Avg(t1b)
    +FROM   t1
    +WHERE  t1a NOT IN (SELECT t2a
    +                   FROM   t2)
    +GROUP  BY t1a
    +-- !query 3 schema
    +struct<t1a:string,avg(t1b):double>
    +-- !query 3 output
    +val1a	11.0
    +val1d	10.0
    +
    +
    +-- !query 4
    +SELECT t1a,
    +       Sum(DISTINCT( t1b ))
    +FROM   t1
    +WHERE  t1d NOT IN (SELECT t2d
    +                   FROM   t2
    +                   WHERE  t1h < t2h)
    +GROUP  BY t1a
    +-- !query 4 schema
    +struct<t1a:string,sum(DISTINCT t1b):bigint>
    +-- !query 4 output
    +val1a	22
    +val1d	10
    +val1e	10
    +
    +
    +-- !query 5
    +SELECT Count(*)
    +FROM   (SELECT *
    +        FROM   t2
    +        WHERE  t2a NOT IN (SELECT t3a
    +                           FROM   t3
    +                           WHERE  t3h != t2h)) t2
    +WHERE  t2b NOT IN (SELECT Min(t2b)
    +                   FROM   t2
    +                   WHERE  t2b = t2b
    +                   GROUP  BY t2c)
    +-- !query 5 schema
    +struct<count(1):bigint>
    +-- !query 5 output
    +4
    +
    +
    +-- !query 6
    +SELECT t1a,
    +       max(t1b)
    +FROM   t1
    +WHERE  t1c NOT IN (SELECT Max(t2b)
    +                   FROM   t2
    +                   WHERE  t1a = t2a
    +                   GROUP  BY t2a)
    +GROUP BY t1a
    +-- !query 6 schema
    +struct<t1a:string,max(t1b):smallint>
    +-- !query 6 output
    +val1a	16
    +val1b	8
    +val1c	8
    +val1d	10
    +
    +
    +-- !query 7
    +SELECT t1a,
    +       t1b
    +FROM   t1
    +WHERE  t1c IN (SELECT t2b
    +               FROM   t2
    +               WHERE  t2a NOT IN (SELECT Min(t3a)
    +                                  FROM   t3
    +                                  WHERE  t3a = t2a
    +                                  GROUP  BY t3b) order by t2a)
    +-- !query 7 schema
    +struct<t1a:string,t1b:smallint>
    +-- !query 7 output
    +val1a	16
    +val1a	16
    +val1a	6
    +val1a	6
    --- End diff --
    
    I have compared this result set with the result from DB2, and they match.
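In query 3 above, `NOT IN` can be checked with ordinary set membership, because neither `t1a` nor `t2a` contains NULLs. The detail worth checking is that `AVG` skips NULL inputs, which is why `val1d` averages to 10.0. The minimal Python sketch below reproduces that expected output. The function name is hypothetical, and the values are transcribed from views `t1` and `t2`. It models this one query only, not Spark's execution.

```python
from typing import Dict, List, Optional, Tuple

# (t1a, t1b) pairs from view t1; None stands for NULL
t1_rows: List[Tuple[str, Optional[int]]] = [
    ("val1a", 6), ("val1b", 8), ("val1a", 16), ("val1a", 16),
    ("val1c", 8), ("val1d", None), ("val1d", None), ("val1e", 10),
    ("val1e", 10), ("val1d", 10), ("val1a", 6), ("val1e", 10),
]
# Distinct t2a values from view t2; this column contains no NULLs
t2a_values = {"val2a", "val1b", "val1c", "val2e", "val1f", "val1e"}


def avg_t1b_where_t1a_not_in_t2() -> Dict[str, Optional[float]]:
    """SELECT t1a, AVG(t1b) FROM t1 WHERE t1a NOT IN (SELECT t2a FROM t2) GROUP BY t1a."""
    groups: Dict[str, List[Optional[int]]] = {}
    for t1a, t1b in t1_rows:
        # Plain membership is a faithful NOT IN here only because
        # neither t1a nor t2a contains NULLs
        if t1a not in t2a_values:
            groups.setdefault(t1a, []).append(t1b)
    averages: Dict[str, Optional[float]] = {}
    for t1a, values in groups.items():
        non_null = [v for v in values if v is not None]  # AVG ignores NULL inputs
        averages[t1a] = sum(non_null) / len(non_null) if non_null else None
    return averages
```

The function returns `{"val1a": 11.0, "val1d": 10.0}`. The `val1d` group averages only its single non-NULL `t1b` value.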




[GitHub] spark pull request #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16759#discussion_r98807939
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-order-by.sql.out ---
    @@ -0,0 +1,328 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 18
    +
    +
    +-- !query 0
    +create temporary view t1 as select * from values
    +  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
    +  ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'),
    +  ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
    +  ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'),
    +  ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null),
    +  ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null),
    +  ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'),
    +  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'),
    +  ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
    +  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'),
    +  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04')
    +  as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +create temporary view t2 as select * from values
    +  ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'),
    +  ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'),
    +  ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null),
    +  ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
    +  ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'),
    +  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'),
    +  ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'),
    +  ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'),
    +  ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'),
    +  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null)
    +  as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +create temporary view t3 as select * from values
    +  ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'),
    +  ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'),
    +  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'),
    +  ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'),
    +  ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'),
    +  ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null),
    +  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null),
    +  ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
    +  ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04')
    +  as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +SELECT *
    +FROM   t1
    +WHERE  t1a IN (SELECT t2a
    +               FROM   t2)
    +ORDER  BY t1a
    +-- !query 3 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 3 output
    +val1b	8	16	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +val1c	8	16	19	17.0	25.0	2600	2014-05-04 01:02:00.001	2014-05-05
    +val1e	10	NULL	25	17.0	25.0	2600	2014-08-04 01:01:00	2014-08-04
    +val1e	10	NULL	19	17.0	25.0	2600	2014-09-04 01:02:00.001	2014-09-04
    +val1e	10	NULL	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +
    +
    +-- !query 4
    +SELECT t1a
    +FROM   t1
    +WHERE  t1b IN (SELECT t2b
    +               FROM   t2
    +               WHERE  t1a = t2a)
    +ORDER  BY t1b DESC
    +-- !query 4 schema
    +struct<t1a:string>
    +-- !query 4 output
    +val1b
    +
    +
    +-- !query 5
    +SELECT t1a,
    +       t1b
    +FROM   t1
    +WHERE  t1c IN (SELECT t2c
    +               FROM   t2
    +               WHERE  t1a = t2a)
    +ORDER  BY 2 DESC nulls last
    +-- !query 5 schema
    +struct<t1a:string,t1b:smallint>
    +-- !query 5 output
    +val1b	8
    +val1c	8
    +
    +
    +-- !query 6
    +SELECT Count(DISTINCT( t1a ))
    +FROM   t1
    +WHERE  t1b IN (SELECT t2b
    +               FROM   t2
    +               WHERE  t1a = t2a)
    +ORDER  BY Count(DISTINCT( t1a ))
    +-- !query 6 schema
    +struct<count(DISTINCT t1a):bigint>
    +-- !query 6 output
    +1
    +
    +
    +-- !query 7
    +SELECT *
    +FROM   t1
    +WHERE  t1b IN (SELECT t2c
    +               FROM   t2
    +               ORDER  BY t2d)
    +-- !query 7 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 7 output
    +val1a	16	12	10	15.0	20.0	2000	2014-07-04 01:01:00	2014-07-04
    +val1a	16	12	21	15.0	20.0	2000	2014-06-04 01:02:00.001	2014-06-04
    +
    +
    +-- !query 8
    +SELECT *
    +FROM   t1
    +WHERE  t1b IN (SELECT Min(t2b)
    +               FROM   t2
    +               WHERE  t1b = t2b
    +               ORDER  BY Min(t2b))
    +ORDER BY t1c DESC nulls first
    +-- !query 8 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 8 output
    +val1e	10	NULL	25	17.0	25.0	2600	2014-08-04 01:01:00	2014-08-04
    +val1e	10	NULL	19	17.0	25.0	2600	2014-09-04 01:02:00.001	2014-09-04
    +val1d	10	NULL	12	17.0	25.0	2600	2015-05-04 01:01:00	2015-05-04
    +val1e	10	NULL	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +val1b	8	16	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +val1c	8	16	19	17.0	25.0	2600	2014-05-04 01:02:00.001	2014-05-05
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:00:00	2014-04-04
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:02:00.001	2014-04-04
    +
    +
    +-- !query 9
    +SELECT t1a,
    +       t1b,
    +       t1h
    +FROM   t1
    +WHERE  t1c IN (SELECT t2c
    +               FROM   t2
    +               WHERE  t1a = t2a
    +               ORDER  BY t2b DESC nulls first)
    +        OR t1h IN (SELECT t2h
    +                   FROM   t2
    +                   WHERE  t1h > t2h)
    +ORDER  BY t1h DESC nulls last
    +-- !query 9 schema
    +struct<t1a:string,t1b:smallint,t1h:timestamp>
    +-- !query 9 output
    +val1c	8	2014-05-04 01:02:00.001
    +val1b	8	2014-05-04 01:01:00
    +
    +
    +-- !query 10
    +SELECT *
    +FROM   t1
    +WHERE  t1a NOT IN (SELECT t2a
    +                   FROM   t2)
    +ORDER  BY t1a
    +-- !query 10 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 10 output
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:00:00	2014-04-04
    +val1a	16	12	21	15.0	20.0	2000	2014-06-04 01:02:00.001	2014-06-04
    +val1a	16	12	10	15.0	20.0	2000	2014-07-04 01:01:00	2014-07-04
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:02:00.001	2014-04-04
    +val1d	NULL	16	22	17.0	25.0	2600	2014-06-04 01:01:00	NULL
    +val1d	NULL	16	19	17.0	25.0	2600	2014-07-04 01:02:00.001	NULL
    +val1d	10	NULL	12	17.0	25.0	2600	2015-05-04 01:01:00	2015-05-04
    +
    +
    +-- !query 11
    +SELECT t1a,
    +       t1b
    +FROM   t1
    +WHERE  t1a NOT IN (SELECT t2a
    +                   FROM   t2
    +                   WHERE  t1a = t2a)
    +ORDER  BY t1b DESC nulls last
    +-- !query 11 schema
    +struct<t1a:string,t1b:smallint>
    +-- !query 11 output
    +val1a	16
    +val1a	16
    +val1d	10
    +val1a	6
    +val1a	6
    +val1d	NULL
    +val1d	NULL
    +
    +
    +-- !query 12
    +SELECT *
    +FROM   t1
    +WHERE  t1a NOT IN (SELECT t2a
    +                   FROM   t2
    +                   ORDER  BY t2a DESC nulls first)
    +       and t1c IN (SELECT t2c
    +                   FROM   t2
    +                   ORDER  BY t2b DESC nulls last)
    +ORDER  BY t1c DESC nulls last
    +-- !query 12 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 12 output
    +val1d	NULL	16	22	17.0	25.0	2600	2014-06-04 01:01:00	NULL
    +val1d	NULL	16	19	17.0	25.0	2600	2014-07-04 01:02:00.001	NULL
    +val1a	16	12	21	15.0	20.0	2000	2014-06-04 01:02:00.001	2014-06-04
    +val1a	16	12	10	15.0	20.0	2000	2014-07-04 01:01:00	2014-07-04
    +
    +
    +-- !query 13
    +SELECT *
    +FROM   t1
    +WHERE  t1b IN (SELECT Min(t2b)
    +               FROM   t2
    +               GROUP  BY t2a
    +               ORDER  BY t2a DESC)
    +-- !query 13 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 13 output
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:00:00	2014-04-04
    +val1a	6	8	10	15.0	20.0	2000	2014-04-04 01:02:00.001	2014-04-04
    +val1b	8	16	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +val1c	8	16	19	17.0	25.0	2600	2014-05-04 01:02:00.001	2014-05-05
    +
    +
    +-- !query 14
    +SELECT t1a,
    +       Count(DISTINCT( t1b ))
    +FROM   t1
    +WHERE  t1b IN (SELECT Min(t2b)
    +               FROM   t2
    +               WHERE  t1a = t2a
    +               GROUP  BY t2a
    +               ORDER  BY t2a)
    +GROUP  BY t1a,
    +          t1h
    +ORDER BY t1a
    +-- !query 14 schema
    +struct<t1a:string,count(DISTINCT t1b):bigint>
    +-- !query 14 output
    +val1b	1
    +
    +
    +-- !query 15
    +SELECT *
    +FROM   t1
    +WHERE  t1b NOT IN (SELECT Min(t2b)
    +                   FROM   t2
    +                   GROUP  BY t2a
    +                   ORDER  BY t2a)
    +-- !query 15 schema
    +struct<t1a:string,t1b:smallint,t1c:int,t1d:bigint,t1e:float,t1f:double,t1g:decimal(2,-2),t1h:timestamp,t1i:date>
    +-- !query 15 output
    +val1a	16	12	10	15.0	20.0	2000	2014-07-04 01:01:00	2014-07-04
    +val1a	16	12	21	15.0	20.0	2000	2014-06-04 01:02:00.001	2014-06-04
    +val1d	10	NULL	12	17.0	25.0	2600	2015-05-04 01:01:00	2015-05-04
    +val1e	10	NULL	19	17.0	25.0	2600	2014-05-04 01:01:00	2014-05-04
    +val1e	10	NULL	19	17.0	25.0	2600	2014-09-04 01:02:00.001	2014-09-04
    +val1e	10	NULL	25	17.0	25.0	2600	2014-08-04 01:01:00	2014-08-04
    +
    +
    +-- !query 16
    +SELECT t1a,
    +       Sum(DISTINCT( t1b ))
    +FROM   t1
    +WHERE  t1b NOT IN (SELECT Min(t2b)
    +                   FROM   t2
    +                   WHERE  t1a = t2a
    +                   GROUP  BY t2c
    +                   ORDER  BY t2c DESC nulls last)
    +GROUP  BY t1a
    +-- !query 16 schema
    +struct<t1a:string,sum(DISTINCT t1b):bigint>
    +-- !query 16 output
    +val1a	22
    +val1c	8
    +val1d	10
    +val1e	10
    +
    +
    +-- !query 17
    +SELECT Count(DISTINCT( t1a )),
    +       t1b
    +FROM   t1
    +WHERE  t1h NOT IN (SELECT t2h
    +                   FROM   t2
    +                   where t1a = t2a
    +                   order by t2d DESC nulls first
    +                   )
    +GROUP  BY t1a,
    +          t1b
    +ORDER  BY t1b DESC nulls last
    +-- !query 17 schema
    +struct<count(DISTINCT t1a):bigint,t1b:smallint>
    +-- !query 17 output
    +1	16
    +1	10
    +1	10
    +1	8
    +1	6
    +1	NULL
    --- End diff --
    
    I have compared this result set with the result from DB2, and they match.
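Query 11 above uses a correlated `NOT IN` whose subquery is empty for some outer rows, combined with `ORDER BY ... DESC NULLS LAST`. The minimal Python sketch below shows how its expected output arises, under these assumptions:
- The function name `query_11` is hypothetical.
- The values are transcribed from views `t1` and `t2`.
- Ties are left in scan order. Python's sort is stable, whereas SQL leaves the order of tied rows unspecified.

```python
from typing import List, Optional, Tuple

# (t1a, t1b) pairs from view t1, in insertion order; None stands for NULL
t1_rows: List[Tuple[str, Optional[int]]] = [
    ("val1a", 6), ("val1b", 8), ("val1a", 16), ("val1a", 16),
    ("val1c", 8), ("val1d", None), ("val1d", None), ("val1e", 10),
    ("val1e", 10), ("val1d", 10), ("val1a", 6), ("val1e", 10),
]
# t2a values from view t2, in insertion order
t2a_values: List[str] = [
    "val2a", "val1b", "val1b", "val1c", "val1b", "val2e", "val1f",
    "val1b", "val1b", "val1c", "val1e", "val1f", "val1b",
]


def query_11() -> List[Tuple[str, Optional[int]]]:
    """SELECT t1a, t1b FROM t1 WHERE t1a NOT IN (SELECT t2a FROM t2 WHERE t1a = t2a)
    ORDER BY t1b DESC NULLS LAST."""
    kept = []
    for t1a, t1b in t1_rows:
        # Correlated subquery: re-evaluated for every outer row
        matches = [t2a for t2a in t2a_values if t2a == t1a]
        # NOT IN over an empty subquery is TRUE; t1a and t2a hold no NULLs,
        # so plain non-membership is exact here
        if t1a not in matches:
            kept.append((t1a, t1b))
    # DESC NULLS LAST: non-NULL values descending first, then NULLs
    kept.sort(key=lambda row: (row[1] is None, -row[1] if row[1] is not None else 0))
    return kept
```

Every `val1a` and `val1d` row survives, because no `t2` row matches those keys. The resulting order (16, 16, 10, 6, 6, NULL, NULL) matches the expected output of query 11.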




[GitHub] spark issue #16759: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16759
  
    Can one of the admins verify this patch?

