You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2017/01/26 17:30:57 UTC

[GitHub] spark pull request #16710: [SPARK-18872] New test cases for EXISTS subquery

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/16710

    [SPARK-18872] New test cases for EXISTS subquery

    ## What changes were proposed in this pull request?
    This PR adds the first set of tests for EXISTS subquery.  
    
    - exists-basic.sql => Basic EXISTS and NOT EXISTS subqueries using correlated predicates.
    - exists-within-and-or => Exists subqueries within AND or OR expressions (ExistenceJoin)
    
    DB2 results are attached here as reference :
    
    [exists-basic-db2.txt](https://github.com/apache/spark/files/733031/exists-basic-db2.txt)
    [exists-and-or-db2.txt](https://github.com/apache/spark/files/733030/exists-and-or-db2.txt)
    
    
    ## How was this patch tested?
    This patch is adding tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark exist-basic

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16710.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16710
    
----
commit 53247fe72cb16701ea18fa33db5df35c41f125e8
Author: Dilip Biswal <db...@us.ibm.com>
Date:   2017-01-26T08:59:34Z

    [SPARK-18872] New test cases for EXISTS subquery

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Thank you very much @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16710: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16710


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72121/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72125/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72029/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    cc @hvanhovell @gatorsmile @nsyca 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16710: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16710#discussion_r98346290
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/exists-subquery/exists-basic.sql ---
    @@ -0,0 +1,115 @@
    +-- Tests EXISTS subquery support. Tests basic form 
    +-- of EXISTS subquery (both EXISTS and NOT EXISTS)
    +
    +CREATE TEMPORARY VIEW EMP AS SELECT * FROM VALUES
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (200, "emp 2", date "2003-01-01", 200.00D, 10),
    +  (300, "emp 3", date "2002-01-01", 300.00D, 20),
    +  (400, "emp 4", date "2005-01-01", 400.00D, 30),
    +  (500, "emp 5", date "2001-01-01", 400.00D, NULL),
    +  (600, "emp 6 - no dept", date "2001-01-01", 400.00D, 100),
    +  (700, "emp 7", date "2010-01-01", 400.00D, 100),
    +  (800, "emp 8", date "2016-01-01", 150.00D, 70)
    +AS EMP(id, emp_name, hiredate, salary, dept_id);
    +
    +CREATE TEMPORARY VIEW DEPT AS SELECT * FROM VALUES
    +  (10, "dept 1", "CA"),
    +  (20, "dept 2", "NY"),
    +  (30, "dept 3", "TX"),
    +  (40, "dept 4 - unassigned", "OR"),
    +  (50, "dept 5 - unassigned", "NJ"),
    +  (70, "dept 7", "FL")
    +AS DEPT(dept_id, dept_name, state);
    +
    +CREATE TEMPORARY VIEW BONUS AS SELECT * FROM VALUES
    +  ("emp 1", 10.00D),
    +  ("emp 1", 20.00D),
    +  ("emp 2", 300.00D),
    +  ("emp 2", 100.00D),
    +  ("emp 3", 300.00D),
    +  ("emp 4", 100.00D),
    +  ("emp 5", 1000.00D),
    +  ("emp 6 - no dept", 500.00D)
    +AS BONUS(emp_name, bonus_amt);
    +
    +-- uncorrelated exist query 
    +-- TC.01.01
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT 1 
    +               FROM   dept 
    +               WHERE  dept.dept_id > 10 
    +                      AND dept.dept_id < 30); 
    +
    +-- simple correlated predicate in exist subquery
    +-- TC.01.02
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id); 
    +
    +-- correlated outer isnull predicate
    +-- TC.01.03
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id 
    +                       OR emp.dept_id IS NULL);
    +
    +-- Simple correlation with a local predicate in outer query
    +-- TC.01.04
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +       AND emp.id > 200; 
    +
    +-- Outer references (emp.id) should not be pruned from outer plan
    +-- TC.01.05
    +SELECT emp.emp_name 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.state 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +       AND emp.id > 200;
    +
    +-- not exists with correlated predicate
    +-- TC.01.06
    +SELECT * 
    +FROM   dept 
    +WHERE  NOT EXISTS (SELECT emp_name 
    +                   FROM   emp 
    +                   WHERE  emp.dept_id = dept.dept_id);
    +
    +-- not exists with correlated predicate + local predicate
    +-- TC.01.07
    +SELECT * 
    +FROM   dept 
    +WHERE  NOT EXISTS (SELECT emp_name 
    +                   FROM   emp 
    +                   WHERE  emp.dept_id = dept.dept_id 
    +                           OR state = 'NJ');
    +
    +-- not exist both equal and greaterthan predicate
    +-- TC.01.08
    +SELECT * 
    +FROM   bonus 
    +WHERE  NOT EXISTS (SELECT * 
    +                   FROM   emp 
    +                   WHERE  emp.emp_name = emp_name 
    +                          AND bonus_amt > emp.salary); 
    --- End diff --
    
    @gatorsmile Thanks .. I have added the suggested test case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    **[Test build #72125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72125/testReport)** for PR 16710 at commit [`0e93a64`](https://github.com/apache/spark/commit/0e93a64212847b54f199044cd4210d5bf8a5ce91).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    **[Test build #72029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72029/testReport)** for PR 16710 at commit [`53247fe`](https://github.com/apache/spark/commit/53247fe72cb16701ea18fa33db5df35c41f125e8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    **[Test build #72029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72029/testReport)** for PR 16710 at commit [`53247fe`](https://github.com/apache/spark/commit/53247fe72cb16701ea18fa33db5df35c41f125e8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16710: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

Posted by nsyca <gi...@git.apache.org>.
Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16710#discussion_r98048551
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-within-and-or.sql.out ---
    @@ -0,0 +1,156 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 8
    +
    +
    +-- !query 0
    +CREATE TEMPORARY VIEW EMP AS SELECT * FROM VALUES
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (200, "emp 2", date "2003-01-01", 200.00D, 10),
    +  (300, "emp 3", date "2002-01-01", 300.00D, 20),
    +  (400, "emp 4", date "2005-01-01", 400.00D, 30),
    +  (500, "emp 5", date "2001-01-01", 400.00D, NULL),
    +  (600, "emp 6 - no dept", date "2001-01-01", 400.00D, 100),
    +  (700, "emp 7", date "2010-01-01", 400.00D, 100),
    +  (800, "emp 8", date "2016-01-01", 150.00D, 70)
    +AS EMP(id, emp_name, hiredate, salary, dept_id)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +CREATE TEMPORARY VIEW DEPT AS SELECT * FROM VALUES
    +  (10, "dept 1", "CA"),
    +  (20, "dept 2", "NY"),
    +  (30, "dept 3", "TX"),
    +  (40, "dept 4 - unassigned", "OR"),
    +  (50, "dept 5 - unassigned", "NJ"),
    +  (70, "dept 7", "FL")
    +AS DEPT(dept_id, dept_name, state)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +CREATE TEMPORARY VIEW BONUS AS SELECT * FROM VALUES
    +  ("emp 1", 10.00D),
    +  ("emp 1", 20.00D),
    +  ("emp 2", 300.00D),
    +  ("emp 2", 100.00D),
    +  ("emp 3", 300.00D),
    +  ("emp 4", 100.00D),
    +  ("emp 5", 1000.00D),
    +  ("emp 6 - no dept", 500.00D)
    +AS BONUS(emp_name, bonus_amt)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +SELECT emp.emp_name 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.state 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +        OR emp.id > 200
    +-- !query 3 schema
    +struct<emp_name:string>
    +-- !query 3 output
    +emp 1
    +emp 1
    +emp 2
    +emp 3
    +emp 4
    +emp 5
    +emp 6 - no dept
    +emp 7
    +emp 8
    +
    +
    +-- !query 4
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +        OR emp.dept_id IS NULL
    +-- !query 4 schema
    +struct<id:int,emp_name:string,hiredate:date,salary:double,dept_id:int>
    +-- !query 4 output
    +100	emp 1	2005-01-01	100.0	10
    +100	emp 1	2005-01-01	100.0	10
    +200	emp 2	2003-01-01	200.0	10
    +300	emp 3	2002-01-01	300.0	20
    +400	emp 4	2005-01-01	400.0	30
    +500	emp 5	2001-01-01	400.0	NULL
    +800	emp 8	2016-01-01	150.0	70
    +
    +
    +-- !query 5
    +SELECT emp.emp_name 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.state 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id 
    +                      AND dept.dept_id = 20) 
    +        OR EXISTS (SELECT dept.state 
    +                   FROM   dept 
    +                   WHERE  emp.dept_id = dept.dept_id 
    +                          AND dept.dept_id = 30)
    +-- !query 5 schema
    +struct<emp_name:string>
    +-- !query 5 output
    +emp 3
    +emp 4
    +
    +
    +-- !query 6
    +SELECT * 
    +FROM   bonus 
    +WHERE  ( NOT EXISTS (SELECT * 
    +                     FROM   emp 
    +                     WHERE  emp.emp_name = emp_name 
    +                            AND bonus_amt > emp.salary) 
    +          OR EXISTS (SELECT * 
    +                     FROM   emp 
    +                     WHERE  emp.emp_name = emp_name 
    +                             OR bonus_amt < emp.salary) )
    +-- !query 6 schema
    +struct<emp_name:string,bonus_amt:double>
    +-- !query 6 output
    +emp 1	10.0
    +emp 1	20.0
    +emp 2	100.0
    +emp 2	300.0
    +emp 3	300.0
    +emp 4	100.0
    +emp 5	1000.0
    +emp 6 - no dept	500.0
    +
    +
    +-- !query 7
    +SELECT * FROM bonus WHERE NOT EXISTS 
    +( 
    +       SELECT * 
    +       FROM   emp 
    +       WHERE  emp.emp_name = emp_name 
    +       AND    bonus_amt > emp.salary) 
    +AND 
    +emp_name IN 
    +( 
    +       SELECT emp_name 
    +       FROM   emp 
    +       WHERE  bonus_amt < emp.salary)
    +-- !query 7 schema
    +struct<emp_name:string,bonus_amt:double>
    +-- !query 7 output
    +emp 1	10.0
    +emp 1	20.0
    +emp 2	100.0
    +emp 4	100.0
    --- End diff --
    
    I have compared this result file with the attached DB2 output. The two sets of result are equivalent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    **[Test build #72121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72121/testReport)** for PR 16710 at commit [`0e93a64`](https://github.com/apache/spark/commit/0e93a64212847b54f199044cd4210d5bf8a5ce91).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16710: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

Posted by nsyca <gi...@git.apache.org>.
Github user nsyca commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16710#discussion_r98048517
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-basic.sql.out ---
    @@ -0,0 +1,201 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 12
    +
    +
    +-- !query 0
    +CREATE TEMPORARY VIEW EMP AS SELECT * FROM VALUES
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (200, "emp 2", date "2003-01-01", 200.00D, 10),
    +  (300, "emp 3", date "2002-01-01", 300.00D, 20),
    +  (400, "emp 4", date "2005-01-01", 400.00D, 30),
    +  (500, "emp 5", date "2001-01-01", 400.00D, NULL),
    +  (600, "emp 6 - no dept", date "2001-01-01", 400.00D, 100),
    +  (700, "emp 7", date "2010-01-01", 400.00D, 100),
    +  (800, "emp 8", date "2016-01-01", 150.00D, 70)
    +AS EMP(id, emp_name, hiredate, salary, dept_id)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +CREATE TEMPORARY VIEW DEPT AS SELECT * FROM VALUES
    +  (10, "dept 1", "CA"),
    +  (20, "dept 2", "NY"),
    +  (30, "dept 3", "TX"),
    +  (40, "dept 4 - unassigned", "OR"),
    +  (50, "dept 5 - unassigned", "NJ"),
    +  (70, "dept 7", "FL")
    +AS DEPT(dept_id, dept_name, state)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +CREATE TEMPORARY VIEW BONUS AS SELECT * FROM VALUES
    +  ("emp 1", 10.00D),
    +  ("emp 1", 20.00D),
    +  ("emp 2", 300.00D),
    +  ("emp 2", 100.00D),
    +  ("emp 3", 300.00D),
    +  ("emp 4", 100.00D),
    +  ("emp 5", 1000.00D),
    +  ("emp 6 - no dept", 500.00D)
    +AS BONUS(emp_name, bonus_amt)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT 1 
    +               FROM   dept 
    +               WHERE  dept.dept_id > 10 
    +                      AND dept.dept_id < 30)
    +-- !query 3 schema
    +struct<id:int,emp_name:string,hiredate:date,salary:double,dept_id:int>
    +-- !query 3 output
    +100	emp 1	2005-01-01	100.0	10
    +100	emp 1	2005-01-01	100.0	10
    +200	emp 2	2003-01-01	200.0	10
    +300	emp 3	2002-01-01	300.0	20
    +400	emp 4	2005-01-01	400.0	30
    +500	emp 5	2001-01-01	400.0	NULL
    +600	emp 6 - no dept	2001-01-01	400.0	100
    +700	emp 7	2010-01-01	400.0	100
    +800	emp 8	2016-01-01	150.0	70
    +
    +
    +-- !query 4
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id)
    +-- !query 4 schema
    +struct<id:int,emp_name:string,hiredate:date,salary:double,dept_id:int>
    +-- !query 4 output
    +100	emp 1	2005-01-01	100.0	10
    +100	emp 1	2005-01-01	100.0	10
    +200	emp 2	2003-01-01	200.0	10
    +300	emp 3	2002-01-01	300.0	20
    +400	emp 4	2005-01-01	400.0	30
    +800	emp 8	2016-01-01	150.0	70
    +
    +
    +-- !query 5
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id 
    +                       OR emp.dept_id IS NULL)
    +-- !query 5 schema
    +struct<id:int,emp_name:string,hiredate:date,salary:double,dept_id:int>
    +-- !query 5 output
    +100	emp 1	2005-01-01	100.0	10
    +100	emp 1	2005-01-01	100.0	10
    +200	emp 2	2003-01-01	200.0	10
    +300	emp 3	2002-01-01	300.0	20
    +400	emp 4	2005-01-01	400.0	30
    +500	emp 5	2001-01-01	400.0	NULL
    +800	emp 8	2016-01-01	150.0	70
    +
    +
    +-- !query 6
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +       AND emp.id > 200
    +-- !query 6 schema
    +struct<id:int,emp_name:string,hiredate:date,salary:double,dept_id:int>
    +-- !query 6 output
    +300	emp 3	2002-01-01	300.0	20
    +400	emp 4	2005-01-01	400.0	30
    +800	emp 8	2016-01-01	150.0	70
    +
    +
    +-- !query 7
    +SELECT emp.emp_name 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.state 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +       AND emp.id > 200
    +-- !query 7 schema
    +struct<emp_name:string>
    +-- !query 7 output
    +emp 3
    +emp 4
    +emp 8
    +
    +
    +-- !query 8
    +SELECT * 
    +FROM   dept 
    +WHERE  NOT EXISTS (SELECT emp_name 
    +                   FROM   emp 
    +                   WHERE  emp.dept_id = dept.dept_id)
    +-- !query 8 schema
    +struct<dept_id:int,dept_name:string,state:string>
    +-- !query 8 output
    +40	dept 4 - unassigned	OR
    +50	dept 5 - unassigned	NJ
    +
    +
    +-- !query 9
    +SELECT * 
    +FROM   dept 
    +WHERE  NOT EXISTS (SELECT emp_name 
    +                   FROM   emp 
    +                   WHERE  emp.dept_id = dept.dept_id 
    +                           OR state = 'NJ')
    +-- !query 9 schema
    +struct<dept_id:int,dept_name:string,state:string>
    +-- !query 9 output
    +40	dept 4 - unassigned	OR
    +
    +
    +-- !query 10
    +SELECT * 
    +FROM   bonus 
    +WHERE  NOT EXISTS (SELECT * 
    +                   FROM   emp 
    +                   WHERE  emp.emp_name = emp_name 
    +                          AND bonus_amt > emp.salary)
    +-- !query 10 schema
    +struct<emp_name:string,bonus_amt:double>
    +-- !query 10 output
    +emp 1	10.0
    +emp 1	20.0
    +emp 2	100.0
    +emp 4	100.0
    +
    +
    +-- !query 11
    +SELECT * 
    +FROM   bonus 
    +WHERE  EXISTS (SELECT emp_name 
    +               FROM   emp 
    +               WHERE  bonus.emp_name = emp.emp_name 
    +                      AND EXISTS (SELECT state 
    +                                  FROM   dept 
    +                                  WHERE  dept.dept_id = emp.dept_id))
    +-- !query 11 schema
    +struct<emp_name:string,bonus_amt:double>
    +-- !query 11 output
    +emp 1	10.0
    +emp 1	20.0
    +emp 2	100.0
    +emp 2	300.0
    +emp 3	300.0
    +emp 4	100.0
    --- End diff --
    
    I have compared this result file with the attached DB2 output. The two sets of result are equivalent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    **[Test build #72125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72125/testReport)** for PR 16710 at commit [`0e93a64`](https://github.com/apache/spark/commit/0e93a64212847b54f199044cd4210d5bf8a5ce91).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16710: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16710#discussion_r98345474
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/exists-subquery/exists-basic.sql ---
    @@ -0,0 +1,115 @@
    +-- Tests EXISTS subquery support. Tests basic form 
    +-- of EXISTS subquery (both EXISTS and NOT EXISTS)
    +
    +CREATE TEMPORARY VIEW EMP AS SELECT * FROM VALUES
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (100, "emp 1", date "2005-01-01", 100.00D, 10),
    +  (200, "emp 2", date "2003-01-01", 200.00D, 10),
    +  (300, "emp 3", date "2002-01-01", 300.00D, 20),
    +  (400, "emp 4", date "2005-01-01", 400.00D, 30),
    +  (500, "emp 5", date "2001-01-01", 400.00D, NULL),
    +  (600, "emp 6 - no dept", date "2001-01-01", 400.00D, 100),
    +  (700, "emp 7", date "2010-01-01", 400.00D, 100),
    +  (800, "emp 8", date "2016-01-01", 150.00D, 70)
    +AS EMP(id, emp_name, hiredate, salary, dept_id);
    +
    +CREATE TEMPORARY VIEW DEPT AS SELECT * FROM VALUES
    +  (10, "dept 1", "CA"),
    +  (20, "dept 2", "NY"),
    +  (30, "dept 3", "TX"),
    +  (40, "dept 4 - unassigned", "OR"),
    +  (50, "dept 5 - unassigned", "NJ"),
    +  (70, "dept 7", "FL")
    +AS DEPT(dept_id, dept_name, state);
    +
    +CREATE TEMPORARY VIEW BONUS AS SELECT * FROM VALUES
    +  ("emp 1", 10.00D),
    +  ("emp 1", 20.00D),
    +  ("emp 2", 300.00D),
    +  ("emp 2", 100.00D),
    +  ("emp 3", 300.00D),
    +  ("emp 4", 100.00D),
    +  ("emp 5", 1000.00D),
    +  ("emp 6 - no dept", 500.00D)
    +AS BONUS(emp_name, bonus_amt);
    +
    +-- uncorrelated exist query 
    +-- TC.01.01
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT 1 
    +               FROM   dept 
    +               WHERE  dept.dept_id > 10 
    +                      AND dept.dept_id < 30); 
    +
    +-- simple correlated predicate in exist subquery
    +-- TC.01.02
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id); 
    +
    +-- correlated outer isnull predicate
    +-- TC.01.03
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id 
    +                       OR emp.dept_id IS NULL);
    +
    +-- Simple correlation with a local predicate in outer query
    +-- TC.01.04
    +SELECT * 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.dept_name 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +       AND emp.id > 200; 
    +
    +-- Outer references (emp.id) should not be pruned from outer plan
    +-- TC.01.05
    +SELECT emp.emp_name 
    +FROM   emp 
    +WHERE  EXISTS (SELECT dept.state 
    +               FROM   dept 
    +               WHERE  emp.dept_id = dept.dept_id) 
    +       AND emp.id > 200;
    +
    +-- not exists with correlated predicate
    +-- TC.01.06
    +SELECT * 
    +FROM   dept 
    +WHERE  NOT EXISTS (SELECT emp_name 
    +                   FROM   emp 
    +                   WHERE  emp.dept_id = dept.dept_id);
    +
    +-- not exists with correlated predicate + local predicate
    +-- TC.01.07
    +SELECT * 
    +FROM   dept 
    +WHERE  NOT EXISTS (SELECT emp_name 
    +                   FROM   emp 
    +                   WHERE  emp.dept_id = dept.dept_id 
    +                           OR state = 'NJ');
    +
    +-- not exist both equal and greaterthan predicate
    +-- TC.01.08
    +SELECT * 
    +FROM   bonus 
    +WHERE  NOT EXISTS (SELECT * 
    +                   FROM   emp 
    +                   WHERE  emp.emp_name = emp_name 
    +                          AND bonus_amt > emp.salary); 
    --- End diff --
    
    Could you also add one more test case, which is very like `TC.01.08`? This test case is used very often in the real world. BTW, you do not need to run DB2. We can know the results using our naked eyes. : ) 
    ```SQL
    -- select values present in one table but missing in another one
    
    SELECT emp.*
    FROM   emp
    WHERE  NOT EXISTS (SELECT NULL
                       FROM   bonus
                       WHERE  bonus.emp_name = emp.emp_name);
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16710: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16710
  
    Could you update the PR descriptions? Currently, it does not correctly describe the cases you added.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org