Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/05/22 22:19:03 UTC

[GitHub] spark pull request #21403: [SPARK-24313][WIP][SQL] Support IN subqueries wit...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/21403

    [SPARK-24313][WIP][SQL] Support IN subqueries with struct type

    ## What changes were proposed in this pull request?
    
    Using struct types in `IN` subqueries can generate invalid plans in `RewritePredicateSubquery`, because we do not support the cases where the outer value is a struct or the output of the inner subquery is a struct. We only support the case where the outer value is either a simple value or a struct created in place (e.g. `(a, b)`), and the output of the subquery is either a simple value or a list of values.
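
A rough, hypothetical sketch of the matching rule described above, assuming the usual approach of turning an `IN` subquery into a semi-join: each outer value is paired with the subquery column at the same position and the equalities are ANDed. `InToSemiJoinSketch` and `joinCondition` are made-up names that work on plain strings; this is not the code in `RewritePredicateSubquery`, only an illustration of why the number of outer values must line up with the subquery's output columns.

```scala
// Hypothetical, string-based model of building a semi-join condition for an
// IN subquery. Not Spark's implementation; names are illustrative only.
object InToSemiJoinSketch {
  // outer: expressions on the left of IN, e.g. Seq("a", "b") for (a, b) IN (...)
  // subqueryOutput: output columns of the subquery, e.g. Seq("x", "y")
  def joinCondition(outer: Seq[String], subqueryOutput: Seq[String]): Either[String, String] =
    if (outer.length != subqueryOutput.length)
      Left(s"arity mismatch: ${outer.length} outer value(s) vs " +
        s"${subqueryOutput.length} subquery column(s)")
    else
      // Pair values positionally and AND the equalities together.
      Right(outer.zip(subqueryOutput).map { case (o, s) => s"$o = $s" }.mkString(" AND "))
}
```

If a struct appears on either side, the count of "values" depends on whether that struct is unpacked, which is why such cases are easy to mishandle.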
    
    ## How was this patch tested?
    
    Added UT


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-24313

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21403
    
----
commit 5b6226fa48d82b461c6d5a8d1a9a625d2617af76
Author: Marco Gaido <ma...@...>
Date:   2018-05-22T22:08:22Z

    [SPARK-24313][SQL] Support IN subqueries with struct type

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198911203
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    I am not sure which Postgres version you're using, but that seems to be a bug. I am using version 10 and it works as expected, i.e.:
    
    ```
    mgaido=# select 1 from (select (1, 'a') as col1) tab1 where col1 = (1, 'a');
    ERROR:  could not identify an equality operator for type unknown
    mgaido=# select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) = (1, 'a');
     ?column? 
    ----------
            1
    (1 row)
    ```
    
    Anyway, in Presto, both work:
    
    ```
    presto> select 1 from (select (1, 'a') as col1) tab1 where col1 in ((1, 'a'));
     _col0 
    -------
         1 
    (1 row)
    
    Query 20180628_163550_00000_gkbmf, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:01 [0 rows, 0B] [0 rows/s, 0B/s]
    
    presto> select 1 from (select (1, 'a') as col1) tab1 where (1, 'a') in ((1, 'a'));
     _col0 
    -------
         1 
    (1 row)
    Query 20180628_163600_00001_gkbmf, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:00 [0 rows, 0B] [0 rows/s, 0B/s]
    
    presto> select 1 from (select (1, 'a') as col1) tab1 where col1 in (select (1, 'a'));
     _col0 
    -------
         1 
    (1 row)
    
    Query 20180628_163607_00002_gkbmf, FINISHED, 1 node
    Splits: 83 total, 83 done (100.00%)
    0:01 [0 rows, 0B] [0 rows/s, 0B/s]
    
    presto> select 1 from (select (1, 'a') as col1) tab1 where (1, 'a') in (select (1, 'a'));
     _col0 
    -------
         1 
    (1 row)
    ```
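
Both transcripts hinge on row-value equality. A minimal sketch of the SQL-standard rule, assuming `None` models `NULL`/UNKNOWN: `(a1, ..., an) = (b1, ..., bn)` is FALSE if any field-wise comparison is FALSE, TRUE if all are TRUE, and UNKNOWN otherwise. `RowEqualitySketch` is a hypothetical helper, not a Spark, Postgres, or Presto API, and whether Spark's struct comparison treats NULL fields the same way is not established here.

```scala
// Hypothetical toy model of SQL row-value equality with three-valued logic.
// None stands for SQL NULL / UNKNOWN.
object RowEqualitySketch {
  type SqlBool = Option[Boolean]

  // Scalar equality: NULL compared with anything is UNKNOWN.
  def scalarEq(a: Option[Any], b: Option[Any]): SqlBool =
    for (x <- a; y <- b) yield x == y

  // Row equality is the AND of the field-wise comparisons:
  // FALSE if any field is FALSE, TRUE if all are TRUE, otherwise UNKNOWN.
  def rowEq(left: Seq[Option[Any]], right: Seq[Option[Any]]): SqlBool = {
    require(left.length == right.length, "row values must have the same number of fields")
    val fields = left.zip(right).map { case (l, r) => scalarEq(l, r) }
    if (fields.contains(Some(false))) Some(false)
    else if (fields.forall(_.contains(true))) Some(true)
    else None
  }
}
```

An `IN` over a list of rows is then the OR of `rowEq` against each row in the list.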



---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    retest this please


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198826199
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    Can we have an analyzer rule that handles `In(CreateStruct(...), ListQuery(...))` by unpacking the `CreateStruct` or packing the `ListQuery`? Then we wouldn't need to change `In`.
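
A minimal sketch of the kind of rule suggested here, on a hypothetical toy expression tree (`Attr`, `CreateStruct`, `ListQuery`, `In`, and `MultiIn` only loosely echo Catalyst names and are not its classes): an inline struct on the left is unpacked when its width matches a multi-column subquery, and anything else is kept as a single value. Real Catalyst rules work on resolved `Expression`s with data types; this only shows the matching logic.

```scala
// Hypothetical toy expression tree; not Catalyst's classes.
object UnpackStructSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class CreateStruct(children: Seq[Expr]) extends Expr
  // A subquery is modelled only by its output columns.
  case class ListQuery(output: Seq[Attr]) extends Expr
  case class In(value: Expr, list: Seq[Expr]) extends Expr
  // Multi-column form produced by the rule.
  case class MultiIn(values: Seq[Expr], query: ListQuery) extends Expr

  def rewrite(e: Expr): Expr = e match {
    // Unpack an inline struct when it lines up with a multi-column subquery.
    case In(CreateStruct(cs), Seq(q: ListQuery))
        if q.output.length > 1 && cs.length == q.output.length =>
      MultiIn(cs, q)
    // Otherwise keep the left side as one value, so a struct-valued column
    // can still be compared as a whole against a single struct column.
    case In(v, Seq(q: ListQuery)) =>
      MultiIn(Seq(v), q)
    case other => other
  }
}
```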


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94292/testReport)** for PR 21403 at commit [`eb1dfb7`](https://github.com/apache/spark/commit/eb1dfb7e0873b8479ea54d223b7fde3dcefa4834).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92085/


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1793/


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @mgaido91 I see. But by using `Seq[Expression]` in `In`, we can hopefully remove `ResolveInValues`. I wouldn't mind changing the parser if it's necessary and if it saves work elsewhere. Having a temporary expression that is nothing more than a wrapper around `Seq[Expression]` doesn't look like a very clean approach to me.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1356/


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93497/testReport)** for PR 21403 at commit [`c0ad1e3`](https://github.com/apache/spark/commit/c0ad1e38251df7882b3a27902b21dc3717d34697).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93496/testReport)** for PR 21403 at commit [`7c898a5`](https://github.com/apache/spark/commit/7c898a5d7fe188e8b617955e562a2aaf84fa7fdd).


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I am running into significant issues enforcing the behavior we discussed. The problem is that we cannot really distinguish these cases:
     - `... (a, b) in (select ...)`
     - `... from (select (a, b) as x ...) where x in (select ...)`
    
    That is, we don't know whether the `CreateNamedStruct` was written there by the user or was moved there by the `Optimizer`. We may need to change the parsing logic for this and revisit the whole `In` structure: the outer operand would have to be parsed as a list of values rather than a single value.
    
    Likewise, we cannot really distinguish:
     - `... (a, b) in (select (a, b) from ...)`
     - `... (a, b) in (select a from (select (a, b) from ...))`
    
    This is a bit trickier indeed.
    
    What do you think?
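
The first ambiguity can be illustrated with a hypothetical toy tree: once the alias `x` is replaced by its definition (as alias inlining during optimization may do), the two queries produce the same `IN` expression, so a later rule cannot tell which one the user wrote. `AliasAmbiguitySketch`, `InSub`, and `inlineAliases` are made-up names for illustration only.

```scala
// Hypothetical toy model; not Catalyst's classes or rules.
object AliasAmbiguitySketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class CreateNamedStruct(fields: Seq[Expr]) extends Expr
  // IN subquery reduced to its left side plus the subquery's column count.
  case class InSub(value: Expr, subqueryColumns: Int) extends Expr

  // Replace references to aliases with their definitions.
  def inlineAliases(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Attr(n)               => aliases.getOrElse(n, e)
    case CreateNamedStruct(fs) => CreateNamedStruct(fs.map(inlineAliases(_, aliases)))
    case InSub(v, n)           => InSub(inlineAliases(v, aliases), n)
  }
}
```

Comparing `InSub(CreateNamedStruct(Seq(Attr("a"), Attr("b"))), 2)` with the result of inlining `x -> (a, b)` into `InSub(Attr("x"), 2)` yields two equal trees.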


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    LGTM, merging to master!


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    `case class In(value: Expression, list: Seq[Expression])` is an internal class. We do not expect user applications to rely on internal data structures; if they do, they should understand that the class will not be stable.
    
    The suggestion of @maryannxue is better, IMO.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198865129
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    I mean `case class InSubquery(values: Seq[Expression], subquery: ListSubquery)`; it's not just the left part.
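
A hypothetical sketch of that shape, with toy stand-ins for Catalyst's data types and expressions (none of these names are Spark's actual classes): the left side is a list of values that must match the subquery output column by column, in both count and type.

```scala
// Hypothetical toy types; the real Catalyst classes carry much more
// (resolution, nullability, codegen, ...).
object InSubquerySketch {
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  final case class StructType(fields: Seq[DataType]) extends DataType

  final case class Expr(name: String, dataType: DataType)
  // A list subquery is modelled only by the types of its output columns.
  final case class ListSubquery(outputTypes: Seq[DataType])

  final case class InSubquery(values: Seq[Expr], query: ListSubquery) {
    // Check arity first, then the type of each positional pair.
    def checkInput: Either[String, Unit] =
      if (values.length != query.outputTypes.length)
        Left(s"arity mismatch: ${values.length} value(s) vs ${query.outputTypes.length} subquery column(s)")
      else
        values.zip(query.outputTypes).collectFirst {
          case (v, t) if v.dataType != t => s"type mismatch for ${v.name}: ${v.dataType} vs $t"
        }.toLeft(())
  }
}
```

In this model, a struct-typed column compared against a subquery that returns a single struct column of the same type is simply the one-value case, so it type-checks without any unpacking.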


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/648/


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Build finished. Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21403


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1852/


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92581/testReport)** for PR 21403 at commit [`d3e39ed`](https://github.com/apache/spark/commit/d3e39ed3f442958cfaaa1ef056cb72fedf0fce1c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93543/testReport)** for PR 21403 at commit [`bd008fe`](https://github.com/apache/spark/commit/bd008fe51f70f9925e9513680636f4dd9aadcd7c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @mgaido91 This also works, +1.
    What about `a in (select (b, c) from ...)` when `a` is a struct? I guess we should allow it, but it is a potential gotcha during implementation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92085/testReport)** for PR 21403 at commit [`c9a36e0`](https://github.com/apache/spark/commit/c9a36e0fb6301b6807ea6ca9e3415e899f7a83ac).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24313][WIP][SQL] Support IN subqueries wit...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r190160896
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2261,6 +2261,24 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec])
       }
     
    +  test("SPARK-24341: IN subqueries with struct fields") {
    --- End diff --
    
    Can we just add these tests to `SQLQueryTestSuite`? This is where most of the subquery tests can be found.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94202/testReport)** for PR 21403 at commit [`a6114a6`](https://github.com/apache/spark/commit/a6114a655305f318230bf1bbd25394e952793a94).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1273/


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94001/testReport)** for PR 21403 at commit [`53e3d96`](https://github.com/apache/spark/commit/53e3d961a0cde6d6ab6b4c8b86b9134b9532f776).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93783/testReport)** for PR 21403 at commit [`423e93e`](https://github.com/apache/spark/commit/423e93efffa523cb44246218773c864cbb946059).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait BarrierTaskContext extends TaskContext `
      * `class BarrierTaskInfo(val address: String)`
      * `class RDDBarrier[T: ClassTag](rdd: RDD[T]) `
      * `case class WorkerOffer(`
      * `case class Shuffle(child: Expression, randomSeed: Option[Long] = None)`
      * `case class ReplicateRows(children: Seq[Expression]) extends Generator with CodegenFallback `
      * `trait AnalysisHelper extends QueryPlan[LogicalPlan] `
      * `case class Intersect(`
      * `case class Except(`
      * `case class RandomIndicesGenerator(randomSeed: Long) `


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @mgaido91 BTW: in SPARK-24395 I would consider the cases to still be valid, because I believe there is no other syntactic way to do a multi-column IN/NOT IN with a list of literals.
    The question is whether they should be treated as structs or unpacked.
    If they are treated as structs, then the current behavior is correct, I think.
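
For a list of literals, the "unpacked" reading can be sketched as an OR of per-tuple conjunctions. `UnpackInListSketch.unpack` is a hypothetical helper that builds the predicate as a string; it only illustrates the rewrite, not any existing Spark code path.

```scala
// Hypothetical sketch: expand (c1, ..., cn) IN ((v11, ..., v1n), ...) into
// (c1 = v11 AND ... AND cn = v1n) OR ... as a predicate string.
object UnpackInListSketch {
  def unpack(columns: Seq[String], tuples: Seq[Seq[String]]): String = {
    require(tuples.nonEmpty, "IN list must not be empty")
    require(tuples.forall(_.length == columns.length), "every tuple must have one literal per column")
    tuples.map { t =>
      // One conjunction per literal tuple.
      columns.zip(t).map { case (c, v) => s"$c = $v" }.mkString("(", " AND ", ")")
    }.mkString(" OR ")
  }
}
```

For rows with no NULL fields, this and the struct reading (comparing `(a, b)` as one struct against each struct literal) agree; they may diverge when a field is NULL, which is where the choice matters.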


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Thanks @hvanhovell, sorry for the error. I changed it to the right one.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I just realized there are 2 `InSubquery` expressions; it seems we need to rename one of them. @mgaido91 any ideas?


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r199116440
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    SGTM


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I'm writing release notes, and this one caught my attention. @mgaido91 can you confirm that this patch doesn't introduce any behavior change? I.e. if it failed previously, it still fails; if it succeeded previously, it still succeeds.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92524/testReport)** for PR 21403 at commit [`268307f`](https://github.com/apache/spark/commit/268307f52248d6408862cc76ccb54612ef9ef216).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InValues(children: Seq[Expression]) extends Expression `
      * `case class In(value: Expression, list: Seq[Expression]) extends Predicate `


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1836/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1754/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r199086016
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    I am not sure. The behavior when comparing structs is not uniform across databases. Hive doesn't allow `=` on structs. Postgres and Presto do, but their behavior with nulls is not consistent, and it differs from ours. In particular, comparing a struct containing a `null` returns `null` on Postgres and causes an exception in Presto (we return `false` instead). This causes a separate problem, reported in another JIRA (SPARK-24395), where we can return results different from Postgres and Oracle.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r207701506
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -505,6 +505,7 @@ object NullPropagation extends Rule[LogicalPlan] {
     
           // If the value expression is NULL then transform the In expression to null literal.
           case In(Literal(null, _), _) => Literal.create(null, BooleanType)
    +      case InSubquery(Seq(Literal(null, _)), _) => Literal.create(null, BooleanType)
    --- End diff --
    
    Thanks for adding this! Please double-check every case of `IN` in all the optimizer rules. We are afraid this new expression might introduce a regression.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    retest this please


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93539/testReport)** for PR 21403 at commit [`0412829`](https://github.com/apache/spark/commit/04128292e6d145ec608166b532c960cac72a500c).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1765/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r199080189
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    Anyway, I think the right behavior is the one that both Postgres and Hive have (which is also the same as Oracle/MySQL, where structs don't exist). What do you think?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93498/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1843/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93106/testReport)** for PR 21403 at commit [`a5771b8`](https://github.com/apache/spark/commit/a5771b8a0a4f00d95bb6f882f40ccccaa6dd17d0).


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I think that the way the columns are defined in the subquery should define the semantics.
    E.g.:
    `(a, b) IN (select c, d from ...)` - unpack (a, b) and treat it as a multi column comparison as in current semantics.
    `(a, b) IN (select (c, d) from ..)` - keep it packed and treat it as a single column IN.
    `(a, b, c) IN (select (d, e), f from ..)` or similar combinations - catch it in analysis as ambiguous
    `(a, b, c) IN (select (d, e), f, g from ..)` - but this is valid as long as `a` matches the type of `(d, e)`


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92085/testReport)** for PR 21403 at commit [`c9a36e0`](https://github.com/apache/spark/commit/c9a36e0fb6301b6807ea6ca9e3415e899f7a83ac).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate `


---



[GitHub] spark pull request #21403: [SPARK-24341][WIP][SQL] Support IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r190217120
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -45,6 +46,10 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       private def getValueExpression(e: Expression): Seq[Expression] = {
         e match {
           case cns : CreateNamedStruct => cns.valExprs
    +      case Literal(struct: InternalRow, dt: StructType) if dt.isInstanceOf[StructType] =>
    +        dt.zipWithIndex.map { case (field, idx) => Literal(struct.get(idx, field.dataType)) }
    +      case a @ AttributeReference(_, dt: StructType, _, _) =>
    --- End diff --
    
    I see. Then I think that the example reported in the JIRA should be considered an invalid query, since the number of elements in the outer value differs from the number of columns returned by the subquery. So we should throw an AnalysisException for that case. Do you agree with this approach?
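    A minimal sketch of the check being proposed here, assuming a simplified type model: `SqlType`, `TInt`, `TString`, `TStruct` and `InSubqueryArityCheck` are hypothetical names, not Spark's `DataType` classes or analyzer code. The left-hand values and the subquery output are compared position by position; a count mismatch is reported at the point where Spark would throw an `AnalysisException`, and a struct on one side is never implicitly unpacked to match several columns on the other.
    
    ```scala
    // Hypothetical, simplified type model; these are NOT Spark's DataType classes.
    sealed trait SqlType
    case object TInt extends SqlType
    case object TString extends SqlType
    final case class TStruct(fields: Seq[SqlType]) extends SqlType
    
    object InSubqueryArityCheck {
      // None if the IN subquery is well-formed; Some(message) if it should be rejected
      // (the point where Spark would throw an AnalysisException).
      def check(lhs: Seq[SqlType], subqueryOutput: Seq[SqlType]): Option[String] =
        if (lhs.length != subqueryOutput.length) {
          Some(s"#columns in left hand side: ${lhs.length}, " +
            s"#columns in right hand side: ${subqueryOutput.length}")
        } else {
          // Same count: each position must carry the same type; no implicit struct unpacking.
          lhs.zip(subqueryOutput).zipWithIndex.collectFirst {
            case ((l, r), i) if l != r => s"type mismatch at position $i: $l vs $r"
          }
        }
    }
    
    // (c1, c2) IN (SELECT c1, c2 ...): two values, two columns, accepted.
    println(InSubqueryArityCheck.check(Seq(TInt, TString), Seq(TInt, TString)))
    // (c1, c2) IN (SELECT (c1, c2) ...): two values, one struct column, rejected.
    println(InSubqueryArityCheck.check(Seq(TInt, TString), Seq(TStruct(Seq(TInt, TString)))))
    ```
    
    Keeping the check purely positional makes the error predictable: the user sees exactly how many values each side contributes, rather than a silent reinterpretation of a struct as a tuple.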


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r197743862
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2261,6 +2261,24 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec])
       }
     
    +  test("SPARK-24341: IN subqueries with struct fields") {
    --- End diff --
    
    I cannot really add them there since I need to intercept `AnalysisException` here, but if you have suggestions about better places for this, I am happy to move it.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198904048
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    Maybe we should also try Hive and Presto (they may not support `(...)`, but may support `struct(...)`).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93670/testReport)** for PR 21403 at commit [`571b273`](https://github.com/apache/spark/commit/571b2733a229d2271472cf60ede2f9072d437256).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94129/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24313][WIP][SQL] Support IN subqueries wit...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r190160677
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -45,6 +46,10 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       private def getValueExpression(e: Expression): Seq[Expression] = {
         e match {
           case cns : CreateNamedStruct => cns.valExprs
    +      case Literal(struct: InternalRow, dt: StructType) if dt.isInstanceOf[StructType] =>
    +        dt.zipWithIndex.map { case (field, idx) => Literal(struct.get(idx, field.dataType)) }
    +      case a @ AttributeReference(_, dt: StructType, _, _) =>
    --- End diff --
    
    I am not sure we should unpack the struct and do a field-by-field comparison. The reason is that a field-by-field comparison can yield a `null` value, while a struct-level comparison cannot. This matters a lot for null-aware anti joins.
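    To make the `null` point concrete, here is a minimal sketch of SQL three-valued logic, assuming `Option[Boolean]` stands in for a nullable boolean (`None` = NULL) and a struct is a sequence of nullable field values. `StructEquality` and its methods are hypothetical; the struct-level variant simply assumes that two NULL fields compare equal, so it never returns NULL for non-null structs.
    
    ```scala
    // Hypothetical model: Option[Boolean] is a nullable SQL boolean (None = NULL),
    // and a struct is a Seq of nullable field values (None = NULL field).
    object StructEquality {
      type SqlBool = Option[Boolean]
    
      // SQL AND under three-valued logic: FALSE wins, then NULL, else TRUE.
      def and3(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
        case (Some(false), _) | (_, Some(false)) => Some(false)
        case (Some(true), Some(true))            => Some(true)
        case _                                   => None
      }
    
      // SQL NOT: NOT NULL is still NULL.
      def not3(a: SqlBool): SqlBool = a.map(!_)
    
      // Field-by-field: l1 = r1 AND l2 = r2 AND ...; any NULL field can make the result NULL.
      def fieldByField(l: Seq[Option[Any]], r: Seq[Option[Any]]): SqlBool = {
        require(l.length == r.length, "structs must have the same number of fields")
        l.zip(r)
          .map {
            case (Some(x), Some(y)) => Option(x == y)
            case _                  => Option.empty[Boolean]
          }
          .foldLeft(Option(true))(and3)
      }
    
      // Struct-level (assumed semantics): compare whole values, treating two NULL
      // fields as equal, so the result is never NULL for non-null structs.
      def structLevel(l: Seq[Option[Any]], r: Seq[Option[Any]]): SqlBool =
        Some(l == r)
    }
    ```
    
    For the outer row `(1, NULL)` against the inner row `(1, 2)`, `fieldByField` yields NULL and `structLevel` yields false. Under `NOT IN`, negating NULL keeps it NULL, so the row is filtered out, whereas negating false yields true and keeps the row; that is why the choice affects null-aware anti joins.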


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r206304130
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1422,11 +1422,26 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved =>
    +        case In(values, Seq(l @ ListQuery(_, _, exprId, _)))
    +            if values.forall(_.resolved) && !l.resolved =>
               val expr = resolveSubQuery(l, plans)((plan, exprs) => {
                 ListQuery(plan, exprs, exprId, plan.output)
               })
    -          In(value, Seq(expr))
    +          val subqueryOutput = expr.plan.output
    +          val resolvedIn = In(values, Seq(expr))
    +          if (values.length != subqueryOutput.length) {
    +            throw new AnalysisException(
    --- End diff --
    
    @mgaido91 I quickly tried the error case to check out the message - 
    ```
    spark-sql> select * from ut1 where (c1, c2) in (select (c1, c2) from ut2);
    Error in query: Cannot analyze (named_struct('c1', ut1.`c1`, 'c2', ut1.`c2`) IN (listquery())).
    The number of columns in the left hand side of an IN subquery does not match the
    number of columns in the output of subquery.
    #columns in left hand side: 2.
    #columns in right hand side: 1.
    Left side columns:
    [ut1.`c1`, ut1.`c2`].
    Right side columns:
    [`named_struct(c1, c1, c2, c2)`].;
    ```
    The right-hand side columns look confusing. Should we display only the value exprs or only the name exprs, instead of both?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94156/testReport)** for PR 21403 at commit [`cb3467b`](https://github.com/apache/spark/commit/cb3467be92c1f7c8ed313ff1b37a00f82d59eda6).


---



[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    cc @cloud-fan @dilipbiswal @gatorsmile @juliuszsompolski: this is currently a WIP, since I think the UTs have to be formalized a bit better. But I wanted to share it with you to find out whether you agree with the strategy of this PR. I'd really appreciate any feedback. Thanks.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r207701674
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -505,6 +505,7 @@ object NullPropagation extends Rule[LogicalPlan] {
     
           // If the value expression is NULL then transform the In expression to null literal.
           case In(Literal(null, _), _) => Literal.create(null, BooleanType)
    +      case InSubquery(Seq(Literal(null, _)), _) => Literal.create(null, BooleanType)
    --- End diff --
    
    Add a test case in OptimizeInSuite


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198844447
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    It is not a subquery: this is the "left part" of IN, so I don't really agree with `InSubquery`, but if you have another suggestion I am happy to follow it. Thanks.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    The non-subquery example looks weird. If `where col1 in ((1, 'a'))` can't work, what is the right query if we do have a struct type column?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93496/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93599/testReport)** for PR 21403 at commit [`f5fa2c4`](https://github.com/apache/spark/commit/f5fa2c4b99a810c25a02e6d32550135d429c70c2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r199079546
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    Oh, sorry, you're right: I got confused with Presto's results.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @cloud-fan thanks for looking at this. I don't think that "hack" can be removed. Let me show an example where I think we cannot avoid that change.
    
    Imagine this query:
    ```
    select 1 from (select (1, 'a') as col1) tab1 where col1 in (select 1, 'a')
    ```
    Without changing the way `In` is built this is equivalent to:
    ```
    select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in (select 1, 'a')
    ```
    But the first query is invalid, as the outer value has one element and the subquery has 2 output fields, while the second query is valid. So the only way I found to avoid problems like this is changing `In` as done in this PR.
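    The ambiguity can be sketched with a toy expression tree (hypothetical `Col`, `CreateStruct` and `InArity`, not Catalyst classes). If `In` keeps a single `value` and struct values are unpacked before the arity check, a struct column and an explicit `(col1, col2)` tuple both count as 2 values; if `In` keeps `values: Seq[Expression]`, the struct column counts as 1 and the first query above can be rejected.
    
    ```scala
    // Hypothetical toy expression tree, not Catalyst's classes.
    sealed trait Expr
    // A column reference; numFields > 1 means the column is struct-typed.
    final case class Col(name: String, numFields: Int) extends Expr
    // The `(x, y, ...)` syntax on the left-hand side of IN.
    final case class CreateStruct(children: Seq[Expr]) extends Expr
    
    object InArity {
      // Single `value`, with struct values unpacked into their fields before counting.
      def unpackedArity(value: Expr): Int = value match {
        case CreateStruct(children) => children.length
        case Col(_, numFields)      => numFields
      }
    
      // `values: Seq[Expr]` keeps the user's list: each element counts as one value.
      def valuesArity(values: Seq[Expr]): Int = values.length
    
      // The IN subquery is well-formed only if the counts match.
      def accepted(lhsArity: Int, subqueryColumns: Int): Boolean = lhsArity == subqueryColumns
    }
    
    // col1 is a struct with 2 fields; the subquery `select 1, 'a'` has 2 columns.
    val structCol = Col("col1", 2)
    println(InArity.accepted(InArity.unpackedArity(structCol), 2))    // true: wrongly accepted
    println(InArity.accepted(InArity.valuesArity(Seq(structCol)), 2)) // false: rejected
    ```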


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I agree with the proposed behavior, but I'm a little worried about hacking the existing `In` expression to implement it.
    
    I took a look at Postgres: `a = b` is equivalent to `a.i = b.i and a.j = b.j` if `a` and `b` are both of struct type with the same number of fields and the same field types. If we make Spark's equality operator follow these semantics, it can make this PR simpler.
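    A minimal sketch of the Postgres-style rewrite described above, using a hypothetical toy AST (`Ref`, `Field`, `Eq`, `And`, `ExpandStructEq`); Catalyst's real nodes and any actual rule implementing this are not shown. It expands `a = b` over the named struct fields into a left-nested conjunction of field equalities, so NULL fields propagate through `AND` exactly as they would in any other boolean expression.
    
    ```scala
    // Hypothetical toy AST; Catalyst has its own expression nodes.
    sealed trait Expr
    final case class Ref(name: String) extends Expr
    final case class Field(struct: Expr, name: String) extends Expr
    final case class Eq(left: Expr, right: Expr) extends Expr
    final case class And(left: Expr, right: Expr) extends Expr
    
    object ExpandStructEq {
      // `a = b` on structs with fields f1..fn becomes a.f1 = b.f1 AND ... AND a.fn = b.fn.
      def expand(a: Expr, b: Expr, fieldNames: Seq[String]): Expr = {
        require(fieldNames.nonEmpty, "a struct needs at least one field")
        fieldNames
          .map(f => Eq(Field(a, f), Field(b, f)): Expr)
          .reduceLeft[Expr]((l, r) => And(l, r))
      }
    
      // Render the expression as SQL-like text for inspection.
      def show(e: Expr): String = e match {
        case Ref(n)      => n
        case Field(s, n) => s"${show(s)}.$n"
        case Eq(l, r)    => s"${show(l)} = ${show(r)}"
        case And(l, r)   => s"${show(l)} AND ${show(r)}"
      }
    }
    
    println(ExpandStructEq.show(ExpandStructEq.expand(Ref("a"), Ref("b"), Seq("i", "j"))))
    ```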


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    OK I see the point now.
    
    `a IN (select (c, d) from ...)` is valid but `(i, j) IN (select (c, d) from ...)` is not. This is a little weird because `a` is semantically the same as `(i, j)` if `a` is a struct type with 2 fields `i` and `j`. It sounds like we want to treat `(...)` specially in the context of the IN expression.
    
    Can you give some similar examples in other databases?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93999/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94129/testReport)** for PR 21403 at commit [`45a91fc`](https://github.com/apache/spark/commit/45a91fc4b252967cf99c88331f51a702edadbaa2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94269/testReport)** for PR 21403 at commit [`eb1dfb7`](https://github.com/apache/spark/commit/eb1dfb7e0873b8479ea54d223b7fde3dcefa4834).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93599/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    No, sorry, you're right. When it is not a subquery, it is safe to treat them as the same. I got confused checking the results of other DBs, as they all behave differently in that scenario. I will try to follow your suggestion of creating a new `InSubquery` expression, then, to handle the case where all DBs behave consistently and it is clearer what the behavior should be. Sorry for the mistake. Thanks.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93508/testReport)** for PR 21403 at commit [`22f77ae`](https://github.com/apache/spark/commit/22f77ae5fff52c4a9c0900c0246b34782cb76652).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93999/testReport)** for PR 21403 at commit [`0f00a06`](https://github.com/apache/spark/commit/0f00a06a1853cb13d1d156bafcb85973c92e2b8e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1657/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4210/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @cloud-fan, no, it introduces a behavior change when structs are involved. The two queries [here](https://github.com/apache/spark/pull/21403/files/eb1dfb7e0873b8479ea54d223b7fde3dcefa4834#diff-b324aa60ed6de5866aebafc3c9b80391R10) would fail before this PR, while the version written like this would work before it (and no longer works after the PR):
    ```
    select count(*) from struct_tab where record in (select a2, b2 from tab_b);
    select count(*) from struct_tab where record not in (select a2, b2 from tab_b);
    ```
    This is because, before the PR, a struct in front of the IN operator behaves as if it were written `(f1, f2, ...)`, while after the PR a struct there is treated as a single field.
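    
    A toy sketch of the behavior change described above. It assumes only the count of left-hand values matters, and `InLhsBeforeAfter` and its case classes are hypothetical. For the `struct_tab.record` column with fields `a` and `b`, the old sizing matches the 2 columns of `select a2, b2`, while the new sizing does not.
    
    ```scala
    // Hypothetical sketch of how the left-hand side of IN is sized; not Spark code.
    object InLhsBeforeAfter {
      sealed trait Lhs
      final case class ScalarColumn(name: String) extends Lhs
      final case class StructColumn(name: String, fieldNames: Seq[String]) extends Lhs
      // An explicit parenthesized list such as `(a1, b1)`.
      final case class ParenList(items: Seq[Lhs]) extends Lhs
    
      // Before the PR: a struct in front of IN behaved like `(f1, f2, ...)`.
      def arityBefore(lhs: Lhs): Int = lhs match {
        case StructColumn(_, fields) => fields.length
        case ParenList(items)        => items.length
        case ScalarColumn(_)         => 1
      }
    
      // After the PR: only an explicit `(...)` list is unpacked; a struct is one value.
      def arityAfter(lhs: Lhs): Int = lhs match {
        case ParenList(items) => items.length
        case _                => 1
      }
    }
    ```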


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @maryannxue that is feasible too, and indeed it was my original implementation. I switched to this approach according to [this discussion](https://github.com/apache/spark/pull/21403#discussion_r198826199): the goal was to avoid changing the `In` signature and the parsing logic in many places.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r205214601
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out ---
    @@ -113,15 +105,7 @@ WHERE
     struct<>
     -- !query 8 output
     org.apache.spark.sql.AnalysisException
    -cannot resolve '(named_struct('t1a', t1.`t1a`, 't1b', t1.`t1b`) IN (listquery(t1.`t1a`)))' due to data type mismatch: 
    -The number of columns in the left hand side of an IN subquery does not match the
    -number of columns in the output of subquery.
    -#columns in left hand side: 2.
    -#columns in right hand side: 1.
    -Left side columns:
    -[t1.`t1a`, t1.`t1b`].
    -Right side columns:
    -[t2.`t2a`].;
    --- End diff --
    
    Also output the message from lines 117 to 124.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93670/testReport)** for PR 21403 at commit [`571b273`](https://github.com/apache/spark/commit/571b2733a229d2271472cf60ede2f9072d437256).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate `


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1282/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I updated the PR according to the previous discussion.
    
    @hvanhovell @juliuszsompolski could you please take a look at it now? Thanks.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92084/testReport)** for PR 21403 at commit [`df7d3ee`](https://github.com/apache/spark/commit/df7d3ee600a65a4b5e6c49f13a3f39b8196616c8).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93783/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93508/testReport)** for PR 21403 at commit [`22f77ae`](https://github.com/apache/spark/commit/22f77ae5fff52c4a9c0900c0246b34782cb76652).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    cc @maryannxue Review this?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I think the behavior definition is good and clear. But just a question on the implementation: is it necessary to introduce a new class `InValues`? Or could we simply change `In` so that its first child, "value", is of type `Seq[Expression]`?
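    
    A simplified, hypothetical expression tree illustrating the two shapes under discussion. The literal-list `In` keeps a single `value`, while a separate subquery predicate carries one value per subquery output column. The names echo the `InSubquery(Seq(...), _)` pattern in a later diff of this thread, but these are toy classes, not Spark's.
    
    ```scala
    // Hypothetical toy expressions; not Catalyst classes.
    object InDesign {
      sealed trait Expr
      final case class Col(name: String) extends Expr
      final case class Lit(value: Any) extends Expr
      // Stand-in for a subquery, described only by its output column names.
      final case class ListQuery(outputColumns: Seq[String]) extends Expr
    
      // `x IN (1, 2, 3)`: a single value compared against each list element.
      final case class In(value: Expr, list: Seq[Expr]) extends Expr
    
      // `(a, b) IN (SELECT c, d ...)`: one value per subquery output column.
      final case class InSubquery(values: Seq[Expr], query: ListQuery) extends Expr {
        def arityOk: Boolean = values.length == query.outputColumns.length
      }
    }
    ```
    
    Splitting the subquery case out leaves the literal-list `In` and the rules that only handle literal lists untouched.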


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Any more comments, @cloud-fan @hvanhovell @juliuszsompolski?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r206436489
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1422,11 +1422,26 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved =>
    +        case In(values, Seq(l @ ListQuery(_, _, exprId, _)))
    +            if values.forall(_.resolved) && !l.resolved =>
               val expr = resolveSubQuery(l, plans)((plan, exprs) => {
                 ListQuery(plan, exprs, exprId, plan.output)
               })
    -          In(value, Seq(expr))
    +          val subqueryOutput = expr.plan.output
    +          val resolvedIn = In(values, Seq(expr))
    +          if (values.length != subqueryOutput.length) {
    +            throw new AnalysisException(
    --- End diff --
    
    Thanks for your review, @dilipbiswal.
    
    > The right hand side columns looks confusing. Should we only display the value exprs or the name exprs instead of both ?
    
    Honestly, I don't think so. This is just the `sql` value of a named_struct. Having a custom representation of it only for this use case doesn't seem like a good idea to me.
    
    > perhaps we can take out the dot at the end of #columns .. and the following lines ?
    
    Sure, I'll do it in my next commit, thanks.
    
    > We have this check in checkInputDataTypes and here ? Is there a way we can have the number of input check in one place ?
    
    I added the check here to avoid wasting time continuing the analysis when we already know it is going to fail. I haven't removed the check from `checkInputDataTypes`, as I prefer to stay on the safe side with an additional check, but that condition can no longer be hit, since we check it here.
    
    > Just a question, should we have been able to do a type promotion here ?
    
    I have not changed type promotion in this PR: the behavior is the same before and after it. I think this can be proposed as a follow-up/new JIRA.
    
    Thanks.
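    
    A rough sketch of the fail-fast arity check discussed above, written as a standalone function rather than Spark's Analyzer code (`InSubqueryArityCheck` and `AnalysisError` are hypothetical). The message layout follows the one shown in the `subq-input-typecheck.sql.out` diff, with the trailing dots dropped as suggested in the review.
    
    ```scala
    // Hypothetical standalone check; not the Analyzer implementation.
    object InSubqueryArityCheck {
      final class AnalysisError(msg: String) extends Exception(msg)
    
      // Fails as soon as the left-hand values and the subquery output disagree in count.
      def check(leftColumns: Seq[String], rightColumns: Seq[String]): Unit =
        if (leftColumns.length != rightColumns.length) {
          throw new AnalysisError(
            s"""The number of columns in the left hand side of an IN subquery does not match the
               |number of columns in the output of subquery.
               |#columns in left hand side: ${leftColumns.length}
               |#columns in right hand side: ${rightColumns.length}
               |Left side columns:
               |[${leftColumns.mkString(", ")}]
               |Right side columns:
               |[${rightColumns.mkString(", ")}]""".stripMargin)
        }
    }
    ```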


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Yes @cloud-fan, you're 100% right: we want to treat `(...)` differently when it is in front of IN.
    
    Here is the previous example in Postgres:
    ```
    mgaido=# select 1 from (select (1, 'a') as col1) tab1 where col1 in (select 1, 'a');
    ERROR:  subquery has too many columns
    LINE 1: ... 1 from (select (1, 'a') as col1) tab1 where col1 in (select...
    
    mgaido=# select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in (select 1, 'a');
     ?column? 
    ----------
            1
    (1 row)
    ```
    
    In Oracle/MySQL you cannot create structs using `(...)`; you have to define a custom data type for structs, so this situation cannot arise.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/316/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r199087446
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    For this specific case, instead, if you agree, I'll update this PR by creating a new ad-hoc expression for the values in front of IN, as we have to deal with more than just the subquery case. What do you think?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93539/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1310/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94129/testReport)** for PR 21403 at commit [`45a91fc`](https://github.com/apache/spark/commit/45a91fc4b252967cf99c88331f51a702edadbaa2).


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r207805905
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -505,6 +505,7 @@ object NullPropagation extends Rule[LogicalPlan] {
     
           // If the value expression is NULL then transform the In expression to null literal.
           case In(Literal(null, _), _) => Literal.create(null, BooleanType)
    +      case InSubquery(Seq(Literal(null, _)), _) => Literal.create(null, BooleanType)
    --- End diff --
    
    Thanks for your comment. I checked it again and I am pretty sure no regression is introduced. We don't have many optimizer rules using In, and all the others were, and still are, applied only to In with a list of literals. I am adding this and the other tests. Thanks.
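    
    A toy model of the two `NullPropagation` cases in the diff above. The classes are hypothetical, not Catalyst's `Literal`/`In`/`InSubquery`. A NULL literal in front of IN is folded into a NULL boolean literal. As in the diff, the subquery case fires only when the left side is exactly one NULL literal.
    
    ```scala
    // Hypothetical toy rewrite mirroring the diff; not Spark's optimizer.
    object ToyNullPropagation {
      sealed trait Expr
      // value is None for SQL NULL; dataType is a plain string for simplicity.
      final case class Lit(value: Option[Any], dataType: String) extends Expr
      final case class Col(name: String) extends Expr
      final case class In(value: Expr, list: Seq[Expr]) extends Expr
      // The subquery is represented only by its SQL text in this toy.
      final case class InSubquery(values: Seq[Expr], query: String) extends Expr
    
      // Stand-in for Literal.create(null, BooleanType).
      val nullBoolean: Expr = Lit(None, "boolean")
    
      def rewrite(e: Expr): Expr = e match {
        case In(Lit(None, _), _)              => nullBoolean
        case InSubquery(Seq(Lit(None, _)), _) => nullBoolean
        case other                            => other
      }
    }
    ```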


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1405/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r218398155
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out ---
    @@ -0,0 +1,70 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 7
    +
    +
    +-- !query 0
    +create temporary view tab_a as select * from values (1, 1) as tab_a(a1, b1)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +create temporary view tab_b as select * from values (1, 1) as tab_b(a2, b2)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +create temporary view struct_tab as select struct(col1 as a, col2 as b) as record from
    + values (1, 1), (1, 2), (2, 1), (2, 2)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +select 1 from tab_a where (a1, b1) not in (select a2, b2 from tab_b)
    +-- !query 3 schema
    +struct<1:int>
    +-- !query 3 output
    +
    +
    +
    +-- !query 4
    +select 1 from tab_a where (a1, b1) not in (select (a2, b2) from tab_b)
    --- End diff --
    
    This fails with a compile exception like the one reported in the JIRA.
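    
    For reference, a sketch of the multi-column NOT IN semantics that query 3 exercises. It assumes SQL-standard three-valued logic. `MultiColumnNotIn` is hypothetical, and its NULL handling has not been checked against Spark's implementation. With the single `(1, 1)` row in both `tab_a` and `tab_b`, the filter keeps nothing, which is consistent with the empty output of query 3.
    
    ```scala
    // Hypothetical model of `(a1, b1) NOT IN (SELECT a2, b2 ...)`; None models NULL.
    object MultiColumnNotIn {
      type Row = Seq[Option[Int]]
    
      // Row-wise equality: FALSE if any field pair differs, NULL if any field is NULL.
      def rowEq(a: Row, b: Row): Option[Boolean] = {
        val cmps = a.zip(b).map {
          case (Some(x), Some(y)) => Some(x == y)
          case _                  => None
        }
        if (cmps.contains(Some(false))) Some(false)
        else if (cmps.contains(None)) None
        else Some(true)
      }
    
      // v IN rows: TRUE on any match, NULL if no match but some comparison is NULL.
      def in(v: Row, rows: Seq[Row]): Option[Boolean] = {
        val cmps = rows.map(rowEq(v, _))
        if (cmps.contains(Some(true))) Some(true)
        else if (cmps.contains(None)) None
        else Some(false)
      }
    
      def notIn(v: Row, rows: Seq[Row]): Option[Boolean] = in(v, rows).map(!_)
    
      // WHERE keeps a row only when the predicate is TRUE.
      def filter(left: Seq[Row], right: Seq[Row]): Seq[Row] =
        left.filter(r => notIn(r, right).contains(true))
    }
    ```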


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94202/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93539/testReport)** for PR 21403 at commit [`0412829`](https://github.com/apache/spark/commit/04128292e6d145ec608166b532c960cac72a500c).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93106/testReport)** for PR 21403 at commit [`a5771b8`](https://github.com/apache/spark/commit/a5771b8a0a4f00d95bb6f882f40ccccaa6dd17d0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90998/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93498/testReport)** for PR 21403 at commit [`60b57d2`](https://github.com/apache/spark/commit/60b57d2cde0ec1984a99c3db338df74b730623d0).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94178/testReport)** for PR 21403 at commit [`a6114a6`](https://github.com/apache/spark/commit/a6114a655305f318230bf1bbd25394e952793a94).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94001/testReport)** for PR 21403 at commit [`53e3d96`](https://github.com/apache/spark/commit/53e3d961a0cde6d6ab6b4c8b86b9134b9532f776).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92084/testReport)** for PR 21403 at commit [`df7d3ee`](https://github.com/apache/spark/commit/df7d3ee600a65a4b5e6c49f13a3f39b8196616c8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @mgaido91 can you link the correct JIRA? This one does not seem to be the correct one.



---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93508/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @maryannxue as I said, my initial proposal was like that. I think this approach has two advantages: it avoids some code duplication, since otherwise the logic added in ResolveInValues would have to be spread over all the places where an In is built; and it avoids changing the In signature, so we don't break users who use In directly in their code. On the other hand, I agree with you that the approach with a `Seq[Expression]` is cleaner (that's why it was my original proposal). @cloud-fan @gatorsmile what do you think about this?
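The two designs being compared can be illustrated with a toy model. This is a minimal sketch, not Catalyst code: `Expr`, `Col`, `CreateStruct`, `InSeq`, `Designs`, `resolveInValues` and `lhsArity` are hypothetical names chosen for illustration, and only `In`, `InValues` and the idea of a `ResolveInValues` rule come from the discussion.

```scala
// Toy expression tree: hypothetical stand-ins, not Catalyst's real classes.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class CreateStruct(children: Seq[Expr]) extends Expr
final case class InValues(children: Seq[Expr]) extends Expr

// Design 1: keep In's single-value signature; a rule marks tuple LHSs.
final case class In(value: Expr, list: Seq[Expr]) extends Expr
// Design 2: change the signature so the LHS is always a sequence.
final case class InSeq(values: Seq[Expr], list: Seq[Expr]) extends Expr

object Designs {
  // Rule in the spirit of ResolveInValues: a struct built in place on the
  // left-hand side is rewritten to InValues so its arity becomes explicit.
  def resolveInValues(e: Expr): Expr = e match {
    case In(CreateStruct(cs), list) => In(InValues(cs), list)
    case other                      => other
  }

  // Number of left-hand columns each representation exposes.
  def lhsArity(e: Expr): Int = e match {
    case In(InValues(cs), _) => cs.length
    case In(_, _)            => 1 // any other value, struct or not, counts as one
    case InSeq(vs, _)        => vs.length
    case other               => throw new IllegalArgumentException(s"not an IN: $other")
  }
}
```

In design 1 the tuple-splitting logic lives in a single rule and the `In` constructor is unchanged; in design 2 every site that builds the node must produce the `Seq` itself, which is the duplication mentioned above, but the arity is carried by the type.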


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1485/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r205244503
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out ---
    @@ -113,15 +105,7 @@ WHERE
     struct<>
     -- !query 8 output
     org.apache.spark.sql.AnalysisException
    -cannot resolve '(named_struct('t1a', t1.`t1a`, 't1b', t1.`t1b`) IN (listquery(t1.`t1a`)))' due to data type mismatch: 
    -The number of columns in the left hand side of an IN subquery does not match the
    -number of columns in the output of subquery.
    -#columns in left hand side: 2.
    -#columns in right hand side: 1.
    -Left side columns:
    -[t1.`t1a`, t1.`t1b`].
    -Right side columns:
    -[t2.`t2a`].;
    --- End diff --
    
    The point here is that I added a new check to fail analysis earlier (in order to avoid wasting time on useless analysis). With this comment, are you suggesting removing the check, or replicating the error message? Thanks.
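If the goal is to keep the early check while replicating the old error message, one option is a check that runs as soon as the subquery output is known and builds the same text as the removed expected output. A minimal sketch, assuming a hypothetical `EarlyCheck.checkArity` helper and an `ArityMismatch` exception standing in for `AnalysisException`; the message layout is copied from the diff above.

```scala
// Hypothetical stand-in for AnalysisException in this sketch.
final case class ArityMismatch(msg: String) extends Exception(msg)

object EarlyCheck {
  // Compare the LHS column count with the subquery output count before any
  // further analysis, and reproduce the message format shown in the diff.
  def checkArity(lhs: Seq[String], subqueryOutput: Seq[String]): Unit =
    if (lhs.length != subqueryOutput.length) {
      throw ArityMismatch(
        s"""The number of columns in the left hand side of an IN subquery does not match the
           |number of columns in the output of subquery.
           |#columns in left hand side: ${lhs.length}.
           |#columns in right hand side: ${subqueryOutput.length}.
           |Left side columns:
           |[${lhs.mkString(", ")}].
           |Right side columns:
           |[${subqueryOutput.mkString(", ")}].""".stripMargin)
    }
}
```

Failing early and keeping the detailed message are not mutually exclusive: the check only needs the LHS expressions and the resolved subquery output.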


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r205213736
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -247,6 +249,20 @@ class Analyzer(
         }
       }
     
    +  /**
    +   * Substitutes In values with an instance of [[InValues]].
    +   */
    +  object ResolveInValues extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    --- End diff --
    
    -> `plan.resolveOperators`
    
    Let us wait for https://github.com/apache/spark/pull/21822.
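As far as we understand it, the `resolveOperators` family differs from a plain `transformUp` mainly in that it skips sub-plans already marked as analyzed. The toy model below sketches only that difference; `Node`, its `analyzed` flag, `Traversals` and both methods are hypothetical and do not reproduce Spark's `LogicalPlan` API.

```scala
// Hypothetical plan node: a name, children, and an "already analyzed" flag.
final case class Node(name: String, children: Seq[Node], analyzed: Boolean = false)

object Traversals {
  // Bottom-up rewrite that visits every node, like a plain transformUp.
  def transformUp(n: Node)(rule: PartialFunction[Node, Node]): Node = {
    val withNewChildren = n.copy(children = n.children.map(c => transformUp(c)(rule)))
    rule.applyOrElse(withNewChildren, identity[Node])
  }

  // Same bottom-up traversal, but sub-trees flagged as analyzed are returned untouched.
  def resolveOperatorsUp(n: Node)(rule: PartialFunction[Node, Node]): Node =
    if (n.analyzed) n
    else {
      val withNewChildren = n.copy(children = n.children.map(c => resolveOperatorsUp(c)(rule)))
      rule.applyOrElse(withNewChildren, identity[Node])
    }
}
```

With a rule that rewrites `In` nodes, `transformUp` would also rewrite an `In` inside an already-analyzed sub-plan, while the skipping variant leaves it alone, which is why an analyzer rule would normally use the latter.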


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #90998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90998/testReport)** for PR 21403 at commit [`5b6226f`](https://github.com/apache/spark/commit/5b6226fa48d82b461c6d5a8d1a9a625d2617af76).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94202/testReport)** for PR 21403 at commit [`a6114a6`](https://github.com/apache/spark/commit/a6114a655305f318230bf1bbd25394e952793a94).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InSubquery(values: Seq[Expression], query: ListQuery)`
      * `case class ListQuery(`
      * `case class Exists(`


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1655/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92581/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93543/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94001/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93543/testReport)** for PR 21403 at commit [`bd008fe`](https://github.com/apache/spark/commit/bd008fe51f70f9925e9513680636f4dd9aadcd7c).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92524/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94269/testReport)** for PR 21403 at commit [`eb1dfb7`](https://github.com/apache/spark/commit/eb1dfb7e0873b8479ea54d223b7fde3dcefa4834).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1777/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @juliuszsompolski yes, you're right, sorry: SPARK-24395 uses literals, not subqueries.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198842091
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    +1 on `InValues`. Maybe call it `InSubquery`


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    any more comments on this @cloud-fan @gatorsmile @maryannxue ?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1306/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94281/testReport)** for PR 21403 at commit [`eb1dfb7`](https://github.com/apache/spark/commit/eb1dfb7e0873b8479ea54d223b7fde3dcefa4834).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r207701622
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala ---
    @@ -154,7 +154,7 @@ class ExpressionParserSuite extends PlanTest {
       test("in sub-query") {
         assertEqual(
           "a in (select b from c)",
    -      In('a, Seq(ListQuery(table("c").select('b)))))
    +      InSubquery(Seq('a), ListQuery(table("c").select('b))))
    --- End diff --
    
    Could you add more cases to this test? For example, when the input is a CreateNamedStruct.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @juliuszsompolski I see your point, and I agree it is an acceptable solution, though I think it has some problems. If we follow this path, we are saying that `(a, b) IN (select c, d from ...)` has a different result from `(a, b) IN (select (c, d) from ..)` and `(a, b) IN ((1, 2))`. We can probably argue that they are different things, so they can lead to different results, but this is not very intuitive for a user.
    
    I'd prefer, in this case, to have a rule about how we behave, follow it, and throw an AnalysisException otherwise. This is also the behavior of other RDBMSs (I checked Oracle and Postgres):
    
     - `(a, b) IN (select c, d from ...)` unpacks them;
     - `(a, b) IN (select (c, d) from ..)` throws an `AnalysisException`
    
    So I would suggest going on with this approach, which could also solve other issues like SPARK-24395, since those would be considered invalid.
    
    cc @hvanhovell what do you think?
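The proposed rule (unpack a multi-column subquery, reject a single struct column against a tuple) can be expressed as a small validation over column types. A minimal sketch, assuming a hypothetical `DType` hierarchy and an `InRule.validate` helper that returns an error string where Spark would throw an `AnalysisException`.

```scala
// Hypothetical column types for the sketch.
sealed trait DType
case object IntT extends DType
case object StrT extends DType
final case class StructT(fields: Seq[DType]) extends DType

object InRule {
  // None means the IN is valid; Some(error) stands in for an AnalysisException.
  // Columns are compared pairwise, so a struct column never matches a tuple.
  def validate(lhs: Seq[DType], subqueryOutput: Seq[DType]): Option[String] =
    if (lhs.length != subqueryOutput.length)
      Some(s"column count mismatch: ${lhs.length} vs ${subqueryOutput.length}")
    else lhs.zip(subqueryOutput).collectFirst {
      case (l, r) if l != r => s"type mismatch: $l vs $r"
    }
}
```

Here `(a, b) IN (select c, d ...)` maps to two LHS columns against two output columns and passes, while `(a, b) IN (select (c, d) ...)` compares two columns against one struct column and is rejected.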



---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93498/testReport)** for PR 21403 at commit [`60b57d2`](https://github.com/apache/spark/commit/60b57d2cde0ec1984a99c3db338df74b730623d0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Looks good to me, though I'm not very familiar with analyzer.
    @cloud-fan, @hvanhovell ?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1274/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r206254993
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1422,11 +1422,26 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved =>
    +        case In(values, Seq(l @ ListQuery(_, _, exprId, _)))
    +            if values.forall(_.resolved) && !l.resolved =>
               val expr = resolveSubQuery(l, plans)((plan, exprs) => {
                 ListQuery(plan, exprs, exprId, plan.output)
               })
    -          In(value, Seq(expr))
    +          val subqueryOutput = expr.plan.output
    +          val resolvedIn = In(values, Seq(expr))
    +          if (values.length != subqueryOutput.length) {
    +            throw new AnalysisException(
    --- End diff --
    
    @mgaido91 We have this check both in checkInputDataTypes and here? Is there a way we can keep the check on the number of inputs in one place?
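One way to keep the column-count check in a single place is to define it once on the expression and have both the resolution step and the type check call it. A minimal sketch under that assumption; `InSubq`, `arityError`, `resolveStep` and `typeCheck` are hypothetical names, not the PR's actual code.

```scala
// Hypothetical IN-subquery node: LHS column names and subquery output names.
final case class InSubq(values: Seq[String], subqueryOutput: Seq[String]) {
  // Single source of truth for the column-count check.
  def arityError: Option[String] =
    if (values.length == subqueryOutput.length) None
    else Some(s"#columns in left hand side: ${values.length}, " +
      s"#columns in right hand side: ${subqueryOutput.length}")
}

object Checks {
  // Analyzer-time use: fail fast right after the subquery is resolved.
  def resolveStep(in: InSubq): InSubq = in.arityError match {
    case Some(msg) => throw new IllegalStateException(msg)
    case None      => in
  }

  // Type-check-time use: report the same message through a result value.
  def typeCheck(in: InSubq): Either[String, Unit] = in.arityError.toLeft(())
}
```

Both call sites then report identical messages, and changing the rule means editing only `arityError`.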


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93783/testReport)** for PR 21403 at commit [`423e93e`](https://github.com/apache/spark/commit/423e93efffa523cb44246218773c864cbb946059).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93599/testReport)** for PR 21403 at commit [`f5fa2c4`](https://github.com/apache/spark/commit/f5fa2c4b99a810c25a02e6d32550135d429c70c2).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Had a related discussion with @marmbrus a few months ago. He also does not like reusing the `IN` expression for subquery processing. I think it makes sense to introduce an `InSubquery` expression for subqueries.
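Separating the two cases could look like the toy ADT below, where `In` over a literal list keeps its single-value shape and subqueries get their own node. The shape of `InSubquery(values, query)` follows the class listed in the Test build #94202 report (`InSubquery(values: Seq[Expression], query: ListQuery)`); everything else (`Expr`, `Attr`, `Lit`, this simplified `ListQuery`, `Describe`) is a hypothetical stand-in.

```scala
// Toy expression tree, not Catalyst's real classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(v: Any) extends Expr
final case class ListQuery(output: Seq[String]) extends Expr
// IN over a literal list keeps its original single-value shape.
final case class In(value: Expr, list: Seq[Expr]) extends Expr
// IN over a subquery gets its own node, with the LHS as a sequence of values.
final case class InSubquery(values: Seq[Expr], query: ListQuery) extends Expr

object Describe {
  def describe(e: Expr): String = e match {
    case In(_, list)       => s"value list of ${list.length} item(s)"
    case InSubquery(vs, q) => s"${vs.length} value(s) vs ${q.output.length} subquery column(s)"
    case other             => other.toString
  }
}
```

With a dedicated node, code that matches on `In` no longer has to special-case a single `ListQuery` element in the list.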


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @cloud-fan sure, I'll create a followup PR, thanks.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    @cloud-fan the problem is that the change is not only for the case when IN is followed by a ListQuery; it is needed in the other case too. The reason this change is needed is to detect the difference between these 2 queries:
     1. `select 1 from (select (1, 'a') as col1) tab1 where col1 in (select 1, 'a')` or equivalently `select 1 from (select (1, 'a') as col1) tab1 where col1 in ((1, 'a'))`
     2. `select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in (select 1, 'a')` or equivalently `select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in ((1, 'a'))`
    
    In particular, the queries in 1 are invalid, as they compare a single (struct) column with 2 columns in the inner query/list of constants, while the queries in 2 are valid, as they compare 2 columns on both sides. I hope this clarifies why introducing a specific `InListQuery` couldn't solve the problem.
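    
    A minimal sketch of the arity rule behind the two query shapes above, assuming a toy type model: `InArityCheck`, `DType`, `StructT` and `isValid` are hypothetical names, not Spark classes, and type coercion is ignored. A struct-typed column on the left counts as one value, while `(col1, col2)` counts as two.
    
    ```scala
    // Hypothetical toy model (not Spark's API) of the arity rule for
    // `<lhs> IN (SELECT ...)`.
    object InArityCheck {
      sealed trait DType
      case object IntT extends DType
      case object StringT extends DType
      final case class StructT(fields: List[DType]) extends DType
    
      /** `lhs`: types of the values before IN, e.g.
        *   `col1` where col1 is a struct -> List(StructT(List(IntT, StringT)))
        *   `(col1, col2)`                -> List(IntT, StringT)
        * `subqueryOut`: types of the subquery's output columns. */
      def isValid(lhs: List[DType], subqueryOut: List[DType]): Boolean =
        lhs.length == subqueryOut.length &&
          lhs.zip(subqueryOut).forall { case (l, r) => l == r }
    
      def main(args: Array[String]): Unit = {
        val pair = StructT(List(IntT, StringT))
        // Query 1: one struct column vs two output columns -> invalid.
        println(isValid(List(pair), List(IntT, StringT)))
        // Query 2: two columns vs two output columns -> valid.
        println(isValid(List(IntT, StringT), List(IntT, StringT)))
      }
    }
    ```
    
    Running `main` prints `false` and then `true`; comparing the struct column with `(select (1, 'a'))`, i.e. a single struct output column, would also pass this check.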
    
    > It's not public so we can change it, but I believe some advanced users use these internal classes and we should keep these classes unchanged as possible as we can.
    
    I agree with you on this point; that is why I initially changed my proposal from `Seq[Expression]` to introducing the new `InValues` expression. However, this might also break existing user code, as there is an extra expression they wouldn't expect. So I think both solutions are equivalent. The only thing we can do here is to wait for 3.0 to include this, if we consider it a breaking change.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94292/testReport)** for PR 21403 at commit [`eb1dfb7`](https://github.com/apache/spark/commit/eb1dfb7e0873b8479ea54d223b7fde3dcefa4834).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1272/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    sure, feel free to open a PR first.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94269/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    ah i see. Can you add it to the migration guide? We need to tell users what will break after upgrading to 2.4 and why.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    I'd like to avoid changing the signature of an existing expression if possible. It's not public, so we can change it, but I believe some advanced users rely on these internal classes, and we should keep them as unchanged as possible.
    
    For this particular case, what we want is to specially handle the case `(a, b) in (SELECT c,d FROM ...)`. Actually, we have a dedicated parser rule for it:
    ```
    | NOT? kind=IN '(' expression (',' expression)* ')' // normal IN expression
    | NOT? kind=IN '(' query ')'  // IN list query
    ```
    
    I think it makes more sense to create a dedicated expression for the IN list query, but we should not make it temporary. We can create a common base class for the normal IN expression and the IN list query, but they should be 2 different expressions. Then we don't need the `ResolveInValues` rule, which is hacky.
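    
    A rough sketch of that proposal, under stated assumptions: the `In(value, list)` and `InSubquery(values, query)` shapes mirror signatures quoted elsewhere in this thread, but `Expr`, `Col`, `Lit`, `Struct`, `InBase`, `InRhs`, `ExprList`, `Query` and `build` are hypothetical toy names, not Spark's API. The parser alternative that matched decides which node is built, so no temporary rewrite rule is needed.
    
    ```scala
    // Hypothetical toy expression tree (not Spark's classes): two distinct
    // IN expressions sharing a common base trait.
    object InExpressions {
      sealed trait Expr
      final case class Col(name: String) extends Expr
      final case class Lit(value: Any) extends Expr
      // Stand-in for the struct built from `(a, b, ...)` on the left-hand side.
      final case class Struct(children: Seq[Expr]) extends Expr
      // Stand-in for a subquery; only its number of output columns matters here.
      final case class ListQuery(numOutputColumns: Int) extends Expr
    
      sealed trait InBase extends Expr
      // `value IN (e1, e2, ...)`: one left-hand value compared with each element.
      final case class In(value: Expr, list: Seq[Expr]) extends InBase
      // `(v1, ..., vn) IN (SELECT c1, ..., cn ...)`: n values vs n output columns.
      final case class InSubquery(values: Seq[Expr], query: ListQuery) extends InBase
    
      // The two parser alternatives of the grammar rule above.
      sealed trait InRhs
      final case class ExprList(exprs: Seq[Expr]) extends InRhs
      final case class Query(query: ListQuery) extends InRhs
    
      def build(lhs: Seq[Expr], rhs: InRhs): InBase = rhs match {
        // IN list query: keep the left-hand values unpacked.
        case Query(q) => InSubquery(lhs, q)
        // Normal IN: a multi-value left-hand side becomes a single struct value.
        case ExprList(es) =>
          val value = if (lhs.size == 1) lhs.head else Struct(lhs)
          In(value, es)
      }
    }
    ```
    
    For example, `build(Seq(Col("a"), Col("b")), Query(ListQuery(2)))` yields an `InSubquery` with both columns kept separate, whereas the same left-hand side before a plain value list is wrapped into one `Struct`.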


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r205214861
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2320,6 +2320,27 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec])
       }
     
    +  test("SPARK-24341: IN subqueries with struct fields") {
    --- End diff --
    
    Yes. Please move it there, if they are not duplicates. 


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198900926
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    ah, it's getting interesting. I tried `=` in Postgres:
    ```
    cloud0fan=# select 1 from (select (1, 'a') as col1) tab1 where col1 = (1, 'a');
     ERROR:  could not identify an equality operator for type unknown
    cloud0fan=# select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) = (1, 'a');
     ?column? 
    ----------
            1
    (1 row)
    ```
    I'm wondering if there are any other special rules for `(...)`.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93497/
    Test FAILed.


---



[GitHub] spark pull request #21403: [SPARK-24341][WIP][SQL] Support IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r191383447
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -45,6 +46,10 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       private def getValueExpression(e: Expression): Seq[Expression] = {
         e match {
           case cns : CreateNamedStruct => cns.valExprs
    +      case Literal(struct: InternalRow, dt: StructType) if dt.isInstanceOf[StructType] =>
    +        dt.zipWithIndex.map { case (field, idx) => Literal(struct.get(idx, field.dataType)) }
    +      case a @ AttributeReference(_, dt: StructType, _, _) =>
    --- End diff --
    
    @hvanhovell I think SPARK-24395 is also somewhat related to this. If we consider `(a, b) in (select (null, null))` as a comparison between structs, as you mentioned, we have to return the row when `a` and `b` are `null`. So, is the right approach to keep structs as they are rather than unpacking them? Honestly, the more I think about it, the more I think unpacking is the right option.
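    
    A toy comparison of the two readings, assuming SQL three-valued logic modelled as `Option[Boolean]` (`None` = UNKNOWN). The `asStruct` reading follows the description in this comment (a null field matches a null field); it is not a model of Spark's actual struct comparison, and `NullInSemantics`, `sqlEq`, `and3`, `unpacked`, `asStruct` and `keepsRow` are hypothetical names.
    
    ```scala
    // Hypothetical helpers (not Spark code) contrasting unpacked vs struct
    // comparison for `(a, b) IN (SELECT (null, null))`.
    object NullInSemantics {
      type TriBool = Option[Boolean] // None stands for SQL UNKNOWN
    
      // SQL `=`: UNKNOWN if either side is null.
      def sqlEq(l: Option[Int], r: Option[Int]): TriBool =
        for { x <- l; y <- r } yield x == y
    
      // SQL AND under three-valued logic.
      def and3(p: TriBool, q: TriBool): TriBool = (p, q) match {
        case (Some(false), _) | (_, Some(false)) => Some(false)
        case (Some(true), Some(true))            => Some(true)
        case _                                   => None
      }
    
      // Unpacked reading: a = x AND b = y.
      def unpacked(lhs: (Option[Int], Option[Int]), rhs: (Option[Int], Option[Int])): TriBool =
        and3(sqlEq(lhs._1, rhs._1), sqlEq(lhs._2, rhs._2))
    
      // Struct reading as described in the comment: both structs are non-null
      // values, and a null field matches a null field.
      def asStruct(lhs: (Option[Int], Option[Int]), rhs: (Option[Int], Option[Int])): TriBool =
        Some(lhs == rhs)
    
      // A WHERE clause keeps a row only when the predicate is TRUE.
      def keepsRow(p: TriBool): Boolean = p.contains(true)
    }
    ```
    
    With `a` and `b` both null, `unpacked` yields `None`, so the row is dropped, while `asStruct` yields `Some(true)` and keeps it; this is the behavioural difference the choice between unpacking and struct comparison decides.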


---



[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3478/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r199036630
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    The Postgres result you posted for `=` is the same as mine, isn't it?


---



[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92524/testReport)** for PR 21403 at commit [`268307f`](https://github.com/apache/spark/commit/268307f52248d6408862cc76ccb54612ef9ef216).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94178/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93497/testReport)** for PR 21403 at commit [`c0ad1e3`](https://github.com/apache/spark/commit/c0ad1e38251df7882b3a27902b21dc3717d34697).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94292/
    Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r218293687
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out ---
    @@ -0,0 +1,70 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 7
    +
    +
    +-- !query 0
    +create temporary view tab_a as select * from values (1, 1) as tab_a(a1, b1)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +create temporary view tab_b as select * from values (1, 1) as tab_b(a2, b2)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +create temporary view struct_tab as select struct(col1 as a, col2 as b) as record from
    + values (1, 1), (1, 2), (2, 1), (2, 2)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +select 1 from tab_a where (a1, b1) not in (select a2, b2 from tab_b)
    +-- !query 3 schema
    +struct<1:int>
    +-- !query 3 output
    +
    +
    +
    +-- !query 4
    +select 1 from tab_a where (a1, b1) not in (select (a2, b2) from tab_b)
    --- End diff --
    
    ditto


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93999/testReport)** for PR 21403 at commit [`0f00a06`](https://github.com/apache/spark/commit/0f00a06a1853cb13d1d156bafcb85973c92e2b8e).


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r218293670
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out ---
    @@ -0,0 +1,70 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 7
    +
    +
    +-- !query 0
    +create temporary view tab_a as select * from values (1, 1) as tab_a(a1, b1)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +create temporary view tab_b as select * from values (1, 1) as tab_b(a2, b2)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +create temporary view struct_tab as select struct(col1 as a, col2 as b) as record from
    + values (1, 1), (1, 2), (2, 1), (2, 2)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +select 1 from tab_a where (a1, b1) not in (select a2, b2 from tab_b)
    --- End diff --
    
    what's the result of this query without this patch?


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    oh, I see @cloud-fan. But, IIUC, the other one is not used anymore. The only reference was removed by 4ce970d71488c7de6025ef925f75b8b92a5a6a79. I'll submit a PR to remove it if you agree.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94156/
    Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93106/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93670/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    retest this please


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94281/testReport)** for PR 21403 at commit [`eb1dfb7`](https://github.com/apache/spark/commit/eb1dfb7e0873b8479ea54d223b7fde3dcefa4834).


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94178/testReport)** for PR 21403 at commit [`a6114a6`](https://github.com/apache/spark/commit/a6114a655305f318230bf1bbd25394e952793a94).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InSubquery(values: Seq[Expression], query: ListQuery)`
      * `case class ListQuery(`
      * `case class Exists(`


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r218398060
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out ---
    @@ -0,0 +1,70 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 7
    +
    +
    +-- !query 0
    +create temporary view tab_a as select * from values (1, 1) as tab_a(a1, b1)
    +-- !query 0 schema
    +struct<>
    +-- !query 0 output
    +
    +
    +
    +-- !query 1
    +create temporary view tab_b as select * from values (1, 1) as tab_b(a2, b2)
    +-- !query 1 schema
    +struct<>
    +-- !query 1 output
    +
    +
    +
    +-- !query 2
    +create temporary view struct_tab as select struct(col1 as a, col2 as b) as record from
    + values (1, 1), (1, 2), (2, 1), (2, 2)
    +-- !query 2 schema
    +struct<>
    +-- !query 2 output
    +
    +
    +
    +-- !query 3
    +select 1 from tab_a where (a1, b1) not in (select a2, b2 from tab_b)
    --- End diff --
    
    The same as after the patch, i.e. an empty result set. It is included here to ensure that this is considered a valid query.
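    
    A minimal sketch of why query 3 returns no rows, assuming unpacked, column-wise comparison and standard SQL NULL semantics for `NOT IN`; `NotInEval`, `rowEq` and `notIn` are hypothetical names, not Spark code. With `tab_a = (1, 1)` and `tab_b = (1, 1)` the row matches, so `NOT IN` is FALSE and the row is filtered out.
    
    ```scala
    // Hypothetical toy evaluator (not Spark code) for
    // `(a1, b1) NOT IN (SELECT a2, b2 FROM tab_b)`. None stands for NULL/UNKNOWN.
    object NotInEval {
      type Row = Seq[Option[Int]]
    
      // Three-valued `l1 = r1 AND l2 = r2 AND ...` for one subquery row.
      private def rowEq(l: Row, r: Row): Option[Boolean] = {
        val cmps = l.zip(r).map { case (x, y) => for { a <- x; b <- y } yield a == b }
        if (cmps.contains(Some(false))) Some(false)
        else if (cmps.forall(_.contains(true))) Some(true)
        else None
      }
    
      // IN is an OR over all subquery rows; NOT IN negates it.
      def notIn(l: Row, subquery: Seq[Row]): Option[Boolean] = {
        val results = subquery.map(rowEq(l, _))
        val in =
          if (results.contains(Some(true))) Some(true)
          else if (results.forall(_ == Some(false))) Some(false)
          else None
        in.map(!_)
      }
    }
    ```
    
    `notIn(Seq(Some(1), Some(1)), Seq(Seq(Some(1), Some(1))))` evaluates to `Some(false)`, so the only row of `tab_a` is dropped; an empty subquery yields `Some(true)`.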


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1010/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92084/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/617/
    Test PASSed.


---



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r199080792
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    I agree we should treat `(...)` specially if it's in front of `In`, but I'm wondering if we need to do the same thing for `=`.
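
For reference on the `=` question: under standard SQL row-value semantics (as in Postgres), `(a, b) = (c, d)` means `a = c AND b = d`. A minimal sketch of that expansion over a hypothetical toy expression tree; `Expr`, `Col`, `Lit`, `EqualsExpr`, `AndExpr` and `RowEquality.expand` are illustrative names, not Catalyst classes:

```scala
// Hypothetical toy AST for illustration only; these are not Catalyst classes.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class EqualsExpr(left: Expr, right: Expr) extends Expr
final case class AndExpr(left: Expr, right: Expr) extends Expr

object RowEquality {
  // Expand (l1, ..., ln) = (r1, ..., rn) into l1 = r1 AND ... AND ln = rn.
  def expand(lhs: Seq[Expr], rhs: Seq[Expr]): Expr = {
    require(lhs.nonEmpty && lhs.length == rhs.length,
      s"row values must have the same non-zero arity: ${lhs.length} vs ${rhs.length}")
    val equalities: Seq[Expr] = lhs.zip(rhs).map { case (l, r) => EqualsExpr(l, r) }
    // Fold left-to-right into a chain of conjunctions.
    equalities.reduceLeft[Expr](AndExpr(_, _))
  }
}
```

This only covers the row-constructor form; comparing two struct-typed columns with `=` is a different operation, which is the ambiguity raised in this thread.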


---

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    kindly ping @cloud-fan @hvanhovell @juliuszsompolski 


---

[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198913668
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    Hive behaves like Postgres:
    ```
    0: jdbc:hive2://aaa> select 1 from (select struct(1, 'a') as col1) tab1 where col1 in ((1, 'a'));
    Error: Error while compiling statement: FAILED: ParseException line 1:74 cannot recognize input near ')' '<EOF>' '<EOF>' in expression specification (state=42000,code=40000)
    0: jdbc:hive2://aaa> select 1 from (select struct(1, 'a') as col1) tab1 where (1, 'a') in ((1, 'a'));
    +------+--+
    | _c0  |
    +------+--+
    | 1    |
    +------+--+
    1 row selected (0.074 seconds)
    ```
    
    So I do believe this is the right direction.
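
The Hive output above treats `(1, 'a') IN ((1, 'a'))` as a multi-column comparison: the row on the left matches if some entry on the right equals it position by position, and entries of a different width are an error. A toy evaluator capturing just that rule over plain Scala values (the `MultiColumnIn.eval` name is hypothetical, and SQL's three-valued NULL logic is deliberately left out):

```scala
// Toy model of multi-column IN over plain Scala values; not Spark code.
// SQL three-valued NULL semantics are intentionally ignored here.
object MultiColumnIn {
  // True if some row of `list` equals `values` position by position.
  def eval(values: Seq[Any], list: Seq[Seq[Any]]): Boolean = {
    // Every candidate row must have the same width as the left-hand side.
    list.foreach { row =>
      require(row.length == values.length,
        s"arity mismatch: ${values.length} value(s) vs a row of ${row.length}")
    }
    list.exists(row => row == values)
  }
}
```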


---

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #93496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93496/testReport)** for PR 21403 at commit [`7c898a5`](https://github.com/apache/spark/commit/7c898a5d7fe188e8b617955e562a2aaf84fa7fdd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r206322990
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1422,11 +1422,26 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved =>
    +        case In(values, Seq(l @ ListQuery(_, _, exprId, _)))
    +            if values.forall(_.resolved) && !l.resolved =>
               val expr = resolveSubQuery(l, plans)((plan, exprs) => {
                 ListQuery(plan, exprs, exprId, plan.output)
               })
    -          In(value, Seq(expr))
    +          val subqueryOutput = expr.plan.output
    +          val resolvedIn = In(values, Seq(expr))
    +          if (values.length != subqueryOutput.length) {
    +            throw new AnalysisException(
    --- End diff --
    
    Overall it looks good to me. Just a few minor comments. cc @gatorsmile 
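
The check in the diff above rejects an IN subquery whose left-hand side has a different number of values than the subquery has output columns; when the counts agree, each value is compared with the column at the same position. A sketch of that pairing step in isolation (`InSubqueryArity.pair` and its message text are hypothetical, not Spark's `AnalysisException` wording):

```scala
// Hypothetical stand-in for the arity check in the Analyzer diff; not Spark code.
object InSubqueryArity {
  // Pair each IN value with the subquery output column it is compared against,
  // or explain why that is impossible.
  def pair(values: Seq[String], subqueryOutput: Seq[String]): Either[String, Seq[(String, String)]] =
    if (values.length != subqueryOutput.length)
      Left(s"IN subquery arity mismatch: ${values.length} value(s) on the left, " +
        s"${subqueryOutput.length} column(s) in the subquery output")
    else Right(values.zip(subqueryOutput))
}
```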


---

[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198835484
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    I don't think so, since `value` can be replaced later by other rules, so we do need a `Seq[Expression]` here instead of a single expression. Another option, which I haven't checked but think may be feasible, is to create a new kind of `Expression` (e.g. `InValues`) used only for this specific case. What do you think?
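
The concern is that once another rule rewrites `value`, the fact that the left side was a row of separate values can be lost. The sketch below uses a hypothetical toy AST (`Lit`, `CreateStruct`, `InValues`, `In` and the folding rule are illustrative assumptions, not Catalyst code) to show how a struct-shaped left side stops being recognizable after a constant-folding-style rewrite, while a dedicated `InValues` marker survives it:

```scala
// Hypothetical toy AST; names mirror the discussion but are not Catalyst classes.
sealed trait Expr
final case class Lit(value: Any) extends Expr
final case class CreateStruct(children: Seq[Expr]) extends Expr
final case class InValues(children: Seq[Expr]) extends Expr // the proposed marker
final case class In(value: Expr, list: Seq[Expr]) extends Expr

object InValuesSketch {
  // Stand-in optimizer rule: a struct built only from literals is folded
  // into a single struct literal. InValues is left untouched.
  def foldStructs(e: Expr): Expr = e match {
    case CreateStruct(cs) if cs.forall(_.isInstanceOf[Lit]) =>
      Lit(cs.collect { case Lit(v) => v })
    case In(v, list) => In(foldStructs(v), list.map(foldStructs))
    case other => other
  }

  // How many columns the left side of IN contributes, judged from its shape.
  def lhsArity(pred: In): Int = pred.value match {
    case InValues(cs)     => cs.length
    case CreateStruct(cs) => cs.length
    case _                => 1 // any other single expression, e.g. a struct literal
  }
}
```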


---

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #94156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94156/testReport)** for PR 21403 at commit [`cb3467b`](https://github.com/apache/spark/commit/cb3467be92c1f7c8ed313ff1b37a00f82d59eda6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Exists(`
      * `case class InSubquery(values: Seq[Expression],`


---

[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r206322597
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1422,11 +1422,26 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved =>
    +        case In(values, Seq(l @ ListQuery(_, _, exprId, _)))
    +            if values.forall(_.resolved) && !l.resolved =>
               val expr = resolveSubQuery(l, plans)((plan, exprs) => {
                 ListQuery(plan, exprs, exprId, plan.output)
               })
    -          In(value, Seq(expr))
    +          val subqueryOutput = expr.plan.output
    +          val resolvedIn = In(values, Seq(expr))
    +          if (values.length != subqueryOutput.length) {
    +            throw new AnalysisException(
    --- End diff --
    
    @mgaido91 I tried the following:
    ```
    create table in(c1 int);
    insert into in values(1);
    spark-sql> select * from in where c1 in (1);
    1
    spark-sql> select * from in where c1 in (cast(1 as float));
    1
    spark-sql> select * from in where (c1,c1) in ((cast(1 as float), 1));
    Error in query: cannot resolve '(named_struct('c1', in.`c1`, 'c1', in.`c1`) IN (named_struct('col1', CAST(1 AS FLOAT), 'col2', 1)))' due to data type mismatch: Arguments must be same type but were: struct<c1:int,c1:int> != struct<col1:float,col2:int>; line 1 pos 31;
    ```
    Just a question: should we be able to do type promotion here?
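
In the error above, the left side resolves to `struct<c1:int,c1:int>` and the right side to `struct<col1:float,col2:int>`, and the types are required to match exactly. One way promotion could work is to widen the two row types position by position before comparing them. A toy sketch with a hypothetical four-type numeric lattice (not Spark's type-coercion rules):

```scala
// Toy numeric type lattice; a hypothetical stand-in for real type-coercion rules.
object ElementWiseWidening {
  sealed abstract class NumType(val rank: Int)
  case object IntT extends NumType(0)
  case object LongT extends NumType(1)
  case object FloatT extends NumType(2)
  case object DoubleT extends NumType(3)

  // The wider of two types in this linear lattice.
  def wider(a: NumType, b: NumType): NumType = if (a.rank >= b.rank) a else b

  // Widen two row types position by position; None if their arities differ.
  def widenRow(left: Seq[NumType], right: Seq[NumType]): Option[Seq[NumType]] =
    if (left.length != right.length) None
    else Some(left.zip(right).map { case (l, r) => wider(l, r) })
}
```

With this rule, `(int, int)` against `(float, int)` widens to `(float, int)`, after which both sides could be cast and compared.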


---

[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21403#discussion_r198868288
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -161,33 +161,38 @@ case class Not(child: Expression)
            true
       """)
     // scalastyle:on line.size.limit
    -case class In(value: Expression, list: Seq[Expression]) extends Predicate {
    +case class In(values: Seq[Expression], list: Seq[Expression]) extends Predicate {
    --- End diff --
    
    Well, that would really work. We have the same problem even when there is no subquery, e.g. in Postgres:
    ```
    mgaido=# select 1 from (select (1, 'a') as col1) tab1 where col1 in ((1, 'a'));
    ERROR:  could not identify an equality operator for type unknown
    mgaido=# select 1 from (select (1, 'a') as col1) tab1 where (1, 'a') in ((1, 'a'));
     ?column? 
    ----------
            1
    (1 row)
    ```


---

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94281/


---

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #21403: [SPARK-24313][WIP][SQL] Support IN subqueries with struc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #90998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90998/testReport)** for PR 21403 at commit [`5b6226f`](https://github.com/apache/spark/commit/5b6226fa48d82b461c6d5a8d1a9a625d2617af76).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21403
  
    **[Test build #92581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92581/testReport)** for PR 21403 at commit [`d3e39ed`](https://github.com/apache/spark/commit/d3e39ed3f442958cfaaa1ef056cb72fedf0fce1c).


---
