You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/08/17 04:20:51 UTC

[GitHub] spark pull request #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates sho...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/18968

    [SPARK-21759][SQL] PullupCorrelatedPredicates should not produce unresolved plans

    ## What changes were proposed in this pull request?
    
    With the check for structural integrity proposed in SPARK-21726, I found that an optimization rule `PullupCorrelatedPredicates` can produce unresolved plans.
    
    For a correlated IN query like:
    
        Project [a#0]
        +- Filter a#0 IN (list#4 [b#1])
           :  +- Project [c#2]
           :     +- Filter (outer(b#1) < d#3)
           :        +- LocalRelation <empty>, [c#2, d#3]
           +- LocalRelation <empty>, [a#0, b#1]
    
    
    After `PullupCorrelatedPredicates`, it produces query plan like:
    
        'Project [a#0]
        +- 'Filter a#0 IN (list#4 [(b#1 < d#3)])
           :  +- Project [c#2, d#3]
           :     +- LocalRelation <empty>, [c#2, d#3]
           +- LocalRelation <empty>, [a#0, b#1]
    
    
    Because the correlated predicate involves another attribute `d#3` in subquery, it has been pulled out and added into the `Project` on the top of the subquery.
    
    When `list` in `In` contains just one `ListQuery`, `In.checkInputDataTypes` checks if the size of `value` expressions matches the output size of subquery. In the above example, there is only `value` expression and the subquery output has two attributes `c#2, d#3`, so it fails the check and `In.resolved` returns `false`.
    
    We should not let `PullupCorrelatedPredicates` produce unresolved plans to fail the structural integrity check.
    
    ## How was this patch tested?
    
    Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-21759

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18968.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18968
    
----
commit 4604a08e390019f7c3952774dd1b9086be9f2680
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-08-17T04:16:39Z

    PullupCorrelatedPredicates should not produce unresolved plans.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Thanks @viirya @cloud-fan. This looks much better. Can we not preserve the user facing error we raise today? I think the error we raise today is better for the user ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81027/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @cloud-fan Are they the same issue? I think you propose to simplify the data type checking? This PR goes to avoid reporting unresolved plans.
    
    In this query:
    
        SELECT t1.a FROM t1
        WHERE
        t1.a IN (SELECT t2.c
            FROM t2
            WHERE t1.b < t2.d);
    
    The left side of `In` is `t1.a`. After pulling correlated predicates, the right side is `(t2.c, t2.d)`. They don't match but we shouldn't fail the check and report it unresolved plan.
    
    Am I thinking it incorrectly? Or I misunderstand you propose?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134147067
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,63 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // TODO: Update this check if we combine the optimizer rules for subquery rewriting.
    +        //
    +        // In `CheckAnalysis`, we already check if the size of subquery plan output match the size
    +        // of value expressions. However, we can add extra correlated predicate references into
    +        // the top of subquery plan when pulling up correlated predicates. Thus, we add extra check
    +        // here to make sure we don't mess the query plan.
    +
    +        // Try to find out if any extra subquery output doesn't in the subquery condition.
    +        val extraOutputAllInCondition = sub.output.drop(valExprs.length).find { attr =>
    +          l.children.forall { c =>
    +            !c.references.contains(attr)
    +          }
    +        }.isEmpty
    +
    +        if (sub.output.length >= valExprs.length && extraOutputAllInCondition) {
    +          true
    +        } else {
    +          false
    +        }
    --- End diff --
    
    Looks good. Update soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80767/testReport)** for PR 18968 at commit [`4a47393`](https://github.com/apache/spark/commit/4a47393e4605790c4cdf1a33639cd6595cd35ba8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80893/testReport)** for PR 18968 at commit [`4e9cea8`](https://github.com/apache/spark/commit/4e9cea8e170ec6eadfa43ffacc08f58e0a506798).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @viirya ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134171436
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,56 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    --- End diff --
    
    Early evaluation causes failure like this: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80893/testReport/org.apache.spark.sql.hive.execution/HiveCompatibilitySuite/subquery_in_having/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80892/testReport)** for PR 18968 at commit [`0903838`](https://github.com/apache/spark/commit/0903838b738f5575826b16be039c73b47d3c84ff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @viirya Hi Simon, many thanks for finding this. Instead of adding the compensation code in the resolve logic for in-subquery expression can we consider to move the semantic checking of comparing the count of arguments in either side of in-subquery expression to checkAnalysis instead ? We do several other checks in checkAnalysis for subquery expression. I just feel this may be a little cleaner ?
    
    For your reference, i quickly tried it [here](https://github.com/dilipbiswal/spark/tree/pr-18968-viirya)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #81027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81027/testReport)** for PR 18968 at commit [`66a193d`](https://github.com/apache/spark/commit/66a193dd9ef939c39279554c0ed45748fc72962b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    We do allow multiple columns in `IN subquery`. For example, 
    
    ```SQL
    SELECT t1a,
           t1b,
           t1d
    FROM   t1
    WHERE  ( t1b, t1d ) IN (SELECT t2b,
                                   t2d
                            FROM   t2
                            WHERE  t2i IN (SELECT t3i
                                           FROM   t3
                                           WHERE  t2b > t3b))
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @dilipbiswal That's right. We don't really use the data type of `ListQuery`, because it is not an real expression actually. But maybe have an accurate one is good too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @gatorsmile @cloud-fan Any more comments on this change? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80925/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134120034
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,80 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    +        // For example, in the following query plan, the condition of `ListQuery` uses d#3.
    +        // from the subquery query. For now the size of output of subquery is 2(c#2, d#3), the
    +        // size of value is 1 (a#0).
    +        // Query:
    +        //   SELECT t1.a FROM t1
    +        //   WHERE
    +        //   t1.a IN (SELECT t2.c
    +        //           FROM t2
    +        //           WHERE t1.b < t2.d);
    +        // Query Plan:
    +        //   Project [a#0]
    +        //   +- Filter a#0 IN (list#4 [(b#1 < d#3)])
    +        //      :  +- Project [c#2, d#3]
    +        //      :     +- LocalRelation <empty>, [c#2, d#3]
    +        //      +- LocalRelation <empty>, [a#0, b#1]
    +        //
    +        // Notice that in analysis we should not face such problem. During analysis we only care
    +        // if the size of subquery plan match the size of value expression. `CheckAnalysis` makes
    +        // sure this by a particular check. However, optimization rules will possibly change the
    +        // analyzed plan and produce unresolved plan again. That's why we add this check here.
    +
    +        // Take the subset of output which are not going to match with value expressions and also
    +        // not used in condition expressions, if any.
    +        val subqueryOutputNotInCondition = sub.output.drop(valExprs.length).filter { attr =>
    +          l.children.forall { c =>
    +            !c.references.contains(attr)
    +          }
    +        }
    +
    +        if (sub.output.length < valExprs.length || subqueryOutputNotInCondition.nonEmpty) {
    +          false
    +        } else {
    +          true
    +        }
    --- End diff --
    
    ```Scala
    if (sub.output.length < valExprs.length || subqueryOutputNotInCondition.nonEmpty) {
      false
    } else {
      true
    }
    ```
    This can be simplified to
    ```Scala
    sub.output.length >= valExprs.length && subqueryOutputNotInCondition.isEmpty
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @gatorsmile @cloud-fan @viirya I was thinking about cloud-fan's question. Actually we may not be representing the data type of the listquery expression correctly.  Would this represent this more accurately ? 
    ```sql
    override def dataType: DataType =
        if (plan.output.length > 1) plan.schema else plan.schema.fields.head.dataType
    ```
    Actually we don't use the data type to know the return type for listquery instead we use the plan's output.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @gatorsmile @viirya There was pr from Natt [pr](https://github.com/apache/spark/pull/17520). Is it possible to get some feedback on the idea ? If we do this, the next step was to combine the pullup and rewrite to one single rule. Actually i had this change made in my local branch a while back. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80930/testReport)** for PR 18968 at commit [`a96fb8d`](https://github.com/apache/spark/commit/a96fb8d056bde4ed8a5f02a2767d09a3c661051d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80930/testReport)** for PR 18968 at commit [`a96fb8d`](https://github.com/apache/spark/commit/a96fb8d056bde4ed8a5f02a2767d09a3c661051d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    SGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Thanks Simon. Changes look good to me. cc @cloud-fan @gatorsmile for any additional comments. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80920/testReport)** for PR 18968 at commit [`0e15cde`](https://github.com/apache/spark/commit/0e15cdeb5897b767de91bcfb38e7d15034bbd994).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    I think the refactor I proposed is a smaller patch than this PR...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80891/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80893/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @dilipbiswal @gatorsmile Regarding with the error message, I do think so.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates sho...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r133767375
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -140,25 +140,55 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
       require(list != null, "list should not be null")
       override def checkInputDataTypes(): TypeCheckResult = {
         list match {
    -      case ListQuery(sub, _, _) :: Nil =>
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
             val valExprs = value match {
               case cns: CreateNamedStruct => cns.valExprs
               case expr => Seq(expr)
             }
    -        if (valExprs.length != sub.output.length) {
    -          TypeCheckResult.TypeCheckFailure(
    +
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    +        // For example, in the following query plan, the condition of `ListQuery` uses value#207
    +        // from the subquery query. For now the size of output of subquery is 2, the size of value
    +        // is 1.
    +        //
    +        //   Filter key#201 IN (list#200 [(value#207 = min(value)#204)])
    +        //   :  +- Project [key#206, value#207]
    +        //   :     +- Filter (value#207 > val_9)
    --- End diff --
    
    why you pick a different example in PR description?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    We could. resolveSubquery will do at least one time analysis to resolve the
    subquery. It might be a waste for already resolved subquery.
    
    Let me try if I can remove the case with little changes.
    
    
    On Aug 24, 2017 2:10 PM, "Dilip Biswal" <no...@github.com> wrote:
    
    *@dilipbiswal* commented on this pull request.
    ------------------------------
    
    In sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/an
    alysis/Analyzer.scala
    <https://github.com/apache/spark/pull/18968#discussion_r134933318>:
    
    > @@ -1286,8 +1286,16 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId))) if
    value.resolved && !sub.resolved =>
    -          val expr = resolveSubQuery(l, plans)(ListQuery(_, _, exprId))
    +        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if
    value.resolved && !sub.resolved =>
    
    @viirya <https://github.com/viirya> If we modified to
    
    case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if
    value.resolved && !l.resolved
    
    would we still require the following case statement ? The following case
    looks a little
    strange as we are in the resolveSubqueries routine and check for
    sub.resolved == true.
    
    —
    You are receiving this because you were mentioned.
    Reply to this email directly, view it on GitHub
    <https://github.com/apache/spark/pull/18968#pullrequestreview-58283445>,
    or mute
    the thread
    <https://github.com/notifications/unsubscribe-auth/AAEM9xQlIzMPgV5_BoB_PK8tJCXkzilfks5sbRO8gaJpZM4O5wyz>
    .



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80988/testReport)** for PR 18968 at commit [`a7f6816`](https://github.com/apache/spark/commit/a7f681698ce6cbea6e0289747b893d2b3cf60415).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80892/testReport)** for PR 18968 at commit [`0903838`](https://github.com/apache/spark/commit/0903838b738f5575826b16be039c73b47d3c84ff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18968


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    LGTM, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80765/testReport)** for PR 18968 at commit [`4604a08`](https://github.com/apache/spark/commit/4604a08e390019f7c3952774dd1b9086be9f2680).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    After we correctly define the data type of `ListQuery`, can we remove the special handling of `ListQuery` in `In.checkInputDataTypes`?
    
    we can add `ListQuery.childOutputs: Seq[Attribute]`, so that even we extend the project list of `ListQuery.plan`, we still keep the corrected data type:
    ```
    def dataType = if (childOutputs.length > 1) childOutputs.toStructType else childOutputs.head.dataType
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80769/testReport)** for PR 18968 at commit [`f5d8ebb`](https://github.com/apache/spark/commit/f5d8ebb20ef73c115118b062c57f0f1372f672a2).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80930/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134147697
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,56 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    --- End diff --
    
    why `lazy` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80915/testReport)** for PR 18968 at commit [`dd48a9d`](https://github.com/apache/spark/commit/dd48a9d9476c7ba775df2a1764b6311d778891b9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80891/testReport)** for PR 18968 at commit [`99d9570`](https://github.com/apache/spark/commit/99d9570a04b583cd1dc5180d3760c6ffcb1e8513).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80895/testReport)** for PR 18968 at commit [`52f52bb`](https://github.com/apache/spark/commit/52f52bb2609bc019d3aa4d609118810eccb9caa2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    That's right, if we combine the two rules, we won't produce the unresolved `In` predicate. I just don't know if we have the plan in near term to combine them together. Sounds like a non-trivial change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @dilipbiswal Thanks for comment.
    
    This issue is happened at optimization phase, so I think it is supposed that we already pass `checkAnalysis`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80895/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80892/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    cc @cloud-fan @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80812/testReport)** for PR 18968 at commit [`476b4ab`](https://github.com/apache/spark/commit/476b4abe43495d626da2d23bdbeb6a5e37f8e981).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80889/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @cloud-fan Based on my understanding, I revise this change. May you look if it is what you think? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    > We should not let PullupCorrelatedPredicates produce unresolved plans to fail the structural integrity check.
    
    This reads misleading, actually this PR does not change `PullupCorrelatedPredicates`, but to fix the type checking to not mistakenly report unresolved plans.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134944975
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1286,8 +1286,16 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId))) if value.resolved && !sub.resolved =>
    -          val expr = resolveSubQuery(l, plans)(ListQuery(_, _, exprId))
    +        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !sub.resolved =>
    --- End diff --
    
    I thought to change `resolveSubQuery` to avoid re-analysis on a resolved plan. But since it is just once, maybe not a big deal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    doesn't the default type check logic work?
    ```
             val mismatchOpt = list.find(l => l.dataType != value.dataType)
             if (mismatchOpt.isDefined) {
               TypeCheckResult.TypeCheckFailure(s"Arguments must be same type but were: " +
                 s"${value.dataType} != ${mismatchOpt.get.dataType}")
             } else {
               TypeUtils.checkForOrderingExpr(value.dataType, s"function $prettyName")
             }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @cloud-fan Thanks for the comments. I've updated the PR description and added a SQL statement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80920/testReport)** for PR 18968 at commit [`0e15cde`](https://github.com/apache/spark/commit/0e15cdeb5897b767de91bcfb38e7d15034bbd994).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    I'd moved the original check to `CheckAnalysis`. The valid IN subquery should have equal size value expressions and subquery output. If not, we should throw user-facing error.
    
    To prevent possible issue brought by optimization rules, the additional check is put in `resolved`. This is not user-facing error. I think we can rely the newly proposed structural integrity check to detect this.
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @viirya Thank you !!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80893/testReport)** for PR 18968 at commit [`4e9cea8`](https://github.com/apache/spark/commit/4e9cea8e170ec6eadfa43ffacc08f58e0a506798).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134920136
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala ---
    @@ -274,17 +274,24 @@ object ScalarSubquery {
     case class ListQuery(
         plan: LogicalPlan,
         children: Seq[Expression] = Seq.empty,
    -    exprId: ExprId = NamedExpression.newExprId)
    +    exprId: ExprId = NamedExpression.newExprId,
    +    childOutputs: Seq[Attribute] = Seq.empty)
       extends SubqueryExpression(plan, children, exprId) with Unevaluable {
    -  override def dataType: DataType = plan.schema.fields.head.dataType
    +  override def dataType: DataType = if (childOutputs.length > 1) {
    +    childOutputs.toStructType
    +  } else {
    +    childOutputs.head.dataType
    +  }
    +  override lazy val resolved: Boolean = childrenResolved && plan.resolved && childOutputs.nonEmpty
    --- End diff --
    
    Before we fill in `childOutputs`, this `ListQuery` cannot be resolved. Otherwise, to access its `dataType` causes failure in `In.checkInputDataTypes`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80925/testReport)** for PR 18968 at commit [`0e15cde`](https://github.com/apache/spark/commit/0e15cdeb5897b767de91bcfb38e7d15034bbd994).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates sho...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r133851955
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -140,25 +140,55 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
       require(list != null, "list should not be null")
       override def checkInputDataTypes(): TypeCheckResult = {
         list match {
    -      case ListQuery(sub, _, _) :: Nil =>
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
             val valExprs = value match {
               case cns: CreateNamedStruct => cns.valExprs
               case expr => Seq(expr)
             }
    -        if (valExprs.length != sub.output.length) {
    -          TypeCheckResult.TypeCheckFailure(
    +
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    +        // For example, in the following query plan, the condition of `ListQuery` uses value#207
    +        // from the subquery query. For now the size of output of subquery is 2, the size of value
    +        // is 1.
    +        //
    +        //   Filter key#201 IN (list#200 [(value#207 = min(value)#204)])
    +        //   :  +- Project [key#206, value#207]
    +        //   :     +- Filter (value#207 > val_9)
    --- End diff --
    
    The example in PR description is constructed later. This example is I encountered in `subquery_in_having.q`. I'll make it consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80920/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @dilipbiswal But we still need to detect such violation of check and report the error if optimization rules cause the side effect that makes the plan unresolved.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Even we have correct data type, don't we still need to verify it matches the data type of `value` expressions in `In.checkInputDataTypes`? There is no guarantee that they will match, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    I agree that the original check should be in `checkAnalysis` instead of `checkInputDataTypes`.
    
    The additional check added by this change can be put in `resolved`. Sounds good to me.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @cloud-fan OK. I think I got your point. Your proposal is to have `ListQuery.childOutputs` which keeps the original output of subquery. Even we extend the project list of subquery, `childOutputs` is unchanged.
    
    If that is correct, my only concern is if we extend the project list by adding attributes which are not in condition, we aren't able to find it out.
    
    This is basically a safe guard for the possibility we mess the query plan in optimization, if you think it is not necessary, I think your proposal is a good choice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Oh, yes, I thought you said we don't need to check it. Right we can do it. I think we better have another PR to do it. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80767/testReport)** for PR 18968 at commit [`4a47393`](https://github.com/apache/spark/commit/4a47393e4605790c4cdf1a33639cd6595cd35ba8).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80989/testReport)** for PR 18968 at commit [`498bd3b`](https://github.com/apache/spark/commit/498bd3b5429d22867a57c0d7b34fc0a20214aa87).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80767/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134146523
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,63 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // TODO: Update this check if we combine the optimizer rules for subquery rewriting.
    +        //
    +        // In `CheckAnalysis`, we already check if the size of subquery plan output match the size
    +        // of value expressions. However, we can add extra correlated predicate references into
    +        // the top of subquery plan when pulling up correlated predicates. Thus, we add extra check
    +        // here to make sure we don't mess the query plan.
    +
    +        // Try to find out if any extra subquery output doesn't in the subquery condition.
    +        val extraOutputAllInCondition = sub.output.drop(valExprs.length).find { attr =>
    +          l.children.forall { c =>
    +            !c.references.contains(attr)
    +          }
    +        }.isEmpty
    +
    +        if (sub.output.length >= valExprs.length && extraOutputAllInCondition) {
    +          true
    +        } else {
    +          false
    +        }
    --- End diff --
    
    Line 159 - Line 169 can be simplified to
    
    ```Scala
            val isAllExtraOutputInCondition = sub.output.drop(valExprs.length).forall { attr =>
              children.exists(_.references.contains(attr))
            }
            sub.output.length >= valExprs.length && isAllExtraOutputInCondition
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    We can write IN subquery like:
    
        SELECT * FROM t1
        WHERE struct(t1.a, t1.b) IN (SELECT t2.d, t2.e
                                     FROM t2
                                     WHERE t1.c < t2.f)")
    
    It is valid because the length of value expressions (t1.a, t1.b) matches the length of subquery output (t2.d, t2.e).
    
    So seems `ListQuery.plan` can produce more than one column?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80778/testReport)** for PR 18968 at commit [`38631bb`](https://github.com/apache/spark/commit/38631bb62d3c698d707643c5ed700fb4cfd9220f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #81027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81027/testReport)** for PR 18968 at commit [`66a193d`](https://github.com/apache/spark/commit/66a193dd9ef939c39279554c0ed45748fc72962b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81071/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    [Moving such checks from `In.checkInputDataTypes ` to `checkAnalysis`](https://github.com/dilipbiswal/spark/commit/185159e82572fd8c41f482bae54f791d2bf1b56a) looks cleaner to me. What we are doing in `In.checkInputDataTypes` does not belong to `checkInputDataTypes`.
    
    If we want to capture the potential analysis errors introduced by the optimizer rule `PullupCorrelatedPredicates `, we should override `resolved` instead of doing all these checks in `checkInputDataTypes`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Maybe we can still have a case for ListQuery, but it is simpler and mainly for better message?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80778/testReport)** for PR 18968 at commit [`38631bb`](https://github.com/apache/spark/commit/38631bb62d3c698d707643c5ed700fb4cfd9220f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80778/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80889/testReport)** for PR 18968 at commit [`998c70b`](https://github.com/apache/spark/commit/998c70bddd605fbd44a89db5f089a03014b37066).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Re-thinking it, I agree that this kind of check should put in `resolved`. However, I doubt whether we should put in `checkAnalysis`. By doing this, we split the resolving check of `In` to two places, one in `checkAnalysis`, one in `In.resolved`.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #81028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81028/testReport)** for PR 18968 at commit [`dae01f1`](https://github.com/apache/spark/commit/dae01f1aff2603a7e8d8bbefdd4db4db470de6f3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80989/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80915/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80812/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #81028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81028/testReport)** for PR 18968 at commit [`dae01f1`](https://github.com/apache/spark/commit/dae01f1aff2603a7e8d8bbefdd4db4db470de6f3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80765/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134126635
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,80 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    --- End diff --
    
    So we are adding another criteria to consider an in-subquery expression to be resolved. That new criteria is - 
    1) Any additional output attributes that may have been added to the subquery plan by optimizer should have a reference in the originating in-subquery expression's children.(children reflect the pulled up correlated predicates)
    
    Just for my understanding, there is no way to trigger this condition from our regular code path, right ? This is just to guard against any potential incorrect rewrites by the optimizer in the future ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Btw, what the purpose to have `ListQuery.childOuput: Attribute`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    One thing I wanna discuss, is it better to have a `ListQuery.childOuput: Attribute`? Looking at `ListQuery.dataType`, seems we assume `ListQuery.plan` only produce one column, but we didn't enforce it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134136888
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,80 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    --- End diff --
    
    @dilipbiswal Yeah, you are right. Normally we don't trigger it. It is just here in case we possibly mess the query plan.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80988/testReport)** for PR 18968 at commit [`a7f6816`](https://github.com/apache/spark/commit/a7f681698ce6cbea6e0289747b893d2b3cf60415).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80769/testReport)** for PR 18968 at commit [`f5d8ebb`](https://github.com/apache/spark/commit/f5d8ebb20ef73c115118b062c57f0f1372f672a2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80989/testReport)** for PR 18968 at commit [`498bd3b`](https://github.com/apache/spark/commit/498bd3b5429d22867a57c0d7b34fc0a20214aa87).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    ah, if that query is valid, seems `ListQuery.dataType` is wrong?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80895/testReport)** for PR 18968 at commit [`52f52bb`](https://github.com/apache/spark/commit/52f52bb2609bc019d3aa4d609118810eccb9caa2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80889/testReport)** for PR 18968 at commit [`998c70b`](https://github.com/apache/spark/commit/998c70bddd605fbd44a89db5f089a03014b37066).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80925/testReport)** for PR 18968 at commit [`0e15cde`](https://github.com/apache/spark/commit/0e15cdeb5897b767de91bcfb38e7d15034bbd994).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81028/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #81071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81071/testReport)** for PR 18968 at commit [`9364d6e`](https://github.com/apache/spark/commit/9364d6e04b1f99d6b70e851b4405d790a26f58be).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80769/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    can you also include the SQL statement? I feel a little hard to read the query plan with list query as I'm not faimilar with this part.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134231496
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,56 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // TODO: Update this check if we combine the optimizer rules for subquery rewriting.
    +        //
    +        // In `CheckAnalysis`, we already check if the size of subquery plan output match the size
    +        // of value expressions. However, we can add extra correlated predicate references into
    +        // the top of subquery plan when pulling up correlated predicates. Thus, we add extra check
    +        // here to make sure we don't mess the query plan.
    +
    +        // Try to find out if any extra subquery output doesn't in the subquery condition.
    +        val isAllExtraOutputInCondition = sub.output.drop(valExprs.length).forall { attr =>
    +          children.exists(_.references.contains(attr))
    +        }
    +        sub.output.length >= valExprs.length && isAllExtraOutputInCondition
    +      case _ => true
    +    }
    +    // Scala doesn't allow us refer super.resolved.
    +    childrenResolved && checkInputDataTypes().isSuccess && checkForInSubquery
    +  }
    +
       override def checkInputDataTypes(): TypeCheckResult = {
         list match {
    -      case ListQuery(sub, _, _) :: Nil =>
    -        val valExprs = value match {
    -          case cns: CreateNamedStruct => cns.valExprs
    -          case expr => Seq(expr)
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        val mismatchedColumns = valExprs.zip(sub.output.take(valExprs.length)).flatMap {
    --- End diff --
    
    nit: no need to call `take`, `zip` will take care of it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134120146
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,80 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    +        // For example, in the following query plan, the condition of `ListQuery` uses d#3.
    +        // from the subquery query. For now the size of output of subquery is 2(c#2, d#3), the
    +        // size of value is 1 (a#0).
    +        // Query:
    +        //   SELECT t1.a FROM t1
    +        //   WHERE
    +        //   t1.a IN (SELECT t2.c
    +        //           FROM t2
    +        //           WHERE t1.b < t2.d);
    +        // Query Plan:
    +        //   Project [a#0]
    +        //   +- Filter a#0 IN (list#4 [(b#1 < d#3)])
    +        //      :  +- Project [c#2, d#3]
    +        //      :     +- LocalRelation <empty>, [c#2, d#3]
    +        //      +- LocalRelation <empty>, [a#0, b#1]
    +        //
    +        // Notice that in analysis we should not face such problem. During analysis we only care
    +        // if the size of subquery plan match the size of value expression. `CheckAnalysis` makes
    +        // sure this by a particular check. However, optimization rules will possibly change the
    +        // analyzed plan and produce unresolved plan again. That's why we add this check here.
    +
    +        // Take the subset of output which are not going to match with value expressions and also
    +        // not used in condition expressions, if any.
    +        val subqueryOutputNotInCondition = sub.output.drop(valExprs.length).filter { attr =>
    +          l.children.forall { c =>
    +            !c.references.contains(attr)
    +          }
    +        }
    +
    +        if (sub.output.length < valExprs.length || subqueryOutputNotInCondition.nonEmpty) {
    +          false
    +        } else {
    +          true
    +        }
    --- End diff --
    
    This should be further simplified. We do not need to find out the exact subquery outputs that are not in the subquery condition. We just need to check whether they are empty.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #81071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81071/testReport)** for PR 18968 at commit [`9364d6e`](https://github.com/apache/spark/commit/9364d6e04b1f99d6b70e851b4405d790a26f58be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @viirya Isn't checkAnalysis supposed to catch such semantic errors ? In my thinking, this particular error to make sure the left hand side number of args matching the right hand side is to catch **user errors in the input query**. After that is done either during analyzer or post analysis phase such as checkAnalysis , we shouldn't be doing this particular check. My reason is that , if optimizer causes a side effect such that it makes the original check invalid, we shouldn't be returning the particular error that we return today as that wouldn't mean much to the user as thats not the query he typed in , correct ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134596955
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out ---
    @@ -72,16 +72,7 @@ t1a IN (SELECT t2a, t2b
     struct<>
     -- !query 5 output
     org.apache.spark.sql.AnalysisException
    -cannot resolve '(t1.`t1a` IN (listquery(t1.`t1a`)))' due to data type mismatch: 
    -The number of columns in the left hand side of an IN subquery does not match the
    -number of columns in the output of subquery.
    -#columns in left hand side: 1.
    -#columns in right hand side: 2.
    -Left side columns:
    -[t1.`t1a`].
    -Right side columns:
    -[t2.`t2a`, t2.`t2b`].
    -             ;
    +cannot resolve '(t1.`t1a` IN (listquery(t1.`t1a`)))' due to data type mismatch: Arguments must be same type but were: IntegerType != StructType(StructField(t2a,IntegerType,false), StructField(t2b,IntegerType,false));
    --- End diff --
    
    This new message is confusing when users using the In Subquery. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80988/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80812/testReport)** for PR 18968 at commit [`476b4ab`](https://github.com/apache/spark/commit/476b4abe43495d626da2d23bdbeb6a5e37f8e981).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] PullupCorrelatedPredicates should not...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80765/testReport)** for PR 18968 at commit [`4604a08`](https://github.com/apache/spark/commit/4604a08e390019f7c3952774dd1b9086be9f2680).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134119888
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,80 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    +        // For example, in the following query plan, the condition of `ListQuery` uses d#3.
    +        // from the subquery query. For now the size of output of subquery is 2(c#2, d#3), the
    +        // size of value is 1 (a#0).
    +        // Query:
    +        //   SELECT t1.a FROM t1
    +        //   WHERE
    +        //   t1.a IN (SELECT t2.c
    +        //           FROM t2
    +        //           WHERE t1.b < t2.d);
    +        // Query Plan:
    +        //   Project [a#0]
    +        //   +- Filter a#0 IN (list#4 [(b#1 < d#3)])
    +        //      :  +- Project [c#2, d#3]
    +        //      :     +- LocalRelation <empty>, [c#2, d#3]
    +        //      +- LocalRelation <empty>, [a#0, b#1]
    +        //
    +        // Notice that in analysis we should not face such problem. During analysis we only care
    +        // if the size of subquery plan match the size of value expression. `CheckAnalysis` makes
    +        // sure this by a particular check. However, optimization rules will possibly change the
    +        // analyzed plan and produce unresolved plan again. That's why we add this check here.
    +
    +        // Take the subset of output which are not going to match with value expressions and also
    +        // not used in condition expressions, if any.
    +        val subqueryOutputNotInCondition = sub.output.drop(valExprs.length).filter { attr =>
    --- End diff --
    
    Also, add a TODO here. This check needs an update after we combine the optimizer rules for subquery rewriting


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134933318
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1286,8 +1286,16 @@ class Analyzer(
               resolveSubQuery(s, plans)(ScalarSubquery(_, _, exprId))
             case e @ Exists(sub, _, exprId) if !sub.resolved =>
               resolveSubQuery(e, plans)(Exists(_, _, exprId))
    -        case In(value, Seq(l @ ListQuery(sub, _, exprId))) if value.resolved && !sub.resolved =>
    -          val expr = resolveSubQuery(l, plans)(ListQuery(_, _, exprId))
    +        case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !sub.resolved =>
    --- End diff --
    
    @viirya If we modified to 
    ```
    case In(value, Seq(l @ ListQuery(sub, _, exprId, _))) if value.resolved && !l.resolved
    ```
    would we still require the following case statement ? The following case looks a little
    strange as we are in the resolveSubqueries routine and check for sub.resolved == true.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80891/testReport)** for PR 18968 at commit [`99d9570`](https://github.com/apache/spark/commit/99d9570a04b583cd1dc5180d3760c6ffcb1e8513).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    Thanks @cloud-fan @gatorsmile @dilipbiswal 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18968#discussion_r134119992
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
    @@ -138,46 +138,80 @@ case class Not(child: Expression)
     case class In(value: Expression, list: Seq[Expression]) extends Predicate {
     
       require(list != null, "list should not be null")
    +
    +  lazy val valExprs = value match {
    +    case cns: CreateNamedStruct => cns.valExprs
    +    case expr => Seq(expr)
    +  }
    +
    +  override lazy val resolved: Boolean = {
    +    lazy val checkForInSubquery = list match {
    +      case (l @ ListQuery(sub, children, _)) :: Nil =>
    +        // SPARK-21759:
    +        // It is possibly that the subquery plan has more output than value expressions, because
    +        // the condition expressions in `ListQuery` might use part of subquery plan's output.
    --- End diff --
    
    This needs a better description. Instead of documenting the fact, this is caused by adding missing correlated predicate references when we pull up the correlated predicates. cc @dilipbiswal 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    I'm not very sure how we use `ListQuery.dataType`, or do we actually use it since it might not be very meaningful to have a datatype for a IN subquery.
    
    In order to make sure type matching, we do type checking in `In` between the output of subquery wrapped in `ListQuery` and the `value` expressions. We won't use `ListQuery` with other expression, I think.
    
    Maybe it has `dataType` just because it needs to have one, as it inherits `Expression`?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    **[Test build #80915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80915/testReport)** for PR 18968 at commit [`dd48a9d`](https://github.com/apache/spark/commit/dd48a9d9476c7ba775df2a1764b6311d778891b9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/18968
  
    @viirya I agree that we should report violations. The only question i had is whether we should tie this particular check to the expression being resolved or not. In the old version of the code, we used to do this particular check in the analyzer. However after doing the check, we used to change the expression to PredicateSubquery after pulling up the correlated predicates.  I am fine with whatever you decide. If we keep the check, we should make it such that it reads a little better  than how it reads now :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org