Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/04/21 00:03:43 UTC

[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/12539

    [SPARK-14763] [SQL] fix subquery resolution

    ## What changes were proposed in this pull request?
    
    Currently, a column could be resolved incorrectly when the outer table and the subquery both have columns with the same name; we should only resolve the attributes that cannot be resolved within the subquery. Those attributes may have the same exprId as other attributes in the subquery, so we should create aliases for them.
    
    Also, the columns in an IN subquery could have the same exprId, so we should create aliases for them as well.
    
    ## How was this patch tested?
    
    Added regression tests. Manually tested TPC-DS Q70; it works well after this patch.
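
The exprId conflict described above can be illustrated with a toy model. This is plain Scala, not Catalyst's actual `Attribute`/`Alias` classes: when an outer attribute's exprId collides with one already produced inside the subquery, wrapping it in an alias gives it a fresh exprId so the two can no longer be confused.

```scala
// Toy model (not Catalyst): attributes identified by (name, exprId).
object SubqueryAliasSketch {
  final case class Attribute(name: String, exprId: Long)
  final case class Alias(child: Attribute, name: String, exprId: Long) {
    def toAttribute: Attribute = Attribute(name, exprId)
  }

  private var nextId: Long = 100L
  private def freshId(): Long = { nextId += 1; nextId }

  /** Return an attribute that is safe to reference inside the subquery,
    * plus the alias (if one was needed) that the caller would add to a
    * projection over the outer plan. */
  def dealias(outerAttr: Attribute, subqueryInput: Set[Attribute])
      : (Attribute, Option[Alias]) =
    if (subqueryInput.exists(_.exprId == outerAttr.exprId)) {
      // Conflict: same exprId already exists in the subquery's input,
      // so mint a fresh exprId via an alias.
      val a = Alias(outerAttr, outerAttr.toString, freshId())
      (a.toAttribute, Some(a))
    } else {
      (outerAttr, None)
    }
}
```

The real patch does the equivalent bookkeeping in `ResolveSubquery`, collecting the aliases in a mutable map so a `Project` can be added over the outer plan.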
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark fix_subquery

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12539
    
----
commit af6cb56d59ca9499ab374be63c95730bf0758d6a
Author: Davies Liu <da...@databricks.com>
Date:   2016-04-20T20:49:31Z

    fix subquery resolution

commit b5b4c74446d6c6aadcd5c95a7ed588edc72d0957
Author: Davies Liu <da...@databricks.com>
Date:   2016-04-20T22:00:50Z

    fix column with same exprId

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213015476
  
    **[Test build #56556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56556/consoleFull)** for PR 12539 at commit [`fd8c75c`](https://github.com/apache/spark/commit/fd8c75cfbc847c29ba313fce5f43903396714356).




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60696584
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    --- End diff --
    
    We should resolve all the operators in the subquery where possible. Once some attributes from the outer table are resolved, their parent operators should be resolved within the subquery, not against the outer table.
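
The effect of the `childrenResolved && !resolved` guard under `transformDown` can be sketched with a toy plan tree (assumed stand-ins, not Catalyst classes): in one pass only the lowest operator whose own children cannot satisfy it consults the outer scope; the operators above it are left for a later pass, once their children have become resolved.

```scala
// Toy plan: a node is resolved when every name it needs has been supplied.
object LowestUnresolvedSketch {
  final case class Node(needs: Set[String], got: Set[String], child: Option[Node]) {
    def resolved: Boolean = needs.subsetOf(got)
    def childrenResolved: Boolean = child.forall(_.resolved)
  }

  /** One top-down pass, mirroring the transformDown guard: visit the node
    * first, but only let it pull names from `outer` if its children are
    * already resolved and it is not. Then recurse into the child. */
  def onePass(plan: Node, outer: Set[String]): Node = {
    val p =
      if (plan.childrenResolved && !plan.resolved)
        plan.copy(got = plan.got ++ (plan.needs & outer))
      else plan
    p.copy(child = p.child.map(onePass(_, outer)))
  }
}
```

On the first pass the leaf resolves from the outer scope while its parent is skipped (its child was still unresolved when visited); on the second pass the parent resolves, which is why the real rule re-runs until the plan stops changing.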




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212629869
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56418/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213066258
  
    cc @hvanhovell 




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60698757
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    --- End diff --
    
    Yeah, that is fair.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60697951
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    --- End diff --
    
    The number of calls is only the number of operators that reference the outer table, so I don't think it should be high.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60697711
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    +          case q: LogicalPlan if q.childrenResolved && !q.resolved =>
    +            q transformExpressions {
    +              case u @ UnresolvedAttribute(nameParts) =>
    +                withPosition(u) {
    +                  try {
    +                    val outerAttrOpt = outer.resolve(nameParts, resolver)
    +                    if (outerAttrOpt.isDefined) {
    +                      val outerAttr = outerAttrOpt.get
    +                      if (q.inputSet.contains(outerAttr)) {
    +                        // Got a conflict, create an alias for the attribute come from outer table
    +                        val alias = Alias(outerAttr, outerAttr.toString)()
    +                        val attr = alias.toAttribute
    +                        aliases += attr -> alias
    +                        attr
    +                      } else {
    +                        outerAttr
    +                      }
    +                    } else {
    +                      u
    +                    }
    +                  } catch {
    +                    case a: AnalysisException => u
    +                  }
    +                }
    +            }
    +        }
    +        if (resolvedByOuter fastEquals analyzed) {
    --- End diff --
    
    This is to terminate the recursion and avoid an infinite loop.
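
The termination pattern being discussed, re-running a step until it stops changing the result, the way `resolvedByOuter fastEquals analyzed` cuts off the recursion, can be sketched generically (a minimal sketch, assuming structural equality stands in for `fastEquals`):

```scala
// Generic fixpoint driver: apply `step` until the result stops changing
// (or a safety bound on iterations is hit), which is what guarantees
// the recursive resolution cannot loop forever.
object FixpointSketch {
  @annotation.tailrec
  def fixpoint[A](a: A, maxIters: Int = 100)(step: A => A): A = {
    val next = step(a)
    if (next == a || maxIters <= 1) next
    else fixpoint(next, maxIters - 1)(step)
  }
}
```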




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60695940
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    +          case q: LogicalPlan if q.childrenResolved && !q.resolved =>
    +            q transformExpressions {
    +              case u @ UnresolvedAttribute(nameParts) =>
    +                withPosition(u) {
    +                  try {
    +                    val outerAttrOpt = outer.resolve(nameParts, resolver)
    +                    if (outerAttrOpt.isDefined) {
    +                      val outerAttr = outerAttrOpt.get
    +                      if (q.inputSet.contains(outerAttr)) {
    +                        // Got a conflict, create an alias for the attribute come from outer table
    +                        val alias = Alias(outerAttr, outerAttr.toString)()
    +                        val attr = alias.toAttribute
    +                        aliases += attr -> alias
    +                        attr
    +                      } else {
    +                        outerAttr
    +                      }
    +                    } else {
    +                      u
    +                    }
    +                  } catch {
    +                    case a: AnalysisException => u
    +                  }
    +                }
    +            }
    +        }
    +        if (resolvedByOuter fastEquals analyzed) {
    --- End diff --
    
    This is done by transformDown, right?




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212629851
  
    **[Test build #56418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56418/consoleFull)** for PR 12539 at commit [`e04d119`](https://github.com/apache/spark/commit/e04d1193f9a15785224722c40b3fdfae6eb6cfb4).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212673244
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56435/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213551488
  
    LGTM




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213065907
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212650937
  
    **[Test build #56435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56435/consoleFull)** for PR 12539 at commit [`c3327a9`](https://github.com/apache/spark/commit/c3327a94a5fdaa303b979c96b0bbf593fdf3e6ff).




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212672947
  
    **[Test build #56435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56435/consoleFull)** for PR 12539 at commit [`c3327a9`](https://github.com/apache/spark/commit/c3327a94a5fdaa303b979c96b0bbf593fdf3e6ff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12539




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213551605
  
    Merging to master. Thanks!




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213065908
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56556/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60696559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1536,17 +1560,22 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
           // Filter the plan by applying left semi and left anti joins.
           withSubquery.foldLeft(newFilter) {
    -        case (p, Exists(sub)) =>
    +        case (p, Exists(sub, _)) =>
               val (resolved, conditions) = pullOutCorrelatedPredicates(sub, p)
               Join(p, resolved, LeftSemi, conditions.reduceOption(And))
    -        case (p, Not(Exists(sub))) =>
    +        case (p, Not(Exists(sub, _))) =>
               val (resolved, conditions) = pullOutCorrelatedPredicates(sub, p)
               Join(p, resolved, LeftAnti, conditions.reduceOption(And))
             case (p, in: InSubQuery) =>
    -          val (resolved, conditions) = pullOutCorrelatedPredicates(in, p)
    -          Join(p, resolved, LeftSemi, conditions.reduceOption(And))
    +          val (newP, resolved, conditions) = pullOutCorrelatedPredicates(in, p)
    --- End diff --
    
    `newP` is `p` when we don't need an immediate projection, right? If so, then we don't need the if/else block.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60697650
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1536,17 +1560,22 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
           // Filter the plan by applying left semi and left anti joins.
           withSubquery.foldLeft(newFilter) {
    -        case (p, Exists(sub)) =>
    +        case (p, Exists(sub, _)) =>
               val (resolved, conditions) = pullOutCorrelatedPredicates(sub, p)
               Join(p, resolved, LeftSemi, conditions.reduceOption(And))
    -        case (p, Not(Exists(sub))) =>
    +        case (p, Not(Exists(sub, _))) =>
               val (resolved, conditions) = pullOutCorrelatedPredicates(sub, p)
               Join(p, resolved, LeftAnti, conditions.reduceOption(And))
             case (p, in: InSubQuery) =>
    -          val (resolved, conditions) = pullOutCorrelatedPredicates(in, p)
    -          Join(p, resolved, LeftSemi, conditions.reduceOption(And))
    +          val (newP, resolved, conditions) = pullOutCorrelatedPredicates(in, p)
    --- End diff --
    
    Yes, this additional Project could be removed by other rules.
    
    I thought this rule was in the last batch of the optimizer, but it is actually in the first batch. We may move this rule after predicate pushdown, in which case this if/else could be useful.
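
The if/else being debated amounts to eliding a wrapper node when the rewrite introduced nothing new. A hedged sketch with toy stand-ins (not Catalyst's `Project`):

```scala
// Only wrap the plan in a projection when the rewrite actually produced
// new aliases; otherwise return the plan unchanged, so no redundant node
// is created for later cleanup rules to remove.
object ProjectElisionSketch {
  sealed trait Plan
  final case class Scan(name: String) extends Plan
  final case class Project(aliases: Seq[String], child: Plan) extends Plan

  def maybeProject(p: Plan, newAliases: Seq[String]): Plan =
    if (newAliases.isEmpty) p else Project(newAliases, p)
}
```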




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60697737
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    --- End diff --
    
    Ok. So `transformUp` would work if we tried to resolve using the `inner` plan first.
    
    The reason I am complaining here is that I feel this could be a little more efficient, because this has to be called multiple times when we are resolving a complex correlated sub-query.
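    The traversal-order point can be illustrated with a self-contained sketch (a toy tree, not Catalyst's actual `TreeNode` API): a bottom-up transform can collapse nested nodes in a single pass, while top-down only collapses one level per pass and so needs to be re-run.

    ```scala
    sealed trait Node
    case class Leaf(v: Int) extends Node
    case class Branch(children: Seq[Node]) extends Node

    // Apply the rule to a node first, then recurse into its children.
    def transformDown(n: Node)(rule: PartialFunction[Node, Node]): Node =
      rule.applyOrElse(n, identity[Node]) match {
        case Branch(cs) => Branch(cs.map(transformDown(_)(rule)))
        case other      => other
      }

    // Recurse into children first, then apply the rule to the rebuilt node.
    def transformUp(n: Node)(rule: PartialFunction[Node, Node]): Node = {
      val rebuilt = n match {
        case Branch(cs) => Branch(cs.map(transformUp(_)(rule)))
        case leaf       => leaf
      }
      rule.applyOrElse(rebuilt, identity[Node])
    }

    // Rule: fold a Branch whose children are all leaves into a single Leaf.
    val fold: PartialFunction[Node, Node] = {
      case Branch(cs) if cs.forall(_.isInstanceOf[Leaf]) =>
        Leaf(cs.collect { case Leaf(v) => v }.sum)
    }

    val tree: Node = Branch(Seq(Branch(Seq(Leaf(1), Leaf(2))), Leaf(3)))
    val up   = transformUp(tree)(fold)    // Leaf(6): fully folded in one pass
    val down = transformDown(tree)(fold)  // Branch(Seq(Leaf(3), Leaf(3))): needs another pass
    ```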




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212673241
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213549760
  
    @hvanhovell Does this look good to you?




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60706033
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1517,10 +1517,34 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
        */
       private def pullOutCorrelatedPredicates(
           in: InSubQuery,
    -      query: LogicalPlan): (LogicalPlan, Seq[Expression]) = {
    +      query: LogicalPlan): (LogicalPlan, LogicalPlan, Seq[Expression]) = {
         val (resolved, joinCondition) = pullOutCorrelatedPredicates(in.query, query)
    -    val conditions = joinCondition ++ in.expressions.zip(resolved.output).map(EqualTo.tupled)
    -    (resolved, conditions)
    +    // Check whether some attributes have the same exprId but come from different sides
    +    val outerAttributes = AttributeSet(in.expressions.flatMap(_.references))
    +    if (outerAttributes.intersect(resolved.outputSet).nonEmpty) {
    +      val aliases = mutable.Map[Attribute, Alias]()
    +      val exprs = in.expressions.map { expr =>
    +        expr transformUp {
    +          case a: AttributeReference if resolved.outputSet.contains(a) =>
    +            val alias = Alias(a, a.toString)()
    +            val attr = alias.toAttribute
    +            aliases += attr -> alias
    +            attr
    +        }
    +      }
    +      val newP = Project(query.output ++ aliases.values, query)
    +      val projection = resolved.output.map {
    +        case a if outerAttributes.contains(a) => Alias(a, a.toString)()
    +        case a => a
    +      }
    +      val subquery = Project(projection, resolved)
    --- End diff --
    
    So we only need to create a Project here when we have to alias one of the attributes. We can do that check here and simplify the code in `apply`.
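    A small sketch of the simplification being suggested, using a toy attribute type and a hypothetical `freshId` helper (neither is Catalyst's real API): build the aliased projection, but return `None` when nothing conflicted, so the caller can skip wrapping the plan in a Project at all.

    ```scala
    case class Attr(name: String, exprId: Int)

    // Hypothetical fresh-id generator, just for this sketch.
    var nextId = 1000
    def freshId(): Int = { nextId += 1; nextId }

    // Re-alias only the conflicting attributes; None means no Project is needed.
    def aliasIfConflicting(output: Seq[Attr], conflicting: Set[Int]): Option[Seq[Attr]] = {
      var changed = false
      val projected = output.map { a =>
        if (conflicting(a.exprId)) { changed = true; a.copy(exprId = freshId()) }
        else a
      }
      if (changed) Some(projected) else None
    }
    ```

    With this shape, the branch in `apply` collapses to a single `aliasIfConflicting(...).map(Project(_, plan)).getOrElse(plan)`.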




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212629397
  
    **[Test build #56418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56418/consoleFull)** for PR 12539 at commit [`e04d119`](https://github.com/apache/spark/commit/e04d1193f9a15785224722c40b3fdfae6eb6cfb4).




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60697769
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    +          case q: LogicalPlan if q.childrenResolved && !q.resolved =>
    +            q transformExpressions {
    +              case u @ UnresolvedAttribute(nameParts) =>
    +                withPosition(u) {
    +                  try {
    +                    val outerAttrOpt = outer.resolve(nameParts, resolver)
    +                    if (outerAttrOpt.isDefined) {
    +                      val outerAttr = outerAttrOpt.get
    +                      if (q.inputSet.contains(outerAttr)) {
    +                        // Got a conflict; create an alias for the attribute coming from the outer table
    +                        val alias = Alias(outerAttr, outerAttr.toString)()
    +                        val attr = alias.toAttribute
    +                        aliases += attr -> alias
    +                        attr
    +                      } else {
    +                        outerAttr
    +                      }
    +                    } else {
    +                      u
    +                    }
    +                  } catch {
    +                    case a: AnalysisException => u
    +                  }
    +                }
    +            }
    +        }
    +        if (resolvedByOuter fastEquals analyzed) {
    --- End diff --
    
    Never mind, you are looking for a fixedPoint here.
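    The fixed-point idea in one self-contained helper (a generic sketch, not Catalyst's `RuleExecutor`): keep re-applying the transform until the result stops changing, with an iteration cap as a safety net.

    ```scala
    // Re-apply f until the value stabilizes or the iteration budget runs out.
    @annotation.tailrec
    def fixedPoint[A](x: A, maxIter: Int = 100)(f: A => A): A = {
      val next = f(x)
      if (next == x || maxIter <= 1) next else fixedPoint(next, maxIter - 1)(f)
    }

    // Example: halving an even number converges to an odd fixed point.
    val r = fixedPoint(64)(n => if (n % 2 == 0) n / 2 else n)  // 1
    ```

    The `resolvedByOuter fastEquals analyzed` check in the diff plays the same role as the `next == x` comparison: it detects that another pass would make no further progress.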




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-213065457
  
    **[Test build #56556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56556/consoleFull)** for PR 12539 at commit [`fd8c75c`](https://github.com/apache/spark/commit/fd8c75cfbc847c29ba313fce5f43903396714356).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12539#issuecomment-212629864
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60696040
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -862,28 +862,68 @@ class Analyzer(
       object ResolveSubquery extends Rule[LogicalPlan] with PredicateHelper {
     
         /**
    -     * Resolve the correlated predicates in the [[Filter]] clauses (e.g. WHERE or HAVING) of a
    +     * Resolve the correlated predicates in the clauses (e.g. WHERE or HAVING) of a
          * sub-query by using the plan the predicates should be correlated to.
          */
    -    private def resolveCorrelatedPredicates(q: LogicalPlan, p: LogicalPlan): LogicalPlan = {
    -      q transformUp {
    -        case f @ Filter(cond, child) if child.resolved && !f.resolved =>
    -          val newCond = resolveExpression(cond, p, throws = false)
    -          if (!cond.fastEquals(newCond)) {
    -            Filter(newCond, child)
    -          } else {
    -            f
    -          }
    +    private def resolveCorrelatedSubquery(
    +        sub: LogicalPlan, outer: LogicalPlan,
    +        aliases: scala.collection.mutable.Map[Attribute, Alias]): LogicalPlan = {
    +      // First resolve as much of the sub-query as possible
    +      val analyzed = execute(sub)
    +      if (analyzed.resolved) {
    +        analyzed
    +      } else {
    +        // Only resolve the lowest plan that is not resolved by outer plan, otherwise it could be
    +        // resolved by itself
    +        val resolvedByOuter = analyzed transformDown {
    --- End diff --
    
    Why `transformDown`? Wouldn't `transformUp` allow us to resolve most outerReferences in one pass?




[GitHub] spark pull request: [SPARK-14763] [SQL] fix subquery resolution

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12539#discussion_r60765014
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1517,10 +1517,34 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
        */
       private def pullOutCorrelatedPredicates(
           in: InSubQuery,
    -      query: LogicalPlan): (LogicalPlan, Seq[Expression]) = {
    +      query: LogicalPlan): (LogicalPlan, LogicalPlan, Seq[Expression]) = {
         val (resolved, joinCondition) = pullOutCorrelatedPredicates(in.query, query)
    -    val conditions = joinCondition ++ in.expressions.zip(resolved.output).map(EqualTo.tupled)
    -    (resolved, conditions)
    +    // Check whether some attributes have the same exprId but come from different sides
    +    val outerAttributes = AttributeSet(in.expressions.flatMap(_.references))
    +    if (outerAttributes.intersect(resolved.outputSet).nonEmpty) {
    +      val aliases = mutable.Map[Attribute, Alias]()
    +      val exprs = in.expressions.map { expr =>
    +        expr transformUp {
    +          case a: AttributeReference if resolved.outputSet.contains(a) =>
    +            val alias = Alias(a, a.toString)()
    +            val attr = alias.toAttribute
    +            aliases += attr -> alias
    +            attr
    +        }
    +      }
    +      val newP = Project(query.output ++ aliases.values, query)
    +      val projection = resolved.output.map {
    +        case a if outerAttributes.contains(a) => Alias(a, a.toString)()
    +        case a => a
    +      }
    +      val subquery = Project(projection, resolved)
    --- End diff --
    
    We added a new column here, which would change the result, so we have to remove it after the join.
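    A sketch of the fix described above, with toy plan nodes (not Catalyst's): after the semi join, a final Project restores the original output, so the helper alias column added for the join condition never leaks into the visible result.

    ```scala
    sealed trait Plan { def output: Seq[String] }
    case class Relation(output: Seq[String]) extends Plan
    case class Project(output: Seq[String], child: Plan) extends Plan
    // A left-semi join only produces the left side's columns.
    case class SemiJoin(left: Plan, right: Plan) extends Plan {
      def output: Seq[String] = left.output
    }

    // Wrap in a Project only if the rewrite changed the visible schema.
    def restoreOutput(original: Plan, rewritten: Plan): Plan =
      if (rewritten.output == original.output) rewritten
      else Project(original.output, rewritten)

    val base   = Relation(Seq("a", "b"))
    val helper = Project(Seq("a", "b", "a_alias"), base)  // extra column for the join condition
    val joined = SemiJoin(helper, Relation(Seq("x")))
    val result = restoreOutput(base, joined)              // output is back to Seq("a", "b")
    ```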

