Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2015/10/30 23:26:07 UTC

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/9385

    [SPARK-11433] [SQL] Cleanup the subquery name after eliminating subquery

    This fix removes the subquery name from attribute qualifiers after the subquery has been eliminated.
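
    Below is a minimal, self-contained sketch of the intended transformation, using a simplified stand-in type (`Attr` is not the real Catalyst class): when a subquery wrapper is eliminated, its alias is also stripped from the qualifiers of the attributes that referenced it.

    ```scala
    // Simplified stand-in for Catalyst attributes, for illustration only.
    case class Attr(name: String, qualifiers: Seq[String])

    // Strip the eliminated subquery's alias from each attribute's qualifiers.
    def cleanupQualifiers(projectList: Seq[Attr], subqueryAlias: String): Seq[Attr] =
      projectList.map { a =>
        if (a.qualifiers.contains(subqueryAlias))
          a.copy(qualifiers = a.qualifiers.filterNot(_ == subqueryAlias))
        else a
      }

    cleanupQualifiers(Seq(Attr("a", Seq("mytable"))), "mytable")  // Seq(Attr(a, List()))
    ```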

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark eliminateSubQ

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9385
    
----
commit db69ccf2c60679c4ca111a618190258d5b5cef62
Author: Xiao Li <xi...@xiaos-imac.local>
Date:   2015-10-30T22:12:19Z

    cleanup the subquery name after eliminating subquery

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153620226
  
    @dbtsai Thank you! 
    
    Please let me know if any further code changes are needed.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153554640
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9385#discussion_r43559826
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1019,7 +1019,16 @@ class Analyzer(
      * scoping information for attributes and can be removed once analysis is complete.
      */
     object EliminateSubQueries extends Rule[LogicalPlan] {
    -  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    +    case Project(projectList, child: Subquery) => {
    +      Project(
    +        projectList.flatMap {
    +          case ar: AttributeReference if ar.qualifiers.contains(child.alias) =>
    --- End diff --
    
    Should I use `NamedExpression` instead of `AttributeReference` here?




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153554643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44981/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-152708351
  
    So far, I have only noticed these stale qualifier values when reading the optimized logical plan; my query did not actually trigger any failure.
    
    Based on my understanding, qualifiers are used in only a limited number of places in the current code base, but they could become a problem once we support more complex SQL syntax and functions. That is why I submitted this pull request to clean them up.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-152699107
  
    Logically, when we remove `Subquery` we should also remove the related `qualifiers`. However, can you construct a case that actually hits a problem because the `qualifiers` are not removed?




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-156612181
  
    Hi, @marmbrus 
    
    Originally, I thought qualifiers were part of identifiers, like schema names in a traditional RDBMS. Based on your explanation, that is not the case.
    
    I have made a code change; please check whether the latest changes make sense. `semanticEquals` is now used, and all the test cases pass: https://github.com/gatorsmile/spark/commit/8e72b17561e4cc1a6cce86fc70f6ed968ebf5b38
    
    I also just merged the latest master. Thank you for your time.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-154640515
  
    `Expression` instances have two different kinds of equality:
     - `equals`: full equality, including capitalization, qualifiers, etc. This matters, for example, for knowing whether a tree transformation has reached a fixed point.
     - `semanticEquals`: equivalence in execution (i.e., the two expressions return the same result).
    
    If you ran into problems when adding qualifiers to `equals`, then I think it is likely that the code in question should actually be using `semanticEquals`.
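
    A minimal, REPL-style sketch of the distinction, using a simplified `AttributeRef` stand-in rather than Catalyst's actual classes (the `exprId`-based check below only approximates `semanticEquals`):

    ```scala
    // Simplified stand-in; not Catalyst's real AttributeReference.
    case class AttributeRef(name: String, exprId: Long, qualifiers: Seq[String]) {
      // Semantic equality: do the two references point at the same underlying data?
      def semanticEquals(other: AttributeRef): Boolean = exprId == other.exprId
    }

    val qualified   = AttributeRef("a", exprId = 2L, qualifiers = Seq("mytable"))
    val unqualified = AttributeRef("a", exprId = 2L, qualifiers = Nil)

    // Structural equality (the case-class equals) compares every field, including
    // qualifiers, so it distinguishes the two; semantic equality does not.
    assert(qualified != unqualified)
    assert(qualified.semanticEquals(unqualified))
    ```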




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153536438
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-155605985
  
    @marmbrus 
    
    After rechecking the root cause of the Expand failure, I still think we should clean up the subquery name after subquery elimination. My current fix needs a further change to enable a deeper cleanup of the subquery qualifiers.
    
    Let me explain what happened in Expand. 
    
    Before subquery elimination, the subquery name "mytable" appears in the qualifiers of all three upper levels (Aggregate, Expand, and Project):
    ```scala
    Aggregate [(a#2 + b#3)#7,b#3,grouping__id#6], [(a#2 + b#3)#7 AS _c0#4,b#3,sum(cast((a#2 - b#3) as bigint)) AS _c2#5L]
     Expand [0,1,3], [(a#2 + b#3)#7,b#3], grouping__id#6
      Project [a#2,b#3,(a#2 + b#3) AS (a#2 + b#3)#7]
       Subquery mytable
        Project [_1#0 AS a#2,_2#1 AS b#3]
         LocalRelation [_1#0,_2#1], [[1,2],[2,4]]
    ```
    
    After subquery elimination, the subquery name "mytable" is not removed from these three levels.
    
     




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-155677775
  
    Hi, @marmbrus 
    
    After digging into the root cause of the failing Expand cases, I found that we still need a deeper cleanup of the subquery qualifiers after elimination.
    
    Let me use the following example to explain what happens in Expand. This query works fine as long as we do not compare qualifiers when comparing two `AttributeReference`s. I think it becomes a bug once https://github.com/apache/spark/pull/9216 is merged, right?
    
    ```scala
    val sqlDF = sql("select a, b, sum(a) from mytable group by a, b with rollup").explain(true)
    ```
    
    Before subquery elimination, the subquery name "mytable" appears in both upper layers (Aggregate and Expand):
    ```scala
    Aggregate [a#2,b#3,grouping__id#5], [a#2,b#3,sum(cast(a#2 as bigint)) AS _c2#4L]
     Expand [0,1,3], [a#2,b#3], grouping__id#5
      Subquery mytable
       Project [_1#0 AS a#2,_2#1 AS b#3]
        LocalRelation [_1#0,_2#1], [[1,2],[2,4]]
    ```
    
    After subquery elimination, the subquery name "mytable" is not removed from these two upper layers:
    ```scala
    Aggregate [a#2,b#3,grouping__id#5], [a#2,b#3,sum(cast(a#2 as bigint)) AS _c2#4L]
     Expand [0,1,3], [a#2,b#3], grouping__id#5
      Project [_1#0 AS a#2,_2#1 AS b#3]
       LocalRelation [_1#0,_2#1], [[1,2],[2,4]]
    ```
    
    In SparkStrategies, we create the array of projections for the physical Expand operator from the logical node:
    
    ```scala
          case e @ logical.Expand(_, _, _, child) =>
            execution.Expand(e.projections, e.output, planLater(child)) :: Nil
    ```
    
    `e.projections` calls the function `expand()`. Inside `expand()`, I do not think we should be using `semanticEquals`.
    
    Let me post the incorrect physical plan:
    
    ```scala
    TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Final,isDistinct=false)], output=[a#2,b#3,_c2#11L])
     TungstenExchange hashpartitioning(a#2,b#3,grouping__id#12,5)
      TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Partial,isDistinct=false)], output=[a#2,b#3,grouping__id#12,currentSum#15L])
       Expand [List(a#2, b#3, 0),List(a#2, b#3, 1),List(a#2, b#3, 3)], [a#2,b#3,grouping__id#12]
        LocalTableScan [a#2,b#3], [[1,2],[2,4]]
    ```
    
    For your convenience, below is the correct one:
    
    ```scala
    TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Final,isDistinct=false)], output=[a#2,b#3,_c2#11L])
     TungstenExchange hashpartitioning(a#2,b#3,grouping__id#12,5)
      TungstenAggregate(key=[a#2,b#3,grouping__id#12], functions=[(sum(cast(a#2 as bigint)),mode=Partial,isDistinct=false)], output=[a#2,b#3,grouping__id#12,currentSum#15L])
       Expand [List(null, null, 0),List(a#2, null, 1),List(a#2, b#3, 3)], [a#2,b#3,grouping__id#12]
        LocalTableScan [a#2,b#3], [[1,2],[2,4]]
    ```
    
    My current fix does not cover this issue yet.
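
    To make the failure mode concrete, here is a rough, self-contained sketch of the per-grouping-set substitution (not Spark's actual `expand()` code, and exactly where the qualifier-free copies come from is an assumption): attributes outside a grouping set should be replaced with null literals, but a qualifier-sensitive comparison never recognizes the stale-qualified attributes, so nothing is nulled out, which is exactly the difference between the two plans above.

    ```scala
    // Simplified stand-in; not Catalyst's real AttributeReference or expand().
    case class AttributeRef(name: String, exprId: Long, qualifiers: Seq[String]) {
      def semanticEquals(other: AttributeRef): Boolean = exprId == other.exprId
    }

    // For one grouping set: attributes outside the set become null (None here).
    def projectionFor(
        output: Seq[AttributeRef],
        notInThisSet: Seq[AttributeRef],
        sameAttr: (AttributeRef, AttributeRef) => Boolean): Seq[Option[AttributeRef]] =
      output.map { attr =>
        if (notInThisSet.exists(g => sameAttr(g, attr))) None else Some(attr)
      }

    // The child's output still carries the stale "mytable" qualifier ...
    val a = AttributeRef("a", 2L, Seq("mytable"))
    val b = AttributeRef("b", 3L, Seq("mytable"))
    // ... while the copies derived from the GROUP BY list do not.
    val grandTotalSet = Seq(AttributeRef("a", 2L, Nil), AttributeRef("b", 3L, Nil))

    projectionFor(Seq(a, b), grandTotalSet, _ semanticEquals _) // Seq(None, None)       => List(null, null, 0)
    projectionFor(Seq(a, b), grandTotalSet, _ == _)             // Seq(Some(a), Some(b)) => List(a#2, b#3, 0)
    ```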




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153536456
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153554468
  
    **[Test build #44981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44981/consoleFull)** for PR 9385 at commit [`a26763d`](https://github.com/apache/spark/commit/a26763d758bc58dacf81be171428ede215775532).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153529973
  
    @cloud-fan @dbtsai, Jenkins did not start the tests. Could you trigger Jenkins to test this?
    
    Thank you! 




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
GitHub user gatorsmile reopened a pull request:

    https://github.com/apache/spark/pull/9385

    [SPARK-11433] [SQL] Cleanup the subquery name after eliminating subquery

    This fix removes the subquery name from attribute qualifiers after the subquery has been eliminated.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark eliminateSubQ

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9385
    
----
commit db69ccf2c60679c4ca111a618190258d5b5cef62
Author: Xiao Li <xi...@xiaos-imac.local>
Date:   2015-10-30T22:12:19Z

    cleanup the subquery name after eliminating subquery

commit a26763d758bc58dacf81be171428ede215775532
Author: gatorsmile <ga...@gmail.com>
Date:   2015-11-04T00:04:31Z

    flatmap->map

----




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-152669562
  
    Jenkins, okay to test.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-154650945
  
    @marmbrus Thanks! 
    
    I will try to change `equals` to `semanticEquals` in pull request https://github.com/apache/spark/pull/9216. Then you can decide whether that is the right solution.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-152664745
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9385#discussion_r43826123
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1019,7 +1019,16 @@ class Analyzer(
      * scoping information for attributes and can be removed once analysis is complete.
      */
     object EliminateSubQueries extends Rule[LogicalPlan] {
    -  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    +    case Project(projectList, child: Subquery) => {
    +      Project(
    +        projectList.flatMap {
    --- End diff --
    
    Thank you! I made the change based on your suggestion. :)




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/9385




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9385#discussion_r43608157
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1019,7 +1019,16 @@ class Analyzer(
      * scoping information for attributes and can be removed once analysis is complete.
      */
     object EliminateSubQueries extends Rule[LogicalPlan] {
    -  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    +    case Project(projectList, child: Subquery) => {
    +      Project(
    +        projectList.flatMap {
    --- End diff --
    
    Could we simplify this to a `map` and remove the `:: Nil` we have in the two sub-cases? It seems we are always returning a single-element list in every case.
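
    For illustration, a toy version of the suggested simplification (the `Expr` type and `stripQualifier` helper below are made up, not the PR's actual code):

    ```scala
    // Toy stand-ins to make the refactoring concrete.
    case class Expr(name: String, qualified: Boolean)
    def stripQualifier(e: Expr): Expr = e.copy(qualified = false)

    val projectList = Seq(Expr("a", qualified = true), Expr("b", qualified = false))

    // Before: flatMap, with every branch wrapping its single result in a list.
    val viaFlatMap = projectList.flatMap {
      case e if e.qualified => stripQualifier(e) :: Nil
      case e                => e :: Nil
    }

    // After: each branch produces exactly one element, so map suffices and the
    // `:: Nil` wrapping disappears.
    val viaMap = projectList.map {
      case e if e.qualified => stripQualifier(e)
      case e                => e
    }

    assert(viaFlatMap == viaMap)
    ```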




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-156238962
  
    I'm sorry, I don't see where this `expand()` function you are talking about is, or why it should not use `semanticEquals`. The whole point of `semanticEquals` is to ask "are these two attributes referring to the same data?", and in many cases that is actually what you want instead of structural equality. (Structural equality matters, for example, when you are checking whether a tree has reached a fixed point; otherwise we could not use rules to make cosmetic changes to Attributes.)




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153536164
  
    I think there is an issue with Jenkins.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-154609690
  
    @marmbrus I already hit this issue when resolving https://issues.apache.org/jira/browse/SPARK-8658. It means that, when comparing two `AttributeReference`s, we should not compare their qualifiers. That seems like a strange fix, right?




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-154594596
  
    Thanks for your contribution, but I'm tempted not to make this change unless there is actually a bug. We eliminate the subqueries because they would impact optimization and planning. However, keeping the qualifiers around could actually be useful if we want to give better error messages.




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9385#discussion_r43817697
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1019,7 +1019,16 @@ class Analyzer(
      * scoping information for attributes and can be removed once analysis is complete.
      */
     object EliminateSubQueries extends Rule[LogicalPlan] {
    -  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    --- End diff --
    
    Thank you for your comments! If we used `transformUp`, the subquery would be removed first, so the `Project(projectList, child: Subquery)` case would never match.
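
    A toy sketch of why the traversal order matters (a simplified tree, not Catalyst's `TreeNode` API): with a bottom-up traversal the `Subquery` node is already rewritten away by the time its parent `Project` is visited, so a case that matches `Project(_, s: Subquery)` only fires under a top-down traversal.

    ```scala
    // Minimal plan tree with top-down and bottom-up transforms, for illustration.
    sealed trait Plan { def children: Seq[Plan]; def withChildren(cs: Seq[Plan]): Plan }
    case class Relation(name: String) extends Plan {
      def children: Seq[Plan] = Nil
      def withChildren(cs: Seq[Plan]): Plan = this
    }
    case class Subquery(alias: String, child: Plan) extends Plan {
      def children: Seq[Plan] = Seq(child)
      def withChildren(cs: Seq[Plan]): Plan = copy(child = cs.head)
    }
    case class Project(cleaned: Boolean, child: Plan) extends Plan {
      def children: Seq[Plan] = Seq(child)
      def withChildren(cs: Seq[Plan]): Plan = copy(child = cs.head)
    }

    def transformDown(p: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
      val applied = if (rule.isDefinedAt(p)) rule(p) else p
      applied.withChildren(applied.children.map(c => transformDown(c)(rule)))
    }
    def transformUp(p: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
      val childrenDone = p.withChildren(p.children.map(c => transformUp(c)(rule)))
      if (rule.isDefinedAt(childrenDone)) rule(childrenDone) else childrenDone
    }

    // Analogue of the rule: clean a Project sitting directly on a Subquery, and
    // drop any remaining Subquery wrapper.
    val rule: PartialFunction[Plan, Plan] = {
      case Project(_, s: Subquery) => Project(cleaned = true, child = s.child)
      case Subquery(_, child)      => child
    }

    val plan = Project(cleaned = false, child = Subquery("mytable", Relation("t")))

    transformDown(plan)(rule) // Project(true,  Relation(t)) -- the Project still sees the Subquery
    transformUp(plan)(rule)   // Project(false, Relation(t)) -- the Subquery is gone before the Project is visited
    ```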




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-156730810
  
    @marmbrus CachedTableSuite failed for the same reason: because we did not clean up the subquery names, the planner cannot correctly decide whether an Exchange is needed.
    
    I fixed it by using `semanticEquals`; please check whether the changes are appropriate: https://github.com/apache/spark/pull/9216
    
    Now all the test cases pass. Thanks.
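
    As a rough illustration of the kind of check involved (the real decision lives in Spark's physical planning; the types below are simplified stand-ins, not Spark's Partitioning classes), whether an existing hash partitioning already satisfies the required clustering should be a semantic comparison, because a qualifier-only difference does not change how the data is physically partitioned:

    ```scala
    // Simplified stand-ins; not Spark's Partitioning / Distribution classes.
    case class AttributeRef(name: String, exprId: Long, qualifiers: Seq[String]) {
      def semanticEquals(other: AttributeRef): Boolean = exprId == other.exprId
    }

    case class HashPartitioning(expressions: Seq[AttributeRef]) {
      // Does this partitioning already co-locate rows by the required keys?
      def satisfies(requiredClustering: Seq[AttributeRef]): Boolean =
        requiredClustering.forall(req => expressions.exists(_.semanticEquals(req)))
    }

    // The cached child was partitioned on a#2 while it still carried the stale
    // "mytable" qualifier; the parent now requires clustering on an unqualified a#2.
    val existing = HashPartitioning(Seq(AttributeRef("a", 2L, Seq("mytable"))))
    val required = Seq(AttributeRef("a", 2L, Nil))

    existing.satisfies(required) // true with semanticEquals; a full-equality check
                                 // would say false and insert a redundant Exchange
    ```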




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9385#discussion_r43608456
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1019,7 +1019,16 @@ class Analyzer(
      * scoping information for attributes and can be removed once analysis is complete.
      */
     object EliminateSubQueries extends Rule[LogicalPlan] {
    -  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    --- End diff --
    
    Any particular reason for switching to transformDown?




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153536898
  
    **[Test build #44981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44981/consoleFull)** for PR 9385 at commit [`a26763d`](https://github.com/apache/spark/commit/a26763d758bc58dacf81be171428ede215775532).




[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9385#issuecomment-153536227
  
    Jenkins, add to whitelist

