You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2016/02/24 07:58:26 UTC

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/11341

    [SPARK-13466][SQL] Don't introduce redundant project with colum pruning rule

    JIRA: https://issues.apache.org/jira/browse/SPARK-13466
    
    ## What changes were proposed in this pull request?
    
    With column pruning rule in optimizer, we will introduce redundant project for some cases. We should prevent it.
    
    ## How was the this patch tested?
    
    Unit test is added into ColumnPruningSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 remove-redundant-project

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11341.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11341
    
----
commit 7f0dd87b1df4aac53d00d06b8f4bd0bc188d01e0
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2016-02-24T06:56:03Z

    Don't introduce redundant project with colum pruning rule.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54762019
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala ---
    @@ -249,5 +249,26 @@ class ColumnPruningSuite extends PlanTest {
         comparePlans(Optimize.execute(query), expected)
       }
     
    +  test("Remove redundant Project introduced by column pruning rule") {
    +    val input = LocalRelation('key.int, 'value.string)
    +
    +    val query =
    +      Project(Seq($"x.key", $"y.key"),
    --- End diff --
    
    This Project is not introduced by column pruning, right? Could you have a better test to match the purpose of this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188143998
  
    For an example query:
    
        val input = LocalRelation('key.int, 'value.string)
    
        val query =
          Project(Seq($"x.key", $"y.key"),
            Join(
              SubqueryAlias("x", input),
              BroadcastHint(SubqueryAlias("y", input)), Inner, None))
    
    After the first run of column pruning, it would like:
    
        Project(Seq($"x.key", $"y.key"),
          Join(
            Project(Seq($"x.key"), SubqueryAlias("x", input)),
            Project(Seq($"y.key"),      <-- inserted by the rule
            BroadcastHint(SubqueryAlias("y", input))),
            Inner, None))
    
    Actually we don't need the outside Project now. This patch will remove it:
    
        Join(
          Project(Seq($"x.key"), SubqueryAlias("x", input)),
          Project(Seq($"y.key"),
          BroadcastHint(SubqueryAlias("y", input))),
          Inner, None)
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189178948
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52030/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191125293
  
    **[Test build #52301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52301/consoleFull)** for PR 11341 at commit [`a983822`](https://github.com/apache/spark/commit/a983822a510af1c6c6579437a79c70226309173e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191350027
  
    @viirya You can change the title to "Remove redundant projects in column pruning rule"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191588620
  
    **[Test build #52361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52361/consoleFull)** for PR 11341 at commit [`2257b81`](https://github.com/apache/spark/commit/2257b8143690e9cca2da473547b6e558f87e75fa).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54710640
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -375,7 +379,7 @@ object ColumnPruning extends Rule[LogicalPlan] {
         case p @ Project(_, l: LeafNode) => p
     
         // Eliminate no-op Projects
    -    case p @ Project(projectList, child) if child.output == p.output => child
    +    case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child
    --- End diff --
    
    This LGTM, cc @davies to confirm, is there any corner case we need to think about?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188181595
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188181600
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51866/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191125616
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52301/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191637104
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191593352
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191076729
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188144342
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188181177
  
    **[Test build #51866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51866/consoleFull)** for PR 11341 at commit [`7f0dd87`](https://github.com/apache/spark/commit/7f0dd87b1df4aac53d00d06b8f4bd0bc188d01e0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191588733
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52361/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189115382
  
    **[Test build #52022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52022/consoleFull)** for PR 11341 at commit [`7f0dd87`](https://github.com/apache/spark/commit/7f0dd87b1df4aac53d00d06b8f4bd0bc188d01e0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189129858
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54833317
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala ---
    @@ -249,5 +249,26 @@ class ColumnPruningSuite extends PlanTest {
         comparePlans(Optimize.execute(query), expected)
       }
     
    +  test("Remove redundant Project introduced by column pruning rule") {
    +    val input = LocalRelation('key.int, 'value.string)
    +
    +    val query =
    +      Project(Seq($"x.key", $"y.key"),
    --- End diff --
    
    @davies I've updated the pr description and title to better match its purpose.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54675632
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    Using `==` doesn't work here is because the output of child plan has no qualifiers, but the project output has qualifiers `x` and `y`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54676067
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    qualifiers are only for resolution, so seems it's safe to use `semanticEquals` for the above case? i.e. `case p @ Project(projectList, child)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54212491
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    yeah, they have the same effect. Do you think it is better to add this check in this pattern case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove redundant projects i...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54832315
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala ---
    @@ -249,5 +249,26 @@ class ColumnPruningSuite extends PlanTest {
         comparePlans(Optimize.execute(query), expected)
       }
     
    +  test("Remove redundant Project introduced by column pruning rule") {
    +    val input = LocalRelation('key.int, 'value.string)
    +
    +    val query =
    +      Project(Seq($"x.key", $"y.key"),
    --- End diff --
    
    Yeah, but actually the rule not only removes Project introduced by column pruning but all redundant Projects.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189146392
  
    **[Test build #52030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52030/consoleFull)** for PR 11341 at commit [`61cf554`](https://github.com/apache/spark/commit/61cf5549d0a4e348643275fae4a8b4ac060c3da4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189129687
  
    **[Test build #52022 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52022/consoleFull)** for PR 11341 at commit [`7f0dd87`](https://github.com/apache/spark/commit/7f0dd87b1df4aac53d00d06b8f4bd0bc188d01e0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188147307
  
    ok. done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189129859
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52022/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191641790
  
    cc @davies Please review if the current change is appropriate. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191588731
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54072101
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    what if we use `semanticEquals` in the above case? i.e.
    ```
    def sameOutput(outpuq1, output2): Boolean = {
      output1.size == output2.size && output1.zip(output2).forall { case (a1, a2) => a1 semanticEquals a2 }
    }
    // Eliminate no-op Projects
    case p @ Project(projectList, child) if sameOutput(o.output, child.output) => child
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188115629
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51853/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191076202
  
    **[Test build #52290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52290/consoleFull)** for PR 11341 at commit [`43dd6ad`](https://github.com/apache/spark/commit/43dd6ad25cbcba3a09a25f23763b91770c459b58).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove redundant projects i...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54832817
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala ---
    @@ -249,5 +249,26 @@ class ColumnPruningSuite extends PlanTest {
         comparePlans(Optimize.execute(query), expected)
       }
     
    +  test("Remove redundant Project introduced by column pruning rule") {
    +    val input = LocalRelation('key.int, 'value.string)
    +
    +    val query =
    +      Project(Seq($"x.key", $"y.key"),
    --- End diff --
    
    Hmm, maybe I shouldn't say that removing redundant Project introduced by column pruning. It is not accurate. The Project is becoming redundant after column pruning optimizes the plan.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188861454
  
    **[Test build #2584 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2584/consoleFull)** for PR 11341 at commit [`7f0dd87`](https://github.com/apache/spark/commit/7f0dd87b1df4aac53d00d06b8f4bd0bc188d01e0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191636079
  
    **[Test build #52372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52372/consoleFull)** for PR 11341 at commit [`2257b81`](https://github.com/apache/spark/commit/2257b8143690e9cca2da473547b6e558f87e75fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54676258
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    yeah, I think so.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54684954
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
    @@ -172,7 +172,7 @@ class SQLMetricsSuite extends SparkFunSuite with SharedSQLContext {
           val df = sqlContext.sql(
             "SELECT * FROM testData2 JOIN testDataForJoin ON testData2.a = testDataForJoin.a")
           testSparkPlanMetrics(df, 1, Map(
    -        1L -> ("SortMergeJoin", Map(
    +        0L -> ("SortMergeJoin", Map(
    --- End diff --
    
    We remove redundant Project node from top of df. So change node id here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191594245
  
    **[Test build #52372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52372/consoleFull)** for PR 11341 at commit [`2257b81`](https://github.com/apache/spark/commit/2257b8143690e9cca2da473547b6e558f87e75fa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54212631
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    no, I'm just wondering when we should use `==` to check 2 outputs and when we should use `semanticEquals`. Currently we use `==` in a lot of places when comparing 2 outputs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189115192
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188123311
  
    Would be great if you explain the cases that introduce redundant projects in the pr description. Thanks.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188901657
  
    **[Test build #2584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2584/consoleFull)** for PR 11341 at commit [`7f0dd87`](https://github.com/apache/spark/commit/7f0dd87b1df4aac53d00d06b8f4bd0bc188d01e0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191568614
  
    **[Test build #52361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52361/consoleFull)** for PR 11341 at commit [`2257b81`](https://github.com/apache/spark/commit/2257b8143690e9cca2da473547b6e558f87e75fa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188115621
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191048927
  
    **[Test build #52290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52290/consoleFull)** for PR 11341 at commit [`43dd6ad`](https://github.com/apache/spark/commit/43dd6ad25cbcba3a09a25f23763b91770c459b58).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191091158
  
    **[Test build #52301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52301/consoleFull)** for PR 11341 at commit [`a983822`](https://github.com/apache/spark/commit/a983822a510af1c6c6579437a79c70226309173e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191644061
  
    LGTM, merging this into master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11341


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188146307
  
    Can you put those in the pr description?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191142651
  
    @cloud-fan any comments for this change? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54675707
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    However, `semanticEquals` only checks if exprId is the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191125614
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191076733
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52290/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189178944
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54676381
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    Let me update this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188146714
  
    **[Test build #51866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51866/consoleFull)** for PR 11341 at commit [`7f0dd87`](https://github.com/apache/spark/commit/7f0dd87b1df4aac53d00d06b8f4bd0bc188d01e0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-189178054
  
    **[Test build #52030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52030/consoleFull)** for PR 11341 at commit [`61cf554`](https://github.com/apache/spark/commit/61cf5549d0a4e348643275fae4a8b4ac060c3da4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11341#discussion_r54212688
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -382,7 +382,14 @@ object ColumnPruning extends Rule[LogicalPlan] {
           val required = child.references ++ p.references
           if ((child.inputSet -- required).nonEmpty) {
             val newChildren = child.children.map(c => prunedChild(c, required))
    -        p.copy(child = child.withNewChildren(newChildren))
    +        val newChild = child.withNewChildren(newChildren)
    +        val sameOutput = newChild.output.size == p.output.size &&
    +          newChild.output.zip(p.output).forall(pair => pair._1.semanticEquals(pair._2))
    --- End diff --
    
    it will be good if we can figure out the corrected logic for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Don't introduce redundant p...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-188679245
  
    cc @davies and @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13466][SQL] Remove projects that become...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11341#issuecomment-191637110
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52372/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org