You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/11/06 16:07:12 UTC

[GitHub] spark pull request #22957: [SPARK-25951][SQL] Remove Alias when canonicalize

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22957

    [SPARK-25951][SQL] Remove Alias when canonicalize

    ## What changes were proposed in this pull request?
    
    When we canonicalize an `Expression`, we do not remove `Alias`. So two expressions which are the same but are renamed are considered to be semantically different, while this is not true. This can cause - as showed in the JIRA - missing some optimizations, eg. removing redundant shuffles, when a column is renamed.
    
    The PR proposes to remove `Alias`es when canonicalizing an expression.
    
    ## How was this patch tested?
    
    added UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25951

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22957.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22957
    
----
commit b566818bf17c2b69d06fed2e1500e608a25bb3b7
Author: Marco Gaido <ma...@...>
Date:   2018-11-06T15:55:37Z

    [SPARK-25951][SQL] Remove Alias when canonicalize

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99518/testReport)** for PR 22957 at commit [`6eee1e4`](https://github.com/apache/spark/commit/6eee1e4235633284a0824b500d6b95cedbce7b0b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99522/testReport)** for PR 22957 at commit [`5bca5e3`](https://github.com/apache/spark/commit/5bca5e348f9ee6d1f00a5f68666e991160498ed7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237536540
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2542,10 +2542,10 @@ object EliminateUnions extends Rule[LogicalPlan] {
      * rule can't work for those parameters.
      */
     object CleanupAliases extends Rule[LogicalPlan] {
    -  private def trimAliases(e: Expression): Expression = {
    +  private[catalyst] def trimAliases(e: Expression): Expression = {
         e.transformDown {
    -      case Alias(child, _) => child
    -      case MultiAlias(child, _) => child
    +      case Alias(child, _) => trimAliases(child)
    --- End diff --
    
    ah, I did a stupid thing here. So the problem is that: since it returns `child` for `this`, in transformDown we apply the rule to `child` children, instead of applying to `child` itself. So the problem here is with 2 consecutive `Alias`. Let me find a better fix.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99377/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238027941
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
             classOf[PartitioningCollection])
         }
       }
    +
    +  test("SPARK-25951: avoid redundant shuffle on rename") {
    --- End diff --
    
    I think the test case coverage is not enough for such massive changes. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238524763
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    As an example, `ProjectExec.outputPartitioning` can be wrong, as it doesn't consider the aliases in the project list. I think it's clearer to adjust the `outputPartitioning` there, instead of dealing with it in a rule. What if we have more rules need to check `outputPartitioning` and `requiredChildDistribution`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5453/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238650207
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    seems we are not on the same page...
    
    Let's make the example clearer. Assuming a `relation[a ,b]`'s partitioning is `hash(a, b)`, then `Project(a as c, a as d, b, relation)`'s partitioning should be `[hash(c, b), hash(d, b)]`. It's like a flatMap.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99840/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99377/testReport)** for PR 22957 at commit [`6c93e70`](https://github.com/apache/spark/commit/6c93e708df203cc5a2f3085340ed61bf5c8d90fe).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99713/testReport)** for PR 22957 at commit [`e4f617f`](https://github.com/apache/spark/commit/e4f617fc7e47d7c49f3d773ac2d91c5508c0a239).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5672/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Remove Alias when canonicalize

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98521/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Remove Alias when canonicalize

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99376/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99618/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4817/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237787920
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
    +   *
    +   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
    +   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
    +   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
    +   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
    +   * strict).
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    --- End diff --
    
    yes, I mean: `sameResult` returns true if 2 expressions return the same data even though from plan perspective they are not the same (eg. the output name/exprIds is different as in this case), while `semanticEquals` ensure they are the same from plan perspective too. If you have better suggestions how to rephrase this, I am happy to improve it. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99449/testReport)** for PR 22957 at commit [`a306465`](https://github.com/apache/spark/commit/a306465d3fc0ef2c8ce43cfd14277d3aab4550e2).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238291824
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -223,14 +223,35 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
    +   *
    +   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
    +   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
    +   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
    +   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
    +   * strict).
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output from
    +   * plan perspective may be different, because of different names or similar differences.
    +   * Usually this means that their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   *
    +   * This method should be used (instead of `semanticEquals`) when checking if 2 expressions
    +   * produce the same results (eg. as in the case we are interested to check if the ordering is the
    +   * same). It should not be used (and `semanticEquals` should be used instead) when comparing if 2
    +   * expressions are the same and one can replace the other.
    +   */
    +  final def sameResult(other: Expression): Boolean =
    --- End diff --
    
    thanks, added


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237075431
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    +    case a: Alias => sameResult(a.child)
    +    case _ => this.semanticEquals(other)
    --- End diff --
    
    well, it needs to be overridden by `HashPartitioning` too, so since I am not able to make it final anyway, I don't think it is a good idea. Well, I can add a match on `HashPartitioning`too, but it doesn't seem a clean solution to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Remove Alias when canonicalize

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #98521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98521/testReport)** for PR 22957 at commit [`b566818`](https://github.com/apache/spark/commit/b566818bf17c2b69d06fed2e1500e608a25bb3b7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5520/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99518/testReport)** for PR 22957 at commit [`6eee1e4`](https://github.com/apache/spark/commit/6eee1e4235633284a0824b500d6b95cedbce7b0b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99522/testReport)** for PR 22957 at commit [`5bca5e3`](https://github.com/apache/spark/commit/5bca5e348f9ee6d1f00a5f68666e991160498ed7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99517/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99462/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99446/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99424/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5517/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    i didn't look at your new code, but is your old code safe? e.g. a project that depends on the new alias.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238634730
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    For `Project(a as c, a as d, b, relation)`, I think the `outputPartitioning` should be `[hash part c, hash part d, hash part b]`. The point is, we should not report an output partitioning whose attribute is not even in the current plan's output.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99448/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #98552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98552/testReport)** for PR 22957 at commit [`2b00f35`](https://github.com/apache/spark/commit/2b00f351fbb8f33cf01b30297f3785956b1b3697).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99448/testReport)** for PR 22957 at commit [`a306465`](https://github.com/apache/spark/commit/a306465d3fc0ef2c8ce43cfd14277d3aab4550e2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237065506
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    --- End diff --
    
    "erase the name" can also mean remove `Alias`. If we can't clearly tell the difference between `semanticEquals` and `sameResult`, and give a guideline about using which one in which case, I think we should just update `semanticEquals`(i.e. `Canonicalize`).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237519971
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2542,10 +2542,10 @@ object EliminateUnions extends Rule[LogicalPlan] {
      * rule can't work for those parameters.
      */
     object CleanupAliases extends Rule[LogicalPlan] {
    -  private def trimAliases(e: Expression): Expression = {
    +  private[catalyst] def trimAliases(e: Expression): Expression = {
         e.transformDown {
    -      case Alias(child, _) => child
    -      case MultiAlias(child, _) => child
    +      case Alias(child, _) => trimAliases(child)
    --- End diff --
    
    what's going on here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237818279
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
    +   *
    +   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
    +   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
    +   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
    +   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
    +   * strict).
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    --- End diff --
    
    How about replace `output` with `output from plan perspective`?
    
    ```
    Returns true when two expressions will always compute the same result, even if the output
    from plan perspective may be different, because of different names or similar differences.
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99517/testReport)** for PR 22957 at commit [`5c6b9fc`](https://github.com/apache/spark/commit/5c6b9fc52d825c5c6c680d38f8ef8c621dded964).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99375/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99377/testReport)** for PR 22957 at commit [`6c93e70`](https://github.com/apache/spark/commit/6c93e708df203cc5a2f3085340ed61bf5c8d90fe).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    @cloud-fan @gatorsmile I updated the PR according to the previous suggestions and added a new dedicated test suite.
    May you please review this again? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5496/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99517/testReport)** for PR 22957 at commit [`5c6b9fc`](https://github.com/apache/spark/commit/5c6b9fc52d825c5c6c680d38f8ef8c621dded964).
     * This patch **fails Scala style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] with PredicateHelper `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99462/testReport)** for PR 22957 at commit [`13aef71`](https://github.com/apache/spark/commit/13aef71df5662f4f8389ece87c4270597de93db5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    cc @cloud-fan @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238460901
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    this is not dealing with the aliases in the `outputPartitioning` but with the ones in the `requiredChildDistribution`. Anyway, I wouldn't do it there, because this would mean moving also the logic for collecting the aliases from the children there, which seems to me an operations which belong to a rule/transforming operator, rather than to the plan operator itself (eg. now these methods are in `PredicateHelper`...).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237887275
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
             classOf[PartitioningCollection])
         }
       }
    +
    +  test("SPARK-25951: avoid redundant shuffle on rename") {
    --- End diff --
    
    ah, good point and indeed very useful. In my previous tests I always used a very simple query to verify this and never the one reported in the JIRA. Now I tried that one and I realized that this fix is not very useful as of now, because in renaming like that in the `HashPatitioning` there is the `AttributeReference` to the `Alias`, rather than the `Alias` itself. Since that is the common case, the PR as it is now it is not very useful. If I won't be able to figure out a good way for that, I am going to close this. Thanks and sorry for the trouble.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5451/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237073129
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    +    case a: Alias => sameResult(a.child)
    +    case _ => this.semanticEquals(other)
    --- End diff --
    
    I think it is doable, but I didn't want to put too many `match` where it was not needed. But if you prefer that way I can try and do that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5581/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99682/testReport)** for PR 22957 at commit [`bf1d04a`](https://github.com/apache/spark/commit/bf1d04a819855737d1096b61b1c3d46010f50dee).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99618/testReport)** for PR 22957 at commit [`1f797df`](https://github.com/apache/spark/commit/1f797dff9c28adccec3f32acaf361e3b94596ba0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99713/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5530/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Remove Alias when canonicalize

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4798/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5919/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237749287
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
             classOf[PartitioningCollection])
         }
       }
    +
    +  test("SPARK-25951: avoid redundant shuffle on rename") {
    --- End diff --
    
    +1 if possible.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237049166
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    --- End diff --
    
    I know it's always safer to introduce a new API, does is it really necessary? In `Canonicalize`, we erase the name for attributes, I think it's reasonable to erase the name of `Alias`, as it doesn't affect the output.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238583330
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    But `ProjectExec.outputPartitioning` cannot contain a reference to the aliases in its project list, as its output partitioning is the one of the child, where that alias doesn't exist.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99834/testReport)** for PR 22957 at commit [`bca3e87`](https://github.com/apache/spark/commit/bca3e873b3193bcccd68b4e042fec5c985718d5b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237069486
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    --- End diff --
    
    can you put it in the method doc(both `semanticEquals` and `sameResult`)? This makes sense to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    This looks good to me. Just a comment about wording.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5858/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99376/testReport)** for PR 22957 at commit [`3831be0`](https://github.com/apache/spark/commit/3831be0aba7eaf83bb03152de129c11286290586).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99618/testReport)** for PR 22957 at commit [`1f797df`](https://github.com/apache/spark/commit/1f797dff9c28adccec3f32acaf361e3b94596ba0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Btw, I think we can update the PR title and description to reflect new changes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99446/testReport)** for PR 22957 at commit [`a306465`](https://github.com/apache/spark/commit/a306465d3fc0ef2c8ce43cfd14277d3aab4550e2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99449/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99713/testReport)** for PR 22957 at commit [`e4f617f`](https://github.com/apache/spark/commit/e4f617fc7e47d7c49f3d773ac2d91c5508c0a239).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238321460
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    instead of doing it here, shall we deal with alias in `SparkPlan.outputPartitioning` for operators with a project list?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238459238
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    --- End diff --
    
    I think it is. We are only checking the presence of aliases. In particular, we are collecting all the aliases which are defined in the previous operator. The solution you are suggesting works too IMHO and restricts the scope, but I am not sure it is a good thing, because I see no harm in doing it for other operators: simply they won't contain aliases; while I do see some issues in the maintenance of the "whitelist" of operators you are suggesting (we may miss some now or forget to update later...)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99375/testReport)** for PR 22957 at commit [`0491249`](https://github.com/apache/spark/commit/049124971eb9d0f84eb98cc33c4cccd65d7a42b7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99906/testReport)** for PR 22957 at commit [`0f68a41`](https://github.com/apache/spark/commit/0f68a41beb15da35c6839ecef18ec69d29e133dc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237068718
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    --- End diff --
    
    remove `Alias` is not possible for the reason explained in https://github.com/apache/spark/pull/22957#issuecomment-436992955. In general, `semanticEquals` should be used when we want to replace an expression with another, while `sameResult` should be used in order to check that 2 expressions return the same output.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99462/testReport)** for PR 22957 at commit [`13aef71`](https://github.com/apache/spark/commit/13aef71df5662f4f8389ece87c4270597de93db5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237062190
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    --- End diff --
    
    that is reasonable but it doesn't solve the problem stated in the JIRA. So the goal here is to avoid that something like `a as b` is considered different from `a` in terms of ordering/distribution. If we just erase the name of alias, the 2 expression would still be different because of the presence of `Alias` itself would make the 2 expressions different.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99840/testReport)** for PR 22957 at commit [`0f68a41`](https://github.com/apache/spark/commit/0f68a41beb15da35c6839ecef18ec69d29e133dc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Thanks for you comment @rxin. It was safe for comparisons (I mean to say: this 2 expressions return the same data), because anyway all the `AttributeReference`s contain the `exprId` they referred to, so a removed `Alias` would have had its exprId in all the `AttributeReference`s to it. But if this was used for checking which expressions to replace and modifying the plan (as it is done in the `PhysicalAggregation`), then it is not safe, because it can cause missing some `Alias` from the resulting plan, leading to an invalid one. So I can say in general it was not safe if we consider all the usages of `semanticEquals`, but it is safe if we want to know whether the returned data is the same. Hope this answer is clear enough.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238696879
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    Ok, I think I got it now, sorry, I didn't understand :) yes I think this is doable then. Let me try and do that, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99682/testReport)** for PR 22957 at commit [`bf1d04a`](https://github.com/apache/spark/commit/bf1d04a819855737d1096b61b1c3d46010f50dee).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99834/testReport)** for PR 22957 at commit [`bca3e87`](https://github.com/apache/spark/commit/bca3e873b3193bcccd68b4e042fec5c985718d5b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238630487
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    ah I see now what you mean. I am not sure what you are suggesting is feasible. Imagine that in your example the Project is: `Project(a as c, a as d, b, relation)`. What should the output partitioning be?
    
    > What do you mean by ...
    
    I meant that when we collect the alias for `a as c`, we are mapping all the attr references of `c` with `a as c` here. In the `outputPartitioning`, there will never be an occurrence of a reference to `c`, but only references to `a`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99522/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237745005
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
             classOf[PartitioningCollection])
         }
       }
    +
    +  test("SPARK-25951: avoid redundant shuffle on rename") {
    --- End diff --
    
    can we have an end-to-end test as well?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99906/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237747550
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
    +   *
    +   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
    +   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
    +   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
    +   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
    +   * strict).
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    --- End diff --
    
    I think here `output` is a bit confusing. Do we mean the output names?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5586/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237905754
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
             classOf[PartitioningCollection])
         }
       }
    +
    +  test("SPARK-25951: avoid redundant shuffle on rename") {
    --- End diff --
    
    @cloud-fan @viirya I added the test, but as I mentioned I had to do another change in order to make it working. Sorry for the mistake. I'd really appreciate if you could review it again. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5759/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Remove Alias when canonicalize

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #98521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98521/testReport)** for PR 22957 at commit [`b566818`](https://github.com/apache/spark/commit/b566818bf17c2b69d06fed2e1500e608a25bb3b7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237532156
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2542,10 +2542,10 @@ object EliminateUnions extends Rule[LogicalPlan] {
      * rule can't work for those parameters.
      */
     object CleanupAliases extends Rule[LogicalPlan] {
    -  private def trimAliases(e: Expression): Expression = {
    +  private[catalyst] def trimAliases(e: Expression): Expression = {
         e.transformDown {
    -      case Alias(child, _) => child
    -      case MultiAlias(child, _) => child
    +      case Alias(child, _) => trimAliases(child)
    --- End diff --
    
    it's `transformDown`, why doesn't it work?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5452/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238292468
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -780,6 +780,23 @@ class PlannerSuite extends SharedSQLContext {
             classOf[PartitioningCollection])
         }
       }
    +
    +  test("SPARK-25951: avoid redundant shuffle on rename") {
    --- End diff --
    
    thanks @gatorsmile, I added also a negative case, but I don't think it is enough. Do you have some hints on cases to test? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238028577
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -223,14 +223,35 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
    +   *
    +   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
    +   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
    +   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
    +   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
    +   * strict).
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output from
    +   * plan perspective may be different, because of different names or similar differences.
    +   * Usually this means that their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   *
    +   * This method should be used (instead of `semanticEquals`) when checking if 2 expressions
    +   * produce the same results (eg. as in the case we are interested to check if the ordering is the
    +   * same). It should not be used (and `semanticEquals` should be used instead) when comparing if 2
    +   * expressions are the same and one can replace the other.
    +   */
    +  final def sameResult(other: Expression): Boolean =
    --- End diff --
    
    Add unit test cases to SubexpressionEliminationSuite?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237070739
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    +    case a: Alias => sameResult(a.child)
    +    case _ => this.semanticEquals(other)
    --- End diff --
    
    can we also strip the alias of this here? so that we can mark `sameResult` as final.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238642801
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    I don't think that is right: that would cause the shuffle to happen for every plan which is hashed by both `[hash part c, hash part b]` and `[hash part d, hash part b]` (and also `[hash part a, hash part b]`). I think that if we want to go that way, we'd need a set of equivalent outputPatitioning


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238626367
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    +      case a: Alias => (a.toAttribute, a)
    +    }))
    +
         // Ensure that the operator's children satisfy their output distribution requirements.
         children = children.zip(requiredChildDistributions).map {
    -      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
    +      case (child, distribution) if child.outputPartitioning.satisfies(
    +          distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
    --- End diff --
    
    for example, `relation[a, b]`'s output partitioning is `[hash partition a, hash partition b]`, and `Project(a as c, b, relation)`'s output partitioning should be `[hash partition c, hash partition b]`.
    
    What do you mean by `But ProjectExec.outputPartitioning cannot contain a reference to the aliases`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99906/testReport)** for PR 22957 at commit [`0f68a41`](https://github.com/apache/spark/commit/0f68a41beb15da35c6839ecef18ec69d29e133dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Remove Alias when canonicalize

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #98552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98552/testReport)** for PR 22957 at commit [`2b00f35`](https://github.com/apache/spark/commit/2b00f351fbb8f33cf01b30297f3785956b1b3697).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98552/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99375/testReport)** for PR 22957 at commit [`0491249`](https://github.com/apache/spark/commit/049124971eb9d0f84eb98cc33c4cccd65d7a42b7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99834/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99449/testReport)** for PR 22957 at commit [`a306465`](https://github.com/apache/spark/commit/a306465d3fc0ef2c8ce43cfd14277d3aab4550e2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99518/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237747770
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,35 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
    +   *
    +   * This method should be used (instead of `sameResult`) when comparing if 2 expressions are the
    +   * same and one can replace the other (eg. in Optimizer/Analyzer rules where we want to replace
    +   * equivalent expressions). It should not be used (and `sameResult` should be used instead) when
    +   * comparing if 2 expressions produce the same results (in this case `semanticEquals` can be too
    +   * strict).
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    --- End diff --
    
    So sameResult returns if the evaluated results between two expressions are exactly the same?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237526646
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2542,10 +2542,10 @@ object EliminateUnions extends Rule[LogicalPlan] {
      * rule can't work for those parameters.
      */
     object CleanupAliases extends Rule[LogicalPlan] {
    -  private def trimAliases(e: Expression): Expression = {
    +  private[catalyst] def trimAliases(e: Expression): Expression = {
         e.transformDown {
    -      case Alias(child, _) => child
    -      case MultiAlias(child, _) => child
    +      case Alias(child, _) => trimAliases(child)
    --- End diff --
    
    the point is that now this method removes only the first `Alias` it finds (and it doesn't go on recursively), which is the reason of the UT failure. Also checking the comment on the method it seems not the expected behavior of this method.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    LGTM, cc @viirya as well


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237076213
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    +    case a: Alias => sameResult(a.child)
    +    case _ => this.semanticEquals(other)
    --- End diff --
    
    we can do
    ```
    CleanupAliases.trimAliases(this) semanticEquals CleanupAliases.trimAliases(other)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99376/testReport)** for PR 22957 at commit [`3831be0`](https://github.com/apache/spark/commit/3831be0aba7eaf83bb03152de129c11286290586).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5864/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99840/testReport)** for PR 22957 at commit [`0f68a41`](https://github.com/apache/spark/commit/0f68a41beb15da35c6839ecef18ec69d29e133dc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5582/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5735/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99446/testReport)** for PR 22957 at commit [`a306465`](https://github.com/apache/spark/commit/a306465d3fc0ef2c8ce43cfd14277d3aab4550e2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99424/testReport)** for PR 22957 at commit [`6c93e70`](https://github.com/apache/spark/commit/6c93e708df203cc5a2f3085340ed61bf5c8d90fe).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237539036
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2542,10 +2542,10 @@ object EliminateUnions extends Rule[LogicalPlan] {
      * rule can't work for those parameters.
      */
     object CleanupAliases extends Rule[LogicalPlan] {
    -  private def trimAliases(e: Expression): Expression = {
    +  private[catalyst] def trimAliases(e: Expression): Expression = {
         e.transformDown {
    -      case Alias(child, _) => child
    -      case MultiAlias(child, _) => child
    +      case Alias(child, _) => trimAliases(child)
    --- End diff --
    
    just using `transformUp`solves the issue


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99682/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99448/testReport)** for PR 22957 at commit [`a306465`](https://github.com/apache/spark/commit/a306465d3fc0ef2c8ce43cfd14277d3aab4550e2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r238318256
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         assert(requiredChildDistributions.length == children.length)
         assert(requiredChildOrderings.length == children.length)
     
    +    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
    --- End diff --
    
    is it safe to do so? I think we should only collect aliases from operators with a project list, i.e. `Project`, `Aggregate`, `Window`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22957#discussion_r237070496
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -195,14 +195,26 @@ abstract class Expression extends TreeNode[Expression] {
       }
     
       /**
    -   * Returns true when two expressions will always compute the same result, even if they differ
    +   * Returns true when two expressions will always compute the same output, even if they differ
        * cosmetically (i.e. capitalization of names in attributes may be different).
        *
        * See [[Canonicalize]] for more details.
        */
       def semanticEquals(other: Expression): Boolean =
         deterministic && other.deterministic && canonicalized == other.canonicalized
     
    +  /**
    +   * Returns true when two expressions will always compute the same result, even if the output may
    +   * be different, because of different names or similar differences.
    +   * Usually this means they their canonicalized form equals, but it may also not be the case, as
    +   * different output expressions can evaluate to the same result as well (eg. when an expression
    +   * is aliased).
    +   */
    +  def sameResult(other: Expression): Boolean = other match {
    --- End diff --
    
    Sure, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22957
  
    **[Test build #99424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99424/testReport)** for PR 22957 at commit [`6c93e70`](https://github.com/apache/spark/commit/6c93e708df203cc5a2f3085340ed61bf5c8d90fe).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org