You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2017/02/14 21:20:54 UTC

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/16933

    [SPARK-19601] [SQL] Fix CollapseRepartition rule to preserve shuffle-enabled Repartition

    ### What changes were proposed in this pull request?
    When users use the shuffle-enabled `repartition` API, they expect the partition they got should be the exact number they provided, even if they call shuffle-disabled `coalesce` later. Currently, `CollapseRepartition` rule does not consider whether shuffle is enabled or not. Thus, we got the following unexpected result.
    
    ```Scala
        val df = spark.range(0, 10000, 1, 5)
        val df2 = df.repartition(10)
        assert(df2.coalesce(13).rdd.getNumPartitions == 5)
        assert(df2.coalesce(7).rdd.getNumPartitions == 5)
        assert(df2.coalesce(3).rdd.getNumPartitions == 3)
    ```
    
    This PR is to fix the issue. We preserve shuffle-enabled Repartition. 
    
    ### How was this patch tested?
    Added a test case

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark CollapseRepartition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16933.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16933
    
----
commit 7b4a9dd61d44aa10733f643381db014cf49f9d5d
Author: Xiao Li <ga...@gmail.com>
Date:   2017-02-14T21:11:39Z

    fix.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104772991
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,180 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
           .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
           .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(20).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized2, correctAnswer2)
       }
     
    -  test("collapse repartitionBy and repartition into one") {
    -    val query = testRelation
    +  test("repartitionBy above coalesce") {
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .distribute('a)(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .coalesce(20)
    +      .distribute('a)(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(30).analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above repartitionBy") {
    +    val query1 = testRelation
           .distribute('a)(20)
           .repartition(10)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(10).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(10).analyze
    --- End diff --
    
    I not quite sure about this. Shall we optimize to `relation.repartition(10)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104355337
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -562,27 +562,43 @@ object CollapseProject extends Rule[LogicalPlan] {
     }
     
     /**
    - * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
    - * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    - * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * Combines adjacent [[RepartitionOperation]] operators
      */
     object CollapseRepartition extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    // Case 1
    -    case Repartition(numPartitions, shuffle, Repartition(_, _, child)) =>
    -      Repartition(numPartitions, shuffle, child)
    -    // Case 2
    -    case RepartitionByExpression(exprs, RepartitionByExpression(_, child, _), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    -    // Case 3
    -    case Repartition(numPartitions, _, r: RepartitionByExpression) =>
    -      r.copy(numPartitions = numPartitions)
    -    // Case 3
    -    case RepartitionByExpression(exprs, Repartition(_, _, child), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    +    // Case 1: When a Repartition has a child of Repartition or RepartitionByExpression,
    +    // we can collapse it with the child based on the type of shuffle and the specified number
    +    // of partitions.
    +    case r @ Repartition(_, _, child: Repartition) =>
    --- End diff --
    
    we can only write one case `case r @ Repartition(_, _, child: RepartitionOperation)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101341551
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -563,23 +563,27 @@ object CollapseProject extends Rule[LogicalPlan] {
     /**
      * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
      * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    + * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]] if their shuffle types
    + *    are the same or the parent's shuffle is true.
      * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * 3. When a shuffle-enabled [[Repartition]] is above a [[RepartitionByExpression]], collapse as a
    + *    single [[RepartitionByExpression]] with the expression and the last number of partition.
    + * 4. When a [[RepartitionByExpression]] is above a [[Repartition]], collapse as a single
    --- End diff --
    
    `RepartitionByExpression ` always uses `ShuffleExchange`. Thus, it is like Repartition with enabled shuffle. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101219386
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,6 +32,18 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    val query = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    --- End diff --
    
    hmm, I can see the argument.
    but there are 2 adjacent coalesces like this shouldn't it take the smaller number? (since coalesce can't increase partition numbers)
    whereas if there are 2 adjacent repartition it could take the last number


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74157/testReport)** for PR 16933 at commit [`d69c5a1`](https://github.com/apache/spark/commit/d69c5a1f8d508f605f15a68f2b5adc6dbda60855).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104556080
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -562,27 +562,37 @@ object CollapseProject extends Rule[LogicalPlan] {
     }
     
     /**
    - * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
    - * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    - * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * Combines adjacent [[RepartitionOperation]] operators
      */
     object CollapseRepartition extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    // Case 1
    -    case Repartition(numPartitions, shuffle, Repartition(_, _, child)) =>
    -      Repartition(numPartitions, shuffle, child)
    -    // Case 2
    -    case RepartitionByExpression(exprs, RepartitionByExpression(_, child, _), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    -    // Case 3
    -    case Repartition(numPartitions, _, r: RepartitionByExpression) =>
    -      r.copy(numPartitions = numPartitions)
    -    // Case 3
    -    case RepartitionByExpression(exprs, Repartition(_, _, child), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    +    // Case 1: When a Repartition has a child of Repartition or RepartitionByExpression,
    +    // we can collapse it with the child based on the type of shuffle and the specified number
    +    // of partitions.
    +    case r @ Repartition(_, _, child: RepartitionOperation) =>
    +      collapseRepartition(r, child)
    +    // Case 2: When a RepartitionByExpression has a child of Repartition or RepartitionByExpression
    +    // we can remove the child.
    +    case r @ RepartitionByExpression(_, child: RepartitionOperation, _) =>
    +      r.copy(child = child.child)
    +  }
    +
    +  /**
    +   * Collapses the [[Repartition]] with its child [[RepartitionOperation]], if possible.
    +   * - Case 1 the top [[Repartition]] does not enable shuffle (i.e., coalesce API):
    +   *   If the last numPartitions is bigger, returns the child node; otherwise, keep unchanged.
    +   * - Case 2 the top [[Repartition]] enables shuffle (i.e., repartition API):
    +   *   returns the child node with the last numPartitions.
    +   */
    +  private def collapseRepartition(r: Repartition, child: RepartitionOperation): LogicalPlan = {
    --- End diff --
    
    can we inline this method?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104772284
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,180 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
           .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
           .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(20).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(20).analyze
    --- End diff --
    
    this is same as `correctAnswer1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104355455
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -562,27 +562,43 @@ object CollapseProject extends Rule[LogicalPlan] {
     }
     
     /**
    - * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
    - * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    - * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * Combines adjacent [[RepartitionOperation]] operators
      */
     object CollapseRepartition extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    // Case 1
    -    case Repartition(numPartitions, shuffle, Repartition(_, _, child)) =>
    -      Repartition(numPartitions, shuffle, child)
    -    // Case 2
    -    case RepartitionByExpression(exprs, RepartitionByExpression(_, child, _), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    -    // Case 3
    -    case Repartition(numPartitions, _, r: RepartitionByExpression) =>
    -      r.copy(numPartitions = numPartitions)
    -    // Case 3
    -    case RepartitionByExpression(exprs, Repartition(_, _, child), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    +    // Case 1: When a Repartition has a child of Repartition or RepartitionByExpression,
    +    // we can collapse it with the child based on the type of shuffle and the specified number
    +    // of partitions.
    +    case r @ Repartition(_, _, child: Repartition) =>
    +      collapseRepartition(r, child)
    +    case r @ Repartition(_, _, child: RepartitionByExpression) =>
    +      collapseRepartition(r, child)
    +    // Case 2: When a RepartitionByExpression has a child of Repartition or RepartitionByExpression
    +    // we can remove the child.
    +    case r @ RepartitionByExpression(_, child: RepartitionByExpression, _) =>
    +      r.copy(child = child.child)
    +    case r @ RepartitionByExpression(_, child: Repartition, _) =>
    +      r.copy(child = child.child)
    +  }
    +
    +  /**
    +   * Collapses the [[Repartition]] with its child [[RepartitionOperation]], if possible.
    +   * - Case 1 the top [[Repartition]] does not enable shuffle (i.e., coalesce API):
    +   *   If the last numPartitions is bigger, returns the child node; otherwise, keep unchanged.
    +   * - Case 2 the top [[Repartition]] enables shuffle (i.e., repartition API):
    +   *   returns the child node with the last numPartitions.
    +   */
    +  private def collapseRepartition(r: Repartition, child: RepartitionOperation): LogicalPlan = {
    +    (r.shuffle, child.shuffle) match {
    +      case (false, true) => child match {
    --- End diff --
    
    why this pattern match? we can just call `child.numPartitions`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101219684
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -43,15 +55,44 @@ class CollapseRepartitionSuite extends PlanTest {
         comparePlans(optimized, correctAnswer)
       }
     
    +  test("collapse one coalesce and one repartition into one") {
    +    val query1 = testRelation
    +      .coalesce(20)
    +      .repartition(5)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(5).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .repartition(5)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.repartition(5).coalesce(20).analyze
    --- End diff --
    
    that might be the plan but the end result should be numPartitions == 5 correct? is there another suite we could add tests for repartition/coalesce like this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101218852
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -563,23 +563,27 @@ object CollapseProject extends Rule[LogicalPlan] {
     /**
      * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
      * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    + * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]] if their shuffle types
    + *    are the same or the parent's shuffle is true.
      * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * 3. When a shuffle-enabled [[Repartition]] is above a [[RepartitionByExpression]], collapse as a
    + *    single [[RepartitionByExpression]] with the expression and the last number of partition.
    + * 4. When a [[RepartitionByExpression]] is above a [[Repartition]], collapse as a single
    --- End diff --
    
    does shuffle type matter for Repartition in this case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101341238
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -374,6 +374,9 @@ package object dsl {
             case plan => SubqueryAlias(alias, plan, None)
           }
     
    +      def coalesce(num: Integer): LogicalPlan =
    --- End diff --
    
    They are used for the test cases in catalyst package, in which Dataset APIs are not available. Thus, that is why we add these DSL for test cases


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #73911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73911/testReport)** for PR 16933 at commit [`680c3af`](https://github.com/apache/spark/commit/680c3afa4f29aeffadd17798b7a06f1664964683).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class RepartitionOperation extends UnaryNode `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    cc @cloud-fan @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16933


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101342124
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,6 +32,18 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    val query = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    --- End diff --
    
    I think it would be better to respect the later input number, which is specified by users, for avoiding any surprise to users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74045/testReport)** for PR 16933 at commit [`5453ad4`](https://github.com/apache/spark/commit/5453ad45ea7149071f281c2914427da8afb41a1d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104798255
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,175 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
           .repartition(10)
           .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartitionBy and repartition into one") {
    -    val query = testRelation
    +  test("repartitionBy above coalesce") {
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .distribute('a)(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .coalesce(20)
    +      .distribute('a)(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(30).analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above repartitionBy") {
    +    val query1 = testRelation
           .distribute('a)(20)
           .repartition(10)
    --- End diff --
    
    we can still pick the same numPartition pairs: `10, 20` and `30, 20`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74157/testReport)** for PR 16933 at commit [`d69c5a1`](https://github.com/apache/spark/commit/d69c5a1f8d508f605f15a68f2b5adc6dbda60855).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Let me re-write the whole test cases. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #72894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72894/testReport)** for PR 16933 at commit [`7b4a9dd`](https://github.com/apache/spark/commit/7b4a9dd61d44aa10733f643381db014cf49f9d5d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74064/testReport)** for PR 16933 at commit [`4649af4`](https://github.com/apache/spark/commit/4649af4f660c7ffa877a13dd24843de9e8dd72a6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73910/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74140/testReport)** for PR 16933 at commit [`d69c5a1`](https://github.com/apache/spark/commit/d69c5a1f8d508f605f15a68f2b5adc6dbda60855).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #73910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73910/testReport)** for PR 16933 at commit [`0f95a6f`](https://github.com/apache/spark/commit/0f95a6f564b044c7f866ab69edd2ba0a565bb47b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    LGTM, pending test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #72894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72894/testReport)** for PR 16933 at commit [`7b4a9dd`](https://github.com/apache/spark/commit/7b4a9dd61d44aa10733f643381db014cf49f9d5d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    awesome!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101163037
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -563,23 +563,27 @@ object CollapseProject extends Rule[LogicalPlan] {
     /**
      * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
      * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    + * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]] if their shuffle types
    + *    are the same or the parent's shuffle is true.
      * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * 3. When a shuffle-enabled [[Repartition]] is above a [[RepartitionByExpression]], collapse as a
    + *    single [[RepartitionByExpression]] with the expression and the last number of partition.
    + * 4. When a [[RepartitionByExpression]] is above a [[Repartition]], collapse as a single
    + *    [[RepartitionByExpression]] with the expression and the last number of partition.
      */
     object CollapseRepartition extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
         // Case 1
    -    case Repartition(numPartitions, shuffle, Repartition(_, _, child)) =>
    +    case Repartition(numPartitions, shuffle, Repartition(_, shuffleChild, child))
    +        if shuffle == shuffleChild || shuffle =>
    --- End diff --
    
    Oh, thank you for fixing this!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72894/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74064/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104786851
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,180 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
           .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
           .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(20).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized2, correctAnswer2)
       }
     
    -  test("collapse repartitionBy and repartition into one") {
    -    val query = testRelation
    +  test("repartitionBy above coalesce") {
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .distribute('a)(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .coalesce(20)
    +      .distribute('a)(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(30).analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above repartitionBy") {
    +    val query1 = testRelation
           .distribute('a)(20)
           .repartition(10)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(10).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(10).analyze
    --- End diff --
    
    My concern is, optimization should not change the result. `relation.distributeBy('a, 10).repartition(10)` should have same result of `relation.repartition(10)`, instead of `relation.distributeBy('a, 10)`. It's not about which one is cheaper, we should not surprise users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104543920
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -562,27 +562,43 @@ object CollapseProject extends Rule[LogicalPlan] {
     }
     
     /**
    - * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
    - * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    - * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * Combines adjacent [[RepartitionOperation]] operators
      */
     object CollapseRepartition extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    // Case 1
    -    case Repartition(numPartitions, shuffle, Repartition(_, _, child)) =>
    -      Repartition(numPartitions, shuffle, child)
    -    // Case 2
    -    case RepartitionByExpression(exprs, RepartitionByExpression(_, child, _), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    -    // Case 3
    -    case Repartition(numPartitions, _, r: RepartitionByExpression) =>
    -      r.copy(numPartitions = numPartitions)
    -    // Case 3
    -    case RepartitionByExpression(exprs, Repartition(_, _, child), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    +    // Case 1: When a Repartition has a child of Repartition or RepartitionByExpression,
    +    // we can collapse it with the child based on the type of shuffle and the specified number
    +    // of partitions.
    +    case r @ Repartition(_, _, child: Repartition) =>
    --- End diff --
    
    Great!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74179/testReport)** for PR 16933 at commit [`d69c5a1`](https://github.com/apache/spark/commit/d69c5a1f8d508f605f15a68f2b5adc6dbda60855).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101218700
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -374,6 +374,9 @@ package object dsl {
             case plan => SubqueryAlias(alias, plan, None)
           }
     
    +      def coalesce(num: Integer): LogicalPlan =
    --- End diff --
    
    does it conflict with sql coalesce by having it here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74179/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104786298
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,180 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
           .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
           .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(20).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized2, correctAnswer2)
       }
     
    -  test("collapse repartitionBy and repartition into one") {
    -    val query = testRelation
    +  test("repartitionBy above coalesce") {
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .distribute('a)(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .coalesce(20)
    +      .distribute('a)(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(30).analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above repartitionBy") {
    +    val query1 = testRelation
           .distribute('a)(20)
           .repartition(10)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(10).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(10).analyze
    --- End diff --
    
    Here, I just followed what we did before. After more code reading, I think we can do it, since `RoundRobinPartitioning ` looks cheaper. 
    
    ```Scala
          case logical.Repartition(numPartitions, shuffle, child) =>
            if (shuffle) {
              ShuffleExchange(RoundRobinPartitioning(numPartitions), planLater(child)) :: Nil
            } else {
              execution.CoalesceExec(numPartitions, planLater(child)) :: Nil
            }
          case logical.RepartitionByExpression(expressions, child, numPartitions) =>
            exchange.ShuffleExchange(HashPartitioning(
              expressions, numPartitions), planLater(child)) :: Nil
    ```
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74157/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104798081
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,175 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
           .repartition(10)
           .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartitionBy and repartition into one") {
    -    val query = testRelation
    +  test("repartitionBy above coalesce") {
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .distribute('a)(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .coalesce(20)
    +      .distribute('a)(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(30).analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above repartitionBy") {
    +    val query1 = testRelation
    --- End diff --
    
    we can add a comment: `// Always respects the top repartition amd removes useless distribute below repartition`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101350801
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -563,23 +563,27 @@ object CollapseProject extends Rule[LogicalPlan] {
     /**
      * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
      * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    + * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]] if their shuffle types
    + *    are the same or the parent's shuffle is true.
      * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * 3. When a shuffle-enabled [[Repartition]] is above a [[RepartitionByExpression]], collapse as a
    + *    single [[RepartitionByExpression]] with the expression and the last number of partition.
    + * 4. When a [[RepartitionByExpression]] is above a [[Repartition]], collapse as a single
    --- End diff --
    
    right, I was referring to shuffle on `Repartition`, but I see your point of `RepartitionByExpression` overriding it regardless


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101351110
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,6 +32,18 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    val query = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    --- End diff --
    
    ok, agreed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74137/testReport)** for PR 16933 at commit [`8306c49`](https://github.com/apache/spark/commit/8306c49b511b0f5b8c5546d9638bdf01289b5d39).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74064/testReport)** for PR 16933 at commit [`4649af4`](https://github.com/apache/spark/commit/4649af4f660c7ffa877a13dd24843de9e8dd72a6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74045/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #73911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73911/testReport)** for PR 16933 at commit [`680c3af`](https://github.com/apache/spark/commit/680c3afa4f29aeffadd17798b7a06f1664964683).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74137/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104797868
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,175 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
           .repartition(10)
           .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartitionBy and repartition into one") {
    -    val query = testRelation
    +  test("repartitionBy above coalesce") {
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .distribute('a)(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .coalesce(20)
    +      .distribute('a)(30)
    --- End diff --
    
    I'd like to make `query2` as
    ```
    testRelation
      .coalesce(30)
      .distribute('a)(20)
    ```
    i.e. the numPartitions of `coalesce` is bigger than `distribute`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74045/testReport)** for PR 16933 at commit [`5453ad4`](https://github.com/apache/spark/commit/5453ad45ea7149071f281c2914427da8afb41a1d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73911/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74140/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #73910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73910/testReport)** for PR 16933 at commit [`0f95a6f`](https://github.com/apache/spark/commit/0f95a6f564b044c7f866ab69edd2ba0a565bb47b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class RepartitionOperation(numPartitions: Int) extends UnaryNode `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104543895
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -562,27 +562,43 @@ object CollapseProject extends Rule[LogicalPlan] {
     }
     
     /**
    - * Combines adjacent [[Repartition]] and [[RepartitionByExpression]] operator combinations
    - * by keeping only the one.
    - * 1. For adjacent [[Repartition]]s, collapse into the last [[Repartition]].
    - * 2. For adjacent [[RepartitionByExpression]]s, collapse into the last [[RepartitionByExpression]].
    - * 3. For a combination of [[Repartition]] and [[RepartitionByExpression]], collapse as a single
    - *    [[RepartitionByExpression]] with the expression and last number of partition.
    + * Combines adjacent [[RepartitionOperation]] operators
      */
     object CollapseRepartition extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    // Case 1
    -    case Repartition(numPartitions, shuffle, Repartition(_, _, child)) =>
    -      Repartition(numPartitions, shuffle, child)
    -    // Case 2
    -    case RepartitionByExpression(exprs, RepartitionByExpression(_, child, _), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    -    // Case 3
    -    case Repartition(numPartitions, _, r: RepartitionByExpression) =>
    -      r.copy(numPartitions = numPartitions)
    -    // Case 3
    -    case RepartitionByExpression(exprs, Repartition(_, _, child), numPartitions) =>
    -      RepartitionByExpression(exprs, child, numPartitions)
    +    // Case 1: When a Repartition has a child of Repartition or RepartitionByExpression,
    +    // we can collapse it with the child based on the type of shuffle and the specified number
    +    // of partitions.
    +    case r @ Repartition(_, _, child: Repartition) =>
    +      collapseRepartition(r, child)
    +    case r @ Repartition(_, _, child: RepartitionByExpression) =>
    +      collapseRepartition(r, child)
    +    // Case 2: When a RepartitionByExpression has a child of Repartition or RepartitionByExpression
    +    // we can remove the child.
    +    case r @ RepartitionByExpression(_, child: RepartitionByExpression, _) =>
    +      r.copy(child = child.child)
    +    case r @ RepartitionByExpression(_, child: Repartition, _) =>
    +      r.copy(child = child.child)
    +  }
    +
    +  /**
    +   * Collapses the [[Repartition]] with its child [[RepartitionOperation]], if possible.
    +   * - Case 1 the top [[Repartition]] does not enable shuffle (i.e., coalesce API):
    +   *   If the last numPartitions is bigger, returns the child node; otherwise, keep unchanged.
    +   * - Case 2 the top [[Repartition]] enables shuffle (i.e., repartition API):
    +   *   returns the child node with the last numPartitions.
    +   */
    +  private def collapseRepartition(r: Repartition, child: RepartitionOperation): LogicalPlan = {
    +    (r.shuffle, child.shuffle) match {
    +      case (false, true) => child match {
    --- End diff --
    
    Yes!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74140/testReport)** for PR 16933 at commit [`d69c5a1`](https://github.com/apache/spark/commit/d69c5a1f8d508f605f15a68f2b5adc6dbda60855).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r104798370
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -32,47 +32,175 @@ class CollapseRepartitionSuite extends PlanTest {
     
       val testRelation = LocalRelation('a.int, 'b.int)
     
    +
    +  test("collapse two adjacent coalesces into one") {
    +    // Always respects the top coalesces amd removes useless coalesce below coalesce
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .coalesce(20)
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +
    +    val correctAnswer = testRelation.coalesce(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
       test("collapse two adjacent repartitions into one") {
    -    val query = testRelation
    +    // Always respects the top repartition amd removes useless repartition below repartition
    +    val query1 = testRelation
           .repartition(10)
           .repartition(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .repartition(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.repartition(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartition and repartitionBy into one") {
    -    val query = testRelation
    +  test("coalesce above repartition") {
    +    // Remove useless coalesce above repartition
    +    val query1 = testRelation
    +      .repartition(10)
    +      .coalesce(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    // No change in this case
    +    val query2 = testRelation
    +      .repartition(30)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = query2.analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above coalesce") {
    +    // Always respects the top repartition amd removes useless coalesce below repartition
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .repartition(20)
    +    // Remove useless coalesce above repartition
    +    val query2 = testRelation
    +      .coalesce(30)
    +      .repartition(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer = testRelation.repartition(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
    +  }
    +
    +  test("repartitionBy above repartition") {
    +    val query1 = testRelation
           .repartition(10)
           .distribute('a)(20)
    +    val query2 = testRelation
    +      .repartition(30)
    +      .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val optimized2 = Optimize.execute(query2.analyze)
         val correctAnswer = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized1, correctAnswer)
    +    comparePlans(optimized2, correctAnswer)
       }
     
    -  test("collapse repartitionBy and repartition into one") {
    -    val query = testRelation
    +  test("repartitionBy above coalesce") {
    +    val query1 = testRelation
    +      .coalesce(10)
    +      .distribute('a)(20)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .coalesce(20)
    +      .distribute('a)(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(30).analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("repartition above repartitionBy") {
    +    val query1 = testRelation
           .distribute('a)(20)
           .repartition(10)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(10).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .distribute('a)(20)
    +      .repartition(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.repartition(30).analyze
    +
    +    comparePlans(optimized2, correctAnswer2)
    +  }
    +
    +  test("coalesce above repartitionBy") {
    +    val query1 = testRelation
    +      .distribute('a)(20)
    +      .coalesce(10)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).coalesce(10).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .distribute('a)(20)
    +      .coalesce(30)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(20).analyze
     
    -    comparePlans(optimized, correctAnswer)
    +    comparePlans(optimized2, correctAnswer2)
       }
     
       test("collapse two adjacent repartitionBys into one") {
    -    val query = testRelation
    +    val query1 = testRelation
           .distribute('b)(10)
           .distribute('a)(20)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = testRelation.distribute('a)(20).analyze
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.distribute('a)(20).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .distribute('b)(30)
    +      .distribute('a)(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.distribute('a)(20).analyze
    --- End diff --
    
    it's same with `correctAnswer1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101346606
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -43,15 +55,44 @@ class CollapseRepartitionSuite extends PlanTest {
         comparePlans(optimized, correctAnswer)
       }
     
    +  test("collapse one coalesce and one repartition into one") {
    +    val query1 = testRelation
    +      .coalesce(20)
    +      .repartition(5)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(5).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .repartition(5)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.repartition(5).coalesce(20).analyze
    --- End diff --
    
    Yeah. We can get rid of `coalesce` if the number of partitions is smaller than the child `repartition`
    
    Actually, I can add some simple end-to-end test cases like what you did in the R side. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74179/testReport)** for PR 16933 at commit [`d69c5a1`](https://github.com/apache/spark/commit/d69c5a1f8d508f605f15a68f2b5adc6dbda60855).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16933#discussion_r101911156
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala ---
    @@ -43,15 +55,44 @@ class CollapseRepartitionSuite extends PlanTest {
         comparePlans(optimized, correctAnswer)
       }
     
    +  test("collapse one coalesce and one repartition into one") {
    +    val query1 = testRelation
    +      .coalesce(20)
    +      .repartition(5)
    +
    +    val optimized1 = Optimize.execute(query1.analyze)
    +    val correctAnswer1 = testRelation.repartition(5).analyze
    +
    +    comparePlans(optimized1, correctAnswer1)
    +
    +    val query2 = testRelation
    +      .repartition(5)
    +      .coalesce(20)
    +
    +    val optimized2 = Optimize.execute(query2.analyze)
    +    val correctAnswer2 = testRelation.repartition(5).coalesce(20).analyze
    --- End diff --
    
    For improving this rule, we need to clean up the resolution of RepartitionByExpression at first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16933
  
    **[Test build #74137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74137/testReport)** for PR 16933 at commit [`8306c49`](https://github.com/apache/spark/commit/8306c49b511b0f5b8c5546d9638bdf01289b5d39).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org