You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2017/12/03 22:40:53 UTC

[GitHub] spark pull request #19870: [SPARK-22665][SQL] Avoid repartitioning with empt...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/19870

    [SPARK-22665][SQL] Avoid repartitioning with empty list of expressions

    ## What changes were proposed in this pull request?
    
    Repartitioning by empty set of expressions is currently possible, even though it is a case which is not handled properly. Indeed, in `HashExpression` there is a check to avoid to run it on an empty set, but this check is not performed while repartitioning.
    Thus, the PR adds a check to avoid this wrong situation.
    
    ## How was this patch tested?
    
    added UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-22665

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19870
    
----
commit fdf10b36cbc746e57cb0b8c2f79646db9dd06964
Author: Marco Gaido <ma...@gmail.com>
Date:   2017-12-03T19:40:01Z

    [SPARK-22665][SQL] Avoid repartitioning with empty list of expressions

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19870: [SPARK-22665][SQL] Avoid repartitioning with empt...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19870#discussion_r154542257
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -838,6 +838,8 @@ case class RepartitionByExpression(
         numPartitions: Int) extends RepartitionOperation {
     
       require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
    +  require(partitionExpressions.nonEmpty, s"${getClass.getSimpleName} requires a non empty set of " +
    +    s"partitioning expressions.")
    --- End diff --
    
    Let's remove this leading 's'.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    **[Test build #84406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84406/testReport)** for PR 19870 at commit [`fdf10b3`](https://github.com/apache/spark/commit/fdf10b36cbc746e57cb0b8c2f79646db9dd06964).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    **[Test build #84428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84428/testReport)** for PR 19870 at commit [`b4f985f`](https://github.com/apache/spark/commit/b4f985fc81f4664de538e7bcae0e134bce8ae735).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    LGTM pending Jenkins


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    thanks @gatorsmile, I agree and I updated the PR accordingly. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    **[Test build #84441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84441/testReport)** for PR 19870 at commit [`41ad7b9`](https://github.com/apache/spark/commit/41ad7b9cd1976d84caded2cc4fc7f50d0585de03).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    **[Test build #84441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84441/testReport)** for PR 19870 at commit [`41ad7b9`](https://github.com/apache/spark/commit/41ad7b9cd1976d84caded2cc4fc7f50d0585de03).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19870: [SPARK-22665][SQL] Avoid repartitioning with empt...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19870#discussion_r154705369
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -838,6 +838,8 @@ case class RepartitionByExpression(
         numPartitions: Int) extends RepartitionOperation {
     
       require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
    +  require(partitionExpressions.nonEmpty, s"${getClass.getSimpleName} requires a non empty set of " +
    --- End diff --
    
    what was the behavior if `partitionExpressions` is empty?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84406/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    **[Test build #84428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84428/testReport)** for PR 19870 at commit [`b4f985f`](https://github.com/apache/spark/commit/b4f985fc81f4664de538e7bcae0e134bce8ae735).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84428/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84441/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    To ensure no external behavior change, we should use RoundRobinPartitioning when the expression list is empty, instead of issuing an exception. It will be also consistent with the existing behavior of `df.repartition(numPartitions)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19870: [SPARK-22665][SQL] Avoid repartitioning with empt...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19870


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19870
  
    **[Test build #84406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84406/testReport)** for PR 19870 at commit [`fdf10b3`](https://github.com/apache/spark/commit/fdf10b36cbc746e57cb0b8c2f79646db9dd06964).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19870: [SPARK-22665][SQL] Avoid repartitioning with empt...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19870#discussion_r154764850
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -838,6 +838,8 @@ case class RepartitionByExpression(
         numPartitions: Int) extends RepartitionOperation {
     
       require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
    +  require(partitionExpressions.nonEmpty, s"${getClass.getSimpleName} requires a non empty set of " +
    --- End diff --
    
    it was putting everything in the same partition


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org