Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/03 23:48:27 UTC

[GitHub] [spark] viirya commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

viirya commented on a change in pull request #34785:
URL: https://github.com/apache/spark/pull/34785#discussion_r762347788



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -819,6 +819,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
       case r: logical.RepartitionByExpression =>
         val shuffleOrigin = if (r.partitionExpressions.isEmpty && r.optNumPartitions.isEmpty) {
           REBALANCE_PARTITIONS_BY_NONE
+        } else if (!r.userSpecified) {
+          REBALANCE_PARTITIONS_BY_COL

Review comment:
       `REBALANCE_PARTITIONS_BY_COL` seems better suited to a shuffle that originates from `RebalancePartitions`, although `REBALANCE_PARTITIONS_BY_NONE` is already used here too.
   
   After rebalancing, do we still guarantee the partitioning requirement imposed by the partition expressions? It seems not to me, as the documentation of `RebalancePartitions` says:
   
   > ... It also try its best to partition the child output by `partitionExpressions`....
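
   To make the concern concrete, here is a minimal, self-contained sketch. It is a toy model and not Spark code: `hashPartition`, `splitSkewed`, and `isClustered` are hypothetical helpers, and `splitSkewed` only approximates what a skew optimization might do. It assumes that rebalancing is allowed to split an oversized partition, in which case rows sharing a key no longer sit in a single partition.

```scala
object RebalanceSketch {
  type Row = (String, Int)

  // Hash-partition rows by key: every key lands in exactly one partition.
  def hashPartition(rows: Seq[Row], n: Int): Vector[Seq[Row]] = {
    val grouped = rows.groupBy { case (k, _) => Math.floorMod(k.hashCode, n) }
    Vector.tabulate(n)(i => grouped.getOrElse(i, Seq.empty))
  }

  // Toy stand-in for skew optimization (an assumption, not Spark's algorithm):
  // split any partition holding more than maxRows rows into smaller chunks.
  def splitSkewed(parts: Seq[Seq[Row]], maxRows: Int): Seq[Seq[Row]] =
    parts.flatMap(p => if (p.size > maxRows) p.grouped(maxRows).toSeq else Seq(p))

  // True when no key appears in more than one partition.
  def isClustered(parts: Seq[Seq[Row]]): Boolean =
    parts.zipWithIndex
      .flatMap { case (p, i) => p.map { case (k, _) => k -> i } }
      .distinct
      .groupBy(_._1)
      .values
      .forall(_.size == 1)
}
```

   With one hot key, the hash-partitioned layout is clustered by key, but the layout after `splitSkewed` is not, so a downstream operator that requires clustering on the partition expressions could not rely on it.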
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org