Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/01/19 03:34:24 UTC
[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20091#discussion_r162531289
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -67,31 +69,32 @@ object Partitioner {
       None
     }
-    if (isEligiblePartitioner(hasMaxPartitioner, rdds)) {
+    val defaultNumPartitions = if (rdd.context.conf.contains("spark.default.parallelism")) {
+      rdd.context.defaultParallelism
+    } else {
+      rdds.map(_.partitions.length).max
+    }
+
+    // If the existing max partitioner is an eligible one, or its partitions number is larger
+    // than the default number of partitions, use the existing partitioner.
+    if (hasMaxPartitioner.nonEmpty && (isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
+        defaultNumPartitions < hasMaxPartitioner.get.getNumPartitions)) {
--- End diff --
This is the core change. I think it makes sense, as it fixes a regression introduced in https://github.com/apache/spark/pull/20002
If the partitioner is not eligible, but its numPartitions is larger than the default one, we should still pick this partitioner instead of creating a new one.
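To make the decision easier to see in isolation, here is a minimal sketch of the selection rule in Scala. It is not the actual Spark code: the `choosePartitioner` name and the pre-computed `eligible` flag are hypothetical stand-ins for the inline checks in the diff above (where `eligible` corresponds to `isEligiblePartitioner(hasMaxPartitioner.get, rdds)`).

```scala
import org.apache.spark.{HashPartitioner, Partitioner}

// Hypothetical, simplified sketch of the decision in the diff above.
// `eligible` stands in for isEligiblePartitioner(hasMaxPartitioner.get, rdds).
def choosePartitioner(
    hasMaxPartitioner: Option[Partitioner],
    eligible: Boolean,
    defaultNumPartitions: Int): Partitioner = {
  hasMaxPartitioner match {
    // Reuse the existing partitioner if it is eligible, or if it already has
    // more partitions than the default would give us.
    case Some(p) if eligible || defaultNumPartitions < p.numPartitions => p
    // Otherwise fall back to a fresh HashPartitioner with the default size.
    case _ => new HashPartitioner(defaultNumPartitions)
  }
}
```

For example, if an existing 100-partition partitioner fails the eligibility check while spark.default.parallelism is set to 20, this keeps the 100-partition partitioner rather than replacing it with a 20-partition HashPartitioner, which is the regression being fixed.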
---