Posted to issues@spark.apache.org by "zhengruifeng (Jira)" <ji...@apache.org> on 2020/07/22 07:17:00 UTC
[jira] [Created] (SPARK-32384) repartitionAndSortWithinPartitions
avoid shuffle with same partitioner
zhengruifeng created SPARK-32384:
------------------------------------
Summary: repartitionAndSortWithinPartitions avoid shuffle with same partitioner
Key: SPARK-32384
URL: https://issues.apache.org/jira/browse/SPARK-32384
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.1.0
Reporter: zhengruifeng
In {{combineByKeyWithClassTag}}, there is a check that skips the shuffle when the requested partitioner is the same as the RDD's existing partitioner:
{code:java}
if (self.partitioner == Some(partitioner)) {
  self.mapPartitions(iter => {
    val context = TaskContext.get()
    new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
  }, preservesPartitioning = true)
} else {
  new ShuffledRDD[K, V, C](self, partitioner)
    .setSerializer(serializer)
    .setAggregator(aggregator)
    .setMapSideCombine(mapSideCombine)
}
{code}
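In {{repartitionAndSortWithinPartitions}}, the shuffle can likewise be skipped when the RDD is already partitioned by the requested partitioner; only the sort within each partition is still needed.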
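A minimal sketch of the idea, written against the public RDD API rather than as a patch to Spark itself. The object {{RepartitionSketch}} and the method {{repartitionAndSortSkippingShuffle}} are hypothetical names used only for illustration. The sketch assumes each partition fits in memory, because it sorts with an in-memory collection; a real change inside Spark would presumably want a sorter that can spill to disk.
{code:java}
import scala.reflect.ClassTag

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Hypothetical helper for illustration; not part of Spark's API.
object RepartitionSketch {
  def repartitionAndSortSkippingShuffle[K: Ordering: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      partitioner: Partitioner): RDD[(K, V)] = {
    if (rdd.partitioner == Some(partitioner)) {
      // Records already sit in the right partitions, so no shuffle is needed.
      // Assumption: a partition fits in memory, since this sorts a Vector.
      rdd.mapPartitions(
        iter => iter.toVector.sortBy(_._1).iterator,
        preservesPartitioning = true)
    } else {
      // Different partitioner: fall back to the existing shuffle-based path.
      rdd.repartitionAndSortWithinPartitions(partitioner)
    }
  }
}
{code}
With an RDD that was previously passed through {{partitionBy(p)}}, calling the helper with the same {{p}} takes the first branch: the result keeps {{p}} as its partitioner and each partition is sorted by key without a new shuffle stage.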