Posted to issues@spark.apache.org by "zhengruifeng (Jira)" <ji...@apache.org> on 2020/07/22 07:17:00 UTC

[jira] [Created] (SPARK-32384) repartitionAndSortWithinPartitions avoid shuffle with same partitioner

zhengruifeng created SPARK-32384:
------------------------------------

             Summary: repartitionAndSortWithinPartitions avoid shuffle with same partitioner
                 Key: SPARK-32384
                 URL: https://issues.apache.org/jira/browse/SPARK-32384
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.1.0
            Reporter: zhengruifeng


In {{combineByKeyWithClassTag}}, there is a check that skips the shuffle when the supplied partitioner is the same as the RDD's existing partitioner:
{code:java}
if (self.partitioner == Some(partitioner)) {
  self.mapPartitions(iter => {
    val context = TaskContext.get()
    new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
  }, preservesPartitioning = true)
} else {
  new ShuffledRDD[K, V, C](self, partitioner)
    .setSerializer(serializer)
    .setAggregator(aggregator)
    .setMapSideCombine(mapSideCombine)
}
{code}
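
For example (variable names here are illustrative, and an active {{SparkContext}} {{sc}} is assumed), {{reduceByKey}} goes through {{combineByKeyWithClassTag}}, so passing the partitioner the RDD is already partitioned by hits the branch above and avoids a second shuffle:

{code:java}
import org.apache.spark.HashPartitioner

val partitioner = new HashPartitioner(8)

// One shuffle here to lay the data out by the hash partitioner.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
  .partitionBy(partitioner)

// Same partitioner => the check above is hit, no ShuffledRDD is created;
// values are combined with a local mapPartitions instead.
val summed = pairs.reduceByKey(partitioner, _ + _)
{code}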
 

In {{repartitionAndSortWithinPartitions}}, the shuffle could likewise be skipped in this case: when the target partitioner equals the RDD's existing partitioner, the data is already laid out correctly and only the per-partition sort is still needed.
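
A minimal sketch of what that check could look like inside {{OrderedRDDFunctions}} (not an actual patch; the in-memory {{toArray.sortBy}} is only for illustration, a real implementation would presumably sort with spilling support):

{code:java}
// Hypothetical sketch, mirroring the combineByKeyWithClassTag check above.
// `self` and `ordering` are the existing members of OrderedRDDFunctions.
def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
  if (self.partitioner == Some(partitioner)) {
    // Already partitioned as requested: skip the shuffle and only
    // sort each partition locally, preserving the partitioning.
    self.mapPartitions(
      iter => iter.toArray.sortBy(_._1)(ordering).iterator,
      preservesPartitioning = true)
  } else {
    // Current behaviour: shuffle into a ShuffledRDD with a key ordering.
    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
  }
}
{code}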

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org