Posted to user@spark.apache.org by Kevin Jung <it...@samsung.com> on 2014/12/11 02:42:54 UTC

Partitioner in sortBy

Hi,
I'm wondering whether I can replace the RangePartitioner used in sortBy
with another partitioner, such as HashPartitioner.
My first thought is that it cannot be replaced, because RangePartitioner
is part of the sort algorithm.
If we want to call mapPartitions on key-based partitions after sorting, we
need to repartition or coalesce the dataset, because it is range-partitioned.
In that case, we cannot avoid shuffling the dataset twice: once for sorting
and once for repartitioning.
This causes performance problems on large datasets.
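
To make the concern concrete, here is a minimal sketch in plain Python. It
is a simulation only, not Spark code: the function names, the hand-picked
range bounds, and the use of key modulo partition count as a stand-in for a
hash partitioner are all assumptions for illustration. It shows why sorting
within partitions gives a global order only when the partitions are ranges,
and how many records would have to move again to get a key-based layout.

```python
# Plain-Python simulation; no Spark involved. All names are hypothetical.

def range_partition(keys, bounds):
    """Put each key in the range whose index equals the number of bounds below it."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for k in keys:
        idx = sum(1 for b in bounds if k > b)
        parts[idx].append(k)
    return parts

def hash_partition(keys, num_parts):
    """Put each key in partition (key mod num_parts), a stand-in for hashing."""
    parts = [[] for _ in range(num_parts)]
    for k in keys:
        parts[k % num_parts].append(k)
    return parts

def sort_within(parts):
    """Sort every partition locally, without moving data between partitions."""
    return [sorted(p) for p in parts]

def concat(parts):
    """Read the partitions back in partition order."""
    return [k for p in parts for k in p]

keys = [7, 2, 9, 4, 1, 8, 3, 6, 5]

# Bounds are hand-picked here; a real range partitioner would estimate them.
ranged = sort_within(range_partition(keys, bounds=[3, 6]))
hashed = sort_within(hash_partition(keys, num_parts=3))

# Range partitions read back in order form a globally sorted sequence.
print(concat(ranged) == sorted(keys))
# Hash partitions are sorted locally but not globally.
print(concat(hashed) == sorted(keys))

# Records whose range partition differs from their hash partition would
# have to move in a second shuffle to obtain a key-based layout.
moved = sum(1 for i, p in enumerate(ranged) for k in p if k % 3 != i)
print(moved, "of", len(keys), "records change partition")
```

In this toy run most records change partition when going from the
range-based layout to the modulo-based one, which is the second shuffle
described above.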

Thanks,
Kevin



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Partitioner-in-sortBy-tp20614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
