
Posted to user@spark.apache.org by jluan <ja...@gmail.com> on 2016/01/25 09:46:48 UTC

RangePartitioning skewed data

Let's say I have a dataset of (K, V) pairs where the keys are heavily skewed:

myDataRDD =
[(8, 1), (8, 13), (1, 1), (2, 4)]
[(8, 12), (8, 15), (8, 7), (8, 6), (8, 4), (8, 3), (8, 4), (10, 2)]

If I apply a RangePartitioner to this data, say
val rangePart = new RangePartitioner(4, myDataRDD), and then repartition the
data with it, would I get back 4 evenly sized partitions, with the records for
key 8 split across several of them, or would all of the key-8 records end up
in a single partition?

Also, does myDataRDD need to be sorted beforehand for the range partitioner to
be built correctly? From what I have read, this may be the case.
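
To make the first question concrete, here is a minimal plain-Scala sketch of
how a range-style partitioner assigns records. It is a simplified model, not
Spark's implementation: the object name RangeSketch, the helper partitionFor,
and the bounds Seq(1, 2, 8) are all hypothetical and chosen by hand for
illustration (Spark derives its own bounds from the data). The point it shows
is that the partition is computed from the key alone, so under this model
every record with key 8 is routed to the same partition.

```scala
// Simplified stand-in for a range partitioner. NOT Spark's implementation.
object RangeSketch {
  // Index of the first upper bound that is >= key; keys above every bound
  // go to the last partition (index bounds.length).
  def partitionFor(key: Int, bounds: Seq[Int]): Int = {
    val idx = bounds.indexWhere(key <= _)
    if (idx >= 0) idx else bounds.length
  }

  // The records from the example above, flattened into one sequence.
  val data: Seq[(Int, Int)] = Seq(
    (8, 1), (8, 13), (1, 1), (2, 4),
    (8, 12), (8, 15), (8, 7), (8, 6), (8, 4), (8, 3), (8, 4), (10, 2)
  )

  // Hypothetical, hand-picked bounds giving 4 partitions:
  // keys <= 1, keys in (1, 2], keys in (2, 8], keys > 8.
  val bounds: Seq[Int] = Seq(1, 2, 8)

  // Number of records that land in each partition.
  val sizes: Map[Int, Int] =
    data.groupBy { case (k, _) => partitionFor(k, bounds) }
      .map { case (p, recs) => p -> recs.size }

  // For each key, the set of partitions its records are sent to.
  val partitionsPerKey: Map[Int, Set[Int]] =
    data.map { case (k, _) => k -> partitionFor(k, bounds) }
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2).toSet }

  def main(args: Array[String]): Unit = {
    println(s"records per partition: $sizes")
    println(s"partitions used by key 8: ${partitionsPerKey(8)}")
  }
}
```

With these made-up bounds, all nine key-8 records fall into one partition and
the other three partitions hold one record each, which is the skewed outcome
the question is worried about.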





