Posted to user@spark.apache.org by Punit Naik <na...@gmail.com> on 2016/09/15 13:57:12 UTC

Partition RDD based on K-Means Clusters

Hi Guys

I have run the k-means algorithm on my data and it has classified the
data into the number of clusters that I defined.
Suppose my data is sc.parallelize(Array(1,2,3,4)), and [1,2] belong to
cluster '1' while [3,4] belong to cluster '2'. How can I define a custom
partitioner so that the number of partitions created is equal to the
number of clusters (2 in this case) and each partition holds all the
elements belonging to one cluster?
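One possible approach, as a minimal sketch rather than a definitive
answer: key every element by its predicted cluster id and repartition
with a Partitioner whose getPartition returns that id. This assumes the
RDD-based MLlib KMeans API; the ClusterPartitioner name, the iteration
count, and the sample data below are illustrative only.

    import org.apache.spark.Partitioner
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Sends each (clusterId, value) pair to the partition numbered by
    // its cluster id; numClusters is the k used for K-Means.
    class ClusterPartitioner(numClusters: Int) extends Partitioner {
      override def numPartitions: Int = numClusters
      override def getPartition(key: Any): Int = key.asInstanceOf[Int]
    }

    // Hypothetical data matching the example above.
    val data = sc.parallelize(Array(1.0, 2.0, 3.0, 4.0))
    val vectors = data.map(x => Vectors.dense(x)).cache()

    // Train with k = 2 clusters (20 iterations is an arbitrary choice).
    val model = KMeans.train(vectors, 2, 20)

    // Key every element by its predicted cluster, then repartition.
    val byCluster = data
      .map(x => (model.predict(Vectors.dense(x)), x))
      .partitionBy(new ClusterPartitioner(2))

    // Each partition now holds exactly the elements of one cluster.
    byCluster.glom().collect().foreach(p => println(p.mkString(", ")))

Since getPartition maps cluster id i straight to partition i, the number
of partitions automatically equals the number of clusters, and no two
clusters share a partition.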

-- 
Thank You

Regards

Punit Naik