Posted to user@spark.apache.org by aminn_524 <am...@yahoo.com> on 2014/11/10 08:54:08 UTC

canopy clustering

I want to run MLlib's k-means on a big dataset. It seems that for big datasets
we need a pre-clustering method such as canopy clustering: by
starting with an initial clustering, the number of more expensive distance
measurements can be significantly reduced by ignoring points outside of the
initial canopies.
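
To make the idea concrete, here is a minimal single-machine sketch of canopy
clustering as I understand it. It is plain Python, not MLlib code, and the
names canopy_clustering, cheap_distance, t1 and t2 are my own choices for
illustration. Euclidean distance stands in for the "cheap" metric, which in
practice would be something faster and rougher.

```python
import math


def cheap_distance(a, b):
    # Stand-in for an inexpensive, approximate metric (assumption: Euclidean).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_clustering(points, t1, t2, distance=cheap_distance):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold and t2 the tight threshold, with t1 > t2.
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")

    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unprocessed point as a canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                # Within the loose threshold: p belongs to this canopy.
                members.append(p)
            if d >= t2:
                # Outside the tight threshold: p may still start or join
                # another canopy later.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


points = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
canopies = canopy_clustering(points, t1=2.0, t2=1.0)
for center, members in canopies:
    print(center, members)
```

The expensive distance would then only be computed between points that share
a canopy.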

If I am not mistaken, MLlib's k-means offers three initialization
options: k-means++, k-means|| and random.
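
For comparison, this is a rough sketch of what k-means++ seeding does, again
in plain Python and not taken from MLlib. The function name kmeans_pp_init and
its parameters are hypothetical. As far as I know, k-means|| is a parallel
variant of the same idea that samples several candidate centers per round
instead of one.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers using k-means++ style seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    rng = random.Random(seed)
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        # The next center is sampled with probability proportional to d2,
        # so points far from the current centers are preferred.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_pp_init(points, k=2))
```

Note that this only chooses starting centers. Every k-means iteration after it
still compares each point with each center.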

So, can anyone explain whether we can use k-means|| instead of canopy
clustering, or do these two methods work completely differently?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/canopy-clustering-tp18462.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
