Posted to user@spark.apache.org by GabeChurch <ga...@gmail.com> on 2018/03/14 16:07:57 UTC

Bisecting Kmeans Linkage Matrix Output (Cluster Indices)

I have been working on a project to build a linkage matrix from the output of
Spark's bisecting k-means algorithm, so that the selection steps can be plotted
as a dendrogram. I am having trouble returning valid indices when I use more
than 3-4 clusters, and I am hoping someone else has the time and interest to
take a look.

To achieve this, I modified the bisecting k-means implementation to produce a
z-linkage matrix, based on yu-iskw's work. I also modified it to log more
information about the selection steps at run time.
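
As a point of reference for the index convention, here is a minimal Python
sketch of how a binary split tree can be flattened into SciPy-style linkage
rows [child_a, child_b, height, leaf_count]. It is not the modified Spark code:
Node and linkage_matrix are hypothetical names, the heights are placeholder
values, and it assumes every internal node has exactly two children. Leaves
receive indices 0..n-1, and the cluster formed by row i receives index n + i,
so each row may only reference clusters that already exist.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical tree node for illustration; not Spark's internal tree class.
    height: float = 0.0               # dissimilarity at which the two children join
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def linkage_matrix(root: Node) -> List[List[float]]:
    """Flatten a binary split tree into rows [child_a, child_b, height, leaf_count]."""
    # Pass 1: number the leaves 0..n-1 in left-to-right order.
    leaves: List[Node] = []

    def collect(node: Node) -> None:
        if node.left is None and node.right is None:
            leaves.append(node)
        else:
            collect(node.left)
            collect(node.right)

    collect(root)
    n = len(leaves)
    leaf_index = {id(leaf): i for i, leaf in enumerate(leaves)}

    # Pass 2: post-order walk, so both children are emitted before their parent.
    rows: List[List[float]] = []

    def merge(node: Node) -> Tuple[int, int]:
        """Return (cluster index, leaf count) for node, appending rows below it."""
        if id(node) in leaf_index:
            return leaf_index[id(node)], 1
        a, count_a = merge(node.left)
        b, count_b = merge(node.right)
        rows.append([float(a), float(b), float(node.height), float(count_a + count_b)])
        # The cluster created by row i is addressed as n + i.
        return n + len(rows) - 1, count_a + count_b

    merge(root)
    return rows


# Unbalanced example with k = 4 leaf clusters (heights are made up).
tree = Node(3.0, Node(), Node(2.0, Node(), Node(1.0, Node(), Node())))
z = linkage_matrix(tree)
```

With k leaves the result has k - 1 rows, and no row references an index greater
than or equal to k plus its own row number. That property is a quick sanity
check to run on any generated matrix before plotting it.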

Test outputs using the Iris dataset with both k = 3 and k = 10 clusters can
be seen in my Stack Overflow post
<https://stackoverflow.com/questions/49265521/bisecting-kmeans-cluster-indices-in-apache-spark>

The project so far (with a simple sbt build and the compiled jars) is on my
GitHub repo <https://github.com/GabeChurch/IncubatingProjects> and is also
described in the Stack Overflow post above.



