Posted to user@spark.apache.org by Fernando Velasco <fe...@gmail.com> on 2015/10/19 17:38:04 UTC

k-prototypes in MLLib?

Hi everyone!

I am a data scientist new to Spark, and I am interested in clustering
mixed (numeric and categorical) variables. I am more used to R, where there
are implementations such as daisy, PAM, etc. (in the cluster package). It is
true that dummy variables combined with k-means can do a decent job of
clustering mixed variables, but I don't think this treats the categorical
ones entirely correctly. So my question is whether any k-modes/k-prototypes
implementation is planned for inclusion in MLlib in the future.
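To illustrate the concern, here is a minimal plain-Python sketch (no Spark; the toy data is made up). After one-hot encoding, the k-means centroid of the dummy columns is a vector of category frequencies, which is not itself a valid category. k-modes instead represents the cluster by its most frequent category (the mode).

```python
from collections import Counter

# Made-up toy cluster: one categorical attribute with three levels.
colours = ["red", "red", "blue", "green"]
levels = sorted(set(colours))  # ['blue', 'green', 'red']


def one_hot(value, levels):
    """Dummy-encode a single categorical value."""
    return [1.0 if value == level else 0.0 for level in levels]


encoded = [one_hot(c, levels) for c in colours]

# k-means centroid: the per-column mean of the dummy vectors.
kmeans_centroid = [sum(col) / len(encoded) for col in zip(*encoded)]

# k-modes prototype: the most frequent category in the cluster.
kmodes_mode = Counter(colours).most_common(1)[0][0]

print(dict(zip(levels, kmeans_centroid)))  # {'blue': 0.25, 'green': 0.25, 'red': 0.5}
print(kmodes_mode)  # red
```

The centroid says "half red, a quarter blue, a quarter green", which no actual record can be.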

I was able to find https://issues.apache.org/jira/browse/SPARK-4510, but it
seems PAM does not scale well. Perhaps k-prototypes would be a better fit.
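To be clear about what I mean by k-prototypes, here is a minimal single-machine sketch of the iteration as it is usually described. It is not MLlib code and not a distributed implementation. The distance is squared Euclidean on the numeric attributes plus a weight `gamma` times the number of categorical mismatches. Prototypes are updated with means (numeric) and modes (categorical). The function names, the `gamma=1.0` default and the naive "first k points" initialisation are hypothetical choices for illustration only.

```python
from collections import Counter


def kproto_distance(x_num, x_cat, p_num, p_cat, gamma):
    """Squared Euclidean distance on the numeric attributes plus
    gamma times the number of mismatched categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + gamma * mismatches


def nearest(x_num, x_cat, prototypes, gamma):
    """Index of the prototype closest to the point (x_num, x_cat)."""
    return min(
        range(len(prototypes)),
        key=lambda j: kproto_distance(x_num, x_cat, prototypes[j][0], prototypes[j][1], gamma),
    )


def update_prototype(members):
    """Mean of each numeric attribute, mode of each categorical attribute."""
    nums = [m[0] for m in members]
    cats = [m[1] for m in members]
    p_num = [sum(col) / len(col) for col in zip(*nums)]
    p_cat = [Counter(col).most_common(1)[0][0] for col in zip(*cats)]
    return p_num, p_cat


def k_prototypes(points, k, gamma=1.0, max_iter=20):
    """points: list of (numeric_list, categorical_list) pairs.
    Hypothetical naive initialisation: the first k points are the prototypes."""
    prototypes = [(list(num), list(cat)) for num, cat in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(num, cat, prototypes, gamma) for num, cat in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[j] = update_prototype(members)
    return labels, prototypes


# Made-up example: two numeric attributes and one categorical attribute.
data = [
    ([1.0, 1.0], ["a"]),
    ([1.2, 0.8], ["a"]),
    ([8.0, 9.0], ["b"]),
    ([9.0, 8.5], ["b"]),
]
labels, prototypes = k_prototypes(data, k=2)
print(labels)
```

Each iteration is an assignment pass followed by a per-cluster mean/mode aggregation, which is the same shape as a k-means iteration. Choosing `gamma` (how much a categorical mismatch weighs against numeric distance) is left open here.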

Regards,