
Posted to issues@spark.apache.org by "Simon.J (JIRA)" <ji...@apache.org> on 2017/05/08 09:37:04 UTC

[jira] [Updated] (SPARK-20634) result of MLlib KMeans cluster is not stabilize

     [ https://issues.apache.org/jira/browse/SPARK-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon.J updated SPARK-20634:
----------------------------

Hi,
     It is indeed stochastic, but with the same dataset, the KMeans in the spark.ml library gives stable results. Is that expected?

2017-05-08 

ffdd-120 



From: "Sean Owen (JIRA)" <ji...@apache.org>
Sent: 2017-05-08 16:59
Subject: [jira] [Resolved] (SPARK-20634) result of MLlib KMeans cluster is not stabilize
To: "ffdd-120"<ff...@163.com>
Cc:


     [ https://issues.apache.org/jira/browse/SPARK-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 

Sean Owen resolved SPARK-20634. 
------------------------------- 
    Resolution: Invalid 

I can't understand what this is describing; please read http://spark.apache.org/contributing.html . This doesn't specify any particular problem. You would not expect k-means results to be the same each time. It's stochastic. 




-- 
This message was sent by Atlassian JIRA 
(v6.3.15#6346) 


> result of MLlib KMeans cluster is not stabilize
> -----------------------------------------------
>
>                 Key: SPARK-20634
>                 URL: https://issues.apache.org/jira/browse/SPARK-20634
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.0.2
>         Environment: Windows 10
> Spark 2.0.2 standalone
> Spyder 3.1.4
> Anaconda 4.3.0
> Python 3.5.2
>            Reporter: Simon.J
>            Priority: Critical
>
> 1. Get a DataFrame through Python with the cx_Oracle library.
> 2. Start a local Spark session.
> 3. Convert the dataset for KMeansModel training.
> 4. Train the KMeans model and predict on the same data, with k = 3.
> 5. Get the ClassifierFeature of the KMeans model's predictions.
> 6. Count the rows for each ClassifierFeature value.
> 7. Repeat steps 4-6 20 times.
> 8. Compare the results of each run.
> 9. The KMeans results are not stable.
> 10. With the same dataset and parameters, the KMeans in the ML package gives the same result every time.
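The reproduction in steps 4-8, and the reply that k-means is stochastic, can be illustrated with a small pure-Python sketch. This is not Spark code: it runs plain Lloyd's iterations from randomly sampled initial centers rather than Spark's default k-means|| initialization, and the data is hypothetical. It suggests that unseeded runs can settle on different cluster-size profiles, while a fixed seed makes every run identical. In pyspark.mllib, `KMeans.train` accepts a `seed` argument that serves the same purpose.

```python
import random
from collections import Counter


def kmeans(points, k, seed=None, max_iter=100):
    """Lloyd's k-means with k initial centers sampled at random from the data.

    seed=None seeds the RNG from OS entropy, so repeated calls may converge
    to different local optima; a fixed seed makes every call identical.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def cluster_sizes(labels):
    """Per-cluster counts (steps 5-6), sorted because cluster ids are arbitrary."""
    return tuple(sorted(Counter(labels).values()))


def repeat_runs(points, k, runs=20, seed=None):
    """Steps 4-8: retrain `runs` times and collect the distinct size profiles."""
    return {cluster_sizes(kmeans(points, k, seed=seed)) for _ in range(runs)}


if __name__ == "__main__":
    gen = random.Random(0)
    # Hypothetical data: 3 overlapping 2-D blobs, 50 points each.
    points = [
        (gen.gauss(cx, 1.5), gen.gauss(cy, 1.5))
        for cx, cy in [(0, 0), (3, 0), (0, 3)]
        for _ in range(50)
    ]
    print("unseeded:", repeat_runs(points, k=3))           # may contain several profiles
    print("seed=42: ", repeat_runs(points, k=3, seed=42))  # always exactly one profile
```

Comparing sorted counts rather than per-label counts matters here: two runs can find the same partition with the cluster ids permuted, which would look like a difference in step 8 even though the clustering is the same.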



