Posted to issues@spark.apache.org by "wenweijian (Jira)" <ji...@apache.org> on 2023/04/26 17:01:00 UTC

[jira] [Created] (SPARK-43297) Make improvement to LocalKMeans

wenweijian created SPARK-43297:
----------------------------------

             Summary: Make improvement to LocalKMeans
                 Key: SPARK-43297
                 URL: https://issues.apache.org/jira/browse/SPARK-43297
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 3.3.0
            Reporter: wenweijian


There are two initialization modes in KMeans: random mode and parallel mode.

The parallel mode uses kMeansPlusPlus to generate the center points, but kMeansPlusPlus is a local method that runs on the driver.

If the number of points is large, kMeansPlusPlus runs for a long time, because it is a single-threaded method running on the driver.

We can run this method in parallel to make it faster, for example by using parallel collections.
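
To illustrate which part of a k-means++ style seeding could be split across threads, here is a minimal sketch. It is not Spark's LocalKMeans code: the function names (sq_dist, update_costs, kmeans_pp_seed) and the workers parameter are hypothetical, and Python threads stand in for JVM parallel collections only to show the structure. In CPython the GIL means this pure-Python version will not actually run faster. The point is that the per-point cost update after each new center is independent per point, so it can be chunked.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def update_costs(points, costs, center, lo, hi):
    """Lower costs[lo:hi] to the distance to `center` where that is smaller."""
    for i in range(lo, hi):
        d = sq_dist(points[i], center)
        if d < costs[i]:
            costs[i] = d


def kmeans_pp_seed(points, k, workers=4, seed=0):
    """Pick k seed centers, k-means++ style (illustrative sketch only)."""
    rng = random.Random(seed)
    n = len(points)
    centers = [points[rng.randrange(n)]]
    costs = [float("inf")] * n

    # Split the index range into contiguous chunks, one task per chunk.
    step = max(1, (n + workers - 1) // workers)
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(k - 1):
            newest = centers[-1]
            # Each task writes a disjoint slice of `costs`, so no lock is needed.
            futures = [
                pool.submit(update_costs, points, costs, newest, lo, hi)
                for lo, hi in chunks
            ]
            for f in futures:
                f.result()  # wait for completion and surface any exception

            total = sum(costs)
            if total == 0.0:
                # Every point coincides with an existing center.
                centers.append(points[rng.randrange(n)])
                continue

            # Sample the next center with probability proportional to its cost.
            r = rng.random() * total
            acc = 0.0
            chosen = n - 1  # fallback for floating-point edge cases
            for i, c in enumerate(costs):
                acc += c
                if acc > r:
                    chosen = i
                    break
            centers.append(points[chosen])
    return centers
```

The sampling step stays sequential here. Only the distance update is chunked, since that is where the work grows with both the number of points and their dimension.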



--
This message was sent by Atlassian Jira
(v8.20.10#820010)