Posted to issues@spark.apache.org by "wenweijian (Jira)" <ji...@apache.org> on 2023/04/26 17:01:00 UTC
[jira] [Created] (SPARK-43297) Make improvement to LocalKMeans
wenweijian created SPARK-43297:
----------------------------------
Summary: Make improvement to LocalKMeans
Key: SPARK-43297
URL: https://issues.apache.org/jira/browse/SPARK-43297
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 3.3.0
Reporter: wenweijian
KMeans has two initialization modes: random and parallel (k-means||).
The parallel mode uses kMeansPlusPlus to generate the center points, but kMeansPlusPlus is a local method that runs on the driver.
If the number of points is large, kMeansPlusPlus takes a long time, because it is a single-threaded method running on the driver.
We can make this method run in parallel to speed it up, for example by using parallel collections.
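
A rough sketch of the idea, not Spark's actual LocalKMeans code: the expensive step in k-means++ seeding is refreshing every point's distance to the newest center, and that step is independent per point. The example below uses Java parallel streams as a stand-in for Scala parallel collections. The class and method names are hypothetical, and it ignores any per-point weights for brevity.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical illustration only; none of these names come from Spark.
class ParallelSeedingSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Lower each point's cost to its squared distance from the newest center.
    // Each index is written by exactly one task, so no locking is needed.
    static void updateCosts(double[][] points, double[] costs, double[] center) {
        IntStream.range(0, points.length).parallel().forEach(i ->
            costs[i] = Math.min(costs[i], squaredDistance(points[i], center)));
    }

    // k-means++ seeding: each new center is drawn with probability
    // proportional to its current cost (unweighted points assumed).
    static double[][] chooseCenters(double[][] points, int k, long seed) {
        Random rand = new Random(seed);
        double[][] centers = new double[k][];
        double[] costs = new double[points.length];
        Arrays.fill(costs, Double.POSITIVE_INFINITY);

        centers[0] = points[rand.nextInt(points.length)];
        updateCosts(points, costs, centers[0]);

        for (int c = 1; c < k; c++) {
            double total = Arrays.stream(costs).parallel().sum();
            double r = rand.nextDouble() * total;

            // The sampling scan stays sequential; it is one cheap pass.
            // Zero-cost points (existing centers) are skipped.
            int chosen = 0;
            double cumulative = 0.0;
            for (int i = 0; i < points.length; i++) {
                if (costs[i] > 0.0) {
                    cumulative += costs[i];
                    chosen = i;
                    if (cumulative > r) {
                        break;
                    }
                }
            }
            centers[c] = points[chosen];
            updateCosts(points, costs, centers[c]);
        }
        return centers;
    }
}
```

Calling ParallelSeedingSketch.chooseCenters(points, k, seed) returns k seed centers. Only the cost update and the cost sum run in parallel here; whether that pays off depends on the number of points and their dimension, since small inputs may be dominated by thread-scheduling overhead.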