You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:13:47 UTC

[jira] [Resolved] (SPARK-18731) Task size in K-means is so large

     [ https://issues.apache.org/jira/browse/SPARK-18731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-18731.
----------------------------------
    Resolution: Incomplete

> Task size in K-means is so large
> --------------------------------
>
>                 Key: SPARK-18731
>                 URL: https://issues.apache.org/jira/browse/SPARK-18731
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: Xiaoye Sun
>            Priority: Minor
>              Labels: bulk-closed
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> When run the KMeans algorithm for a large model (e.g. 100k features and 100 centers), there will be warning shown for many of the stages saying that the task size is very large. Here is an example warning. 
> WARN TaskSetManager: Stage 23 contains a task of very large size (56256 KB). The maximum recommended task size is 100 KB.
> This could happen at (sum at KMeansModel.scala:88), (takeSample at KMeans.scala:378), (aggregate at KMeans.scala:404) and (collect at KMeans.scala:436). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org