Posted to issues@spark.apache.org by "DjvuLee (JIRA)" <ji...@apache.org> on 2014/06/13 14:42:02 UTC

[jira] [Updated] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

     [ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DjvuLee updated SPARK-2138:
---------------------------

    Description: 
At a certain stage of the algorithm, while running reduceByKey(), executors and tasks are lost. After this happens several times, the application exits.

When this error occurs, the serialized task is larger than 10 MB, and its size keeps growing as the number of iterations increases.

Data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622

Running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5

  was:
At a certain stage of the algorithm, while running reduceByKey(), executors and tasks are lost. After this happens several times, the application exits.

When this error occurs, the serialized task is larger than 10 MB, and its size keeps growing as the number of iterations increases.



> The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2138
>                 URL: https://issues.apache.org/jira/browse/SPARK-2138
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: DjvuLee
>
> At a certain stage of the algorithm, while running reduceByKey(), executors and tasks are lost. After this happens several times, the application exits.
> When this error occurs, the serialized task is larger than 10 MB, and its size keeps growing as the number of iterations increases.
> Data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
> Running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)