Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/09/03 21:15:52 UTC
[jira] [Commented] (SPARK-3384) Potential thread unsafe Breeze vector addition in KMeans
[ https://issues.apache.org/jira/browse/SPARK-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120285#comment-14120285 ]
Xiangrui Meng commented on SPARK-3384:
--------------------------------------
[~rnowling] Could you provide a code example that reproduces the bug you observed in local testing? Breeze's += is not thread-safe, but in a Spark job, calls to the resultHandler are synchronized: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52 .
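To make the point about synchronization concrete, here is a minimal sketch, not Spark's or Breeze's actual code. It assumes a plain Array[Double] as a stand-in for a Breeze vector, and the names addInPlace, runSynchronized, and lock are hypothetical. Several threads merge per-task results into one shared accumulator in place, and every merge runs while holding the same lock, which is roughly the guarantee a synchronized result handler gives.

```scala
import java.util.concurrent.{Executors, TimeUnit}

object SynchronizedAccumulation {
  // Stand-in for an in-place vector addition such as Breeze's `+=`:
  // it mutates `acc`, so concurrent calls on the same `acc` are unsafe.
  def addInPlace(acc: Array[Double], x: Array[Double]): Unit = {
    require(acc.length == x.length, "dimension mismatch")
    var i = 0
    while (i < acc.length) {
      acc(i) += x(i)
      i += 1
    }
  }

  // Hypothetical driver: each task builds its own partial result and
  // merges it into the shared accumulator while holding `lock`.
  def runSynchronized(numTasks: Int, dim: Int, numThreads: Int): Array[Double] = {
    val acc = new Array[Double](dim)
    val lock = new Object
    val pool = Executors.newFixedThreadPool(numThreads)
    try {
      (0 until numTasks).foreach { _ =>
        pool.execute(new Runnable {
          override def run(): Unit = {
            val partial = Array.fill(dim)(1.0) // per-task data, not shared
            lock.synchronized { addInPlace(acc, partial) }
          }
        })
      }
    } finally {
      pool.shutdown() // already-submitted tasks still run to completion
    }
    require(pool.awaitTermination(1, TimeUnit.MINUTES), "tasks did not finish in time")
    // Read under the same lock so all merged updates are visible here.
    lock.synchronized { acc.clone() }
  }

  def main(args: Array[String]): Unit = {
    val result = runSynchronized(numTasks = 10000, dim = 8, numThreads = 8)
    println(result.mkString(", "))
  }
}
```

Because all merges are serialized by the lock, every component of the result equals the number of tasks. Removing lock.synchronized around addInPlace would make lost updates possible, though not guaranteed on any given run.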
> Potential thread unsafe Breeze vector addition in KMeans
> --------------------------------------------------------
>
> Key: SPARK-3384
> URL: https://issues.apache.org/jira/browse/SPARK-3384
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Reporter: RJ Nowling
>
> In the KMeans clustering implementation, the Breeze vectors are accumulated using +=. For example,
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L162
> This is potentially a thread-unsafe operation. (This is what I observed in local testing.) I suggest changing the += to +: a new object will be allocated, but the operation will be thread-safe because it won't write to an existing location accessed by multiple threads.
> Further testing is required to reproduce and verify.
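The difference between the two operators described in the issue can be sketched as follows. This is an illustration only: Array[Double] stands in for a Breeze vector, and addInPlace and add are hypothetical helpers, not Breeze's API. The in-place form overwrites a buffer that other references may share, while the allocating form leaves both operands untouched. Whether allocating is enough to make a surrounding update thread-safe also depends on how the result is stored, which this sketch does not cover.

```scala
object InPlaceVsAllocating {
  // Analogous to `+=`: overwrites the contents of `acc`.
  def addInPlace(acc: Array[Double], x: Array[Double]): Unit = {
    require(acc.length == x.length, "dimension mismatch")
    var i = 0
    while (i < acc.length) {
      acc(i) += x(i)
      i += 1
    }
  }

  // Analogous to `+`: returns a newly allocated array and mutates nothing.
  def add(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "dimension mismatch")
    Array.tabulate(a.length)(i => a(i) + b(i))
  }

  def main(args: Array[String]): Unit = {
    val shared = Array(1.0, 2.0)
    val alias = shared // second reference to the same buffer
    val delta = Array(10.0, 20.0)

    val fresh = add(shared, delta)
    // `shared` is unchanged after the allocating add.
    println(s"after add: shared=${shared.mkString(",")} fresh=${fresh.mkString(",")}")

    addInPlace(shared, delta)
    // The in-place add is visible through every reference to the buffer.
    println(s"after addInPlace: alias=${alias.mkString(",")}")
  }
}
```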