
Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/09/10 04:19:45 UTC

[jira] [Commented] (SPARK-10014) ML model broadcasts should be stored in private vars

    [ https://issues.apache.org/jira/browse/SPARK-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738017#comment-14738017 ] 

Joseph K. Bradley commented on SPARK-10014:
-------------------------------------------

Per discussion on [https://github.com/apache/spark/pull/8241], we should decide how model mutability should interact with broadcasting.

Problem scenario:
{code}
val model = ...
val predictions1: RDD[...] = model.predict(...)
model.weights(1) = 0.1
val predictions2: RDD[...] = model.predict(...)
{code}

Q: Should this be allowed?  The user would expect predictions1 and predictions2 to differ, since the weights changed between the two calls.

If this is allowed, then we should re-broadcast the model every time transform/predict is called.

I had originally voted for broadcasting only once, but on reflection I'd vote for re-broadcasting every time.
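
To make the trade-off concrete, here is a minimal sketch in plain Scala, not actual MLlib code. It assumes a broadcast behaves like a snapshot of the weights taken when it is created (the way executors would receive a serialized copy). SnapshotBroadcast, LinearModel, predictCached and predictRebroadcast are hypothetical names invented for this illustration, and local Seqs stand in for RDDs.

```scala
// Hypothetical stand-in for a Spark broadcast: it copies the value when it is
// created, mimicking the serialized copy that executors would receive.
final class SnapshotBroadcast(weights: Array[Double]) {
  private val snapshot: Array[Double] = weights.clone()
  def value: Array[Double] = snapshot
}

// Hypothetical model whose weights can be mutated in place by the user.
class LinearModel(val weights: Array[Double]) {
  // Option A state: the broadcast is created at most once and then reused.
  private var cachedBroadcast: Option[SnapshotBroadcast] = None

  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // Option A: broadcast only once, stored in a private var.
  def predictCached(data: Seq[Array[Double]]): Seq[Double] = {
    val bc = cachedBroadcast.getOrElse {
      val created = new SnapshotBroadcast(weights)
      cachedBroadcast = Some(created)
      created
    }
    data.map(x => dot(bc.value, x))
  }

  // Option B: re-broadcast on every call, so the current weights are used.
  def predictRebroadcast(data: Seq[Array[Double]]): Seq[Double] = {
    val bc = new SnapshotBroadcast(weights)
    data.map(x => dot(bc.value, x))
  }
}

object BroadcastDemo {
  def main(args: Array[String]): Unit = {
    val data = Seq(Array(1.0, 1.0))
    val model = new LinearModel(Array(1.0, 1.0))

    val cached1 = model.predictCached(data)
    val fresh1 = model.predictRebroadcast(data)

    model.weights(1) = 0.1 // same mutation as in the problem scenario

    val cached2 = model.predictCached(data)
    val fresh2 = model.predictRebroadcast(data)

    println(s"broadcast once: $cached1 -> $cached2")
    println(s"re-broadcast:   $fresh1 -> $fresh2")
  }
}
```

Under these assumptions, the broadcast-once variant keeps predicting from the weights captured at the first call, so the mutation is silently ignored, while the re-broadcast variant picks it up at the cost of a new broadcast per call.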

> ML model broadcasts should be stored in private vars
> ----------------------------------------------------
>
>                 Key: SPARK-10014
>                 URL: https://issues.apache.org/jira/browse/SPARK-10014
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> In multiple places in MLlib, we broadcast a model before prediction.  Since prediction may be called many times, we should store the broadcast variable in a private var so that we broadcast at most once.
> I'll link subtasks for each problem case I find.


