Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/10/01 20:44:26 UTC
[jira] [Commented] (SPARK-9695) Add random seed Param to ML Pipeline
[ https://issues.apache.org/jira/browse/SPARK-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940207#comment-14940207 ]
Joseph K. Bradley commented on SPARK-9695:
------------------------------------------
That's what I would propose too. There are a few complications to figure out, though.
*API*
* If a Pipeline stage has a seed explicitly set, should the Pipeline overwrite that seed? I'd vote for no.
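To make the "no" option concrete, here is a minimal sketch in plain Python (not Spark code). The Stage class, the apply_pipeline_seed helper, and the "pipeline seed plus stage index" rule for unset stages are all hypothetical and only there for illustration; the point is just that an explicitly set stage seed is left alone.

```python
from typing import Dict, List, Optional


class Stage:
    """Hypothetical stand-in for a Pipeline stage with an optional seed."""

    def __init__(self, name: str, seed: Optional[int] = None) -> None:
        self.name = name
        self.seed = seed  # None means the user never set a seed


def apply_pipeline_seed(stages: List[Stage], pipeline_seed: int) -> Dict[str, int]:
    """Return the effective seed per stage; explicit stage seeds win."""
    effective = {}
    for index, stage in enumerate(stages):
        if stage.seed is not None:
            # The user set this seed explicitly, so the Pipeline keeps it.
            effective[stage.name] = stage.seed
        else:
            # Assumption: unset stages get pipeline_seed offset by position.
            effective[stage.name] = pipeline_seed + index
    return effective


stages = [Stage("tokenizer"), Stage("forest", seed=7), Stage("kmeans")]
seeds = apply_pipeline_seed(stages, pipeline_seed=100)
```

Here "forest" keeps its own seed of 7, while the other two stages pick up values derived from the pipeline seed.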
*What behavior do we want in the situation below?*
Situation:
* User creates a Pipeline with some stages
* User sets pipeline.seed
* User saves pipeline to FILE
* User runs pipeline and produces model A
* User loads Pipeline from FILE and runs it to produce model B
I'd say the ideal behavior is for models A and B to produce exactly the same results. However, this requires us to guarantee that each Pipeline stage is given the same seed for both A and B; i.e., the random number generator used by the Pipeline should not change behavior across Spark versions. Is that a reasonable assumption?
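One possible way to sidestep the dependence on a random number generator, sketched in plain Python rather than Spark code: derive each stage's seed from the pipeline seed and the stage's position with a fixed hash function. The derive_stage_seed helper and the choice of SHA-256 are assumptions for illustration, not anything Spark does today.

```python
import hashlib


def derive_stage_seed(pipeline_seed: int, stage_index: int) -> int:
    """Derive a per-stage seed that depends only on its two inputs.

    Hypothetical helper: a fixed hash replaces a language-level RNG, so the
    result does not change if the RNG implementation changes.
    """
    payload = f"{pipeline_seed}:{stage_index}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Keep the first 8 bytes as a signed 64-bit value (the range of a JVM Long).
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# "Model A": seeds computed when the pipeline is first run.
seeds_a = [derive_stage_seed(42, i) for i in range(3)]
# "Model B": seeds recomputed after loading the same pipeline.seed from FILE.
seeds_b = [derive_stage_seed(42, i) for i in range(3)]
```

With this kind of scheme, only the saved pipeline.seed and the stage order need to survive the save/load round trip; whether each stage's own algorithm stays deterministic across versions is a separate question.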
I'll try to think of other possible issues too.
CC: [~mengxr]
> Add random seed Param to ML Pipeline
> ------------------------------------
>
> Key: SPARK-9695
> URL: https://issues.apache.org/jira/browse/SPARK-9695
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Joseph K. Bradley
>
> Note this will require some discussion about whether to make HasSeed the main API for whether an algorithm takes a seed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)