
Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/10/01 20:44:26 UTC

[jira] [Commented] (SPARK-9695) Add random seed Param to ML Pipeline

    [ https://issues.apache.org/jira/browse/SPARK-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940207#comment-14940207 ] 

Joseph K. Bradley commented on SPARK-9695:
------------------------------------------

That's what I would propose too.  There are a few complications to figure out, though.

*API*

* If a Pipeline stage has a seed explicitly set, should the Pipeline overwrite that seed?  I'd vote for no.
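To make the "no" option concrete, here is a minimal sketch in plain Python. It does not use the Spark API: the Stage class and apply_pipeline_seed function are hypothetical names invented for illustration, and None stands in for "seed not explicitly set".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """Hypothetical stand-in for a Pipeline stage that takes a seed."""
    name: str
    seed: Optional[int] = None  # None means "not explicitly set by the user"


def apply_pipeline_seed(stages: List[Stage], pipeline_seed: int) -> List[Stage]:
    """Return copies of the stages, filling in only the seeds left unset."""
    return [
        Stage(s.name, s.seed if s.seed is not None else pipeline_seed)
        for s in stages
    ]


stages = [Stage("sampler", seed=7), Stage("forest")]
resolved = apply_pipeline_seed(stages, pipeline_seed=42)
```

Here the "sampler" stage keeps its explicit seed of 7, and only the "forest" stage picks up the pipeline seed. The input stages are left unmodified.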

*What behavior do we want in the situation below?*

Situation:
* User creates a Pipeline with some stages
* User sets pipeline.seed
* User saves pipeline to FILE
* User runs pipeline and produces model A
* User loads Pipeline from FILE and runs it to produce model B

I'd say the ideal behavior is for models A and B to produce exactly the same results.  However, that requires us to guarantee that each Pipeline stage is given the same seed in both runs; i.e., the random number generator used by the Pipeline must not change behavior across Spark versions.  Is that a reasonable assumption?
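One possible way to sidestep the dependence on a random number generator is to derive each stage's seed from the pipeline seed with a fixed hash. The sketch below is plain Python, not Spark code, and it is only an illustration of the idea: derive_stage_seed and stage_seeds are hypothetical names, and the choice of SHA-256 over the stage index is an assumption, not something proposed in this issue.

```python
import hashlib
from typing import List


def derive_stage_seed(pipeline_seed: int, stage_index: int) -> int:
    """Derive a per-stage seed from the pipeline seed without using an RNG."""
    payload = f"{pipeline_seed}:{stage_index}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Keep the first 8 bytes as a signed 64-bit value (the range of a JVM Long).
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def stage_seeds(pipeline_seed: int, num_stages: int) -> List[int]:
    """Seeds for stages 0..num_stages-1, fully determined by the inputs."""
    return [derive_stage_seed(pipeline_seed, i) for i in range(num_stages)]


seeds_a = stage_seeds(42, 3)  # seeds used when producing model A
seeds_b = stage_seeds(42, 3)  # seeds used after reloading, for model B
```

Because the derivation depends only on the pipeline seed and the stage position, seeds_a and seeds_b are identical. This covers only the seeds handed to the stages; each stage's own use of its seed would still need to be stable across versions for A and B to match.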

I'll try to think of other possible issues too.

CC: [~mengxr]


> Add random seed Param to ML Pipeline
> ------------------------------------
>
>                 Key: SPARK-9695
>                 URL: https://issues.apache.org/jira/browse/SPARK-9695
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> Note this will require some discussion about whether to make HasSeed the main API for whether an algorithm takes a seed.


