Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/05/07 04:42:59 UTC

[jira] [Commented] (SPARK-7407) Use uid and param name to identify a parameter instead of the param object

    [ https://issues.apache.org/jira/browse/SPARK-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531898#comment-14531898 ] 

Joseph K. Bradley commented on SPARK-7407:
------------------------------------------

I hope we can make this change without changing the user-facing API.  That seems very doable for Scala, where ParamMap is a class.  It sounds harder for Python.  Should we make it a class there too?

> Use uid and param name to identify a parameter instead of the param object
> --------------------------------------------------------------------------
>
>                 Key: SPARK-7407
>                 URL: https://issues.apache.org/jira/browse/SPARK-7407
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> Transferring parameter values from one instance to another has been a pain point in the ML pipeline implementation. Because we use the param object as the key in the param map, we have to copy the keys correctly when making a copy of a transformer, estimator, or model. This becomes complicated when meta-algorithms are involved. For example, in cross validation:
> {code}
> val cv = new CrossValidator()
>   .setEstimator(lr)
>   .setEstimatorParamMaps(epm)
> {code}
> When we make a copy of `cv` with extra params that contain estimator params,
> {code}
> cv.copy(ParamMap(cv.numFolds -> 3, lr.maxIter -> 10))
> {code}
> we need to make a copy of the `lr` object as well and remap `epm` from the old `lr`'s param keys to the keys of the new copy. This is quite error-prone, especially if the estimator itself is another meta-algorithm.
> Using uid + param name as the key in param maps, and using the same uid in copies (and between estimator/model pairs), would simplify the implementations. We would not need to change the keys, since the copied instance has the same uid as the original instance. It would also be easier to find models in a fitted pipeline.
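
To make the proposal in the description concrete, here is a minimal sketch of keying a param map by (uid, param name). The types `ParamKey`, `SimpleParamMap`, and `SketchEstimator` are hypothetical, simplified stand-ins invented for illustration; they are not Spark's actual classes, and the real change may look quite different.

```scala
// Hypothetical stand-ins for illustration only; not Spark's ML API.

// A param is identified by its owner's uid plus its name,
// not by the identity of a param object.
final case class ParamKey(parentUid: String, name: String)

// Immutable map from (uid, name) keys to values.
final class SimpleParamMap private (private val values: Map[ParamKey, Any]) {
  def put(key: ParamKey, value: Any): SimpleParamMap =
    new SimpleParamMap(values + (key -> value))
  def get(key: ParamKey): Option[Any] = values.get(key)
}

object SimpleParamMap {
  val empty: SimpleParamMap = new SimpleParamMap(Map.empty)
}

// An estimator whose copy keeps the same uid, as the issue suggests.
final class SketchEstimator(val uid: String) {
  val maxIter: ParamKey = ParamKey(uid, "maxIter")
  def copy(): SketchEstimator = new SketchEstimator(uid)
}

object ParamKeySketch {
  // Builds a map against the original estimator, then looks the value
  // up through a copy. No key remapping is needed.
  def lookupThroughCopy(): Option[Any] = {
    val lr = new SketchEstimator("logreg_1")
    val epm = SimpleParamMap.empty.put(lr.maxIter, 10)
    val lrCopy = lr.copy()
    epm.get(lrCopy.maxIter)
  }
}
```

Because `ParamKey` is a case class, two keys with the same uid and name compare equal, so a map built against the original instance stays valid for the copy. With object-identity keys, the same lookup would miss unless every key were remapped during the copy.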


