Posted to dev@spark.apache.org by ogeagla <og...@gmail.com> on 2015/01/09 21:21:52 UTC

Re-use scaling means and variances from StandardScalerModel

Hello,

I would like to re-use the means and variances computed by the fit function
in the StandardScaler, as I persist them and my use case requires consistent
scaling of data based on some initial data set.  The StandardScalerModel's
constructor takes means and variances, but is private[mllib].
Forking/compiling Spark or copy/pasting the class into my project are both
options, but I'd like to stay away from them.  Any chance there is interest
in a PR to allow this re-use by removing private from the
constructor?  Or perhaps an alternative solution exists?
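For illustration, here is a minimal plain-Scala sketch (no Spark; all names are hypothetical) of the scaling being reused — the fitted state is just a per-feature mean and variance, which is exactly what one would want to persist and re-apply:

```scala
// Plain-Scala sketch (no Spark): the fitted state is just per-feature
// means and variances, so persisting them lets new data be scaled
// exactly as the initial data set was.
val train = Seq(Array(1.0, 10.0), Array(3.0, 30.0), Array(5.0, 50.0))
val n = train.size.toDouble
val dim = train.head.length

val mean = Array.tabulate(dim)(j => train.map(_(j)).sum / n)
// Population variance here; MLlib may divide by (n - 1) instead.
val variance = Array.tabulate(dim)(j =>
  train.map(x => math.pow(x(j) - mean(j), 2)).sum / n)

// Scaling with the persisted statistics is deterministic for any input:
def scale(x: Array[Double]): Array[Double] =
  Array.tabulate(dim)(j => (x(j) - mean(j)) / math.sqrt(variance(j)))

val scaled = scale(Array(3.0, 30.0)) // the middle training point maps to (0.0, 0.0)
```

With access to the fitted model's mean/variance (or a public constructor taking them), this re-application would not require refitting on the original data.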

Thanks,
Octavian



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Re-use-scaling-means-and-variances-from-StandardScalerModel-tp10073.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Re-use scaling means and variances from StandardScalerModel

Posted by Octavian Geagla <og...@gmail.com>.
Thanks for the suggestions.  

I've opened this JIRA ticket:
https://issues.apache.org/jira/browse/SPARK-5207
Feel free to modify it, assign it to me, kick off a discussion, etc.  

I'd be more than happy to own this feature and PR.

Thanks,
-Octavian





Re: Re-use scaling means and variances from StandardScalerModel

Posted by Xiangrui Meng <me...@gmail.com>.
Feel free to create a JIRA for this issue. We might need to discuss
what to put in the public constructors. In the meantime, you can use
Java serialization to save/load the model:

sc.parallelize(Seq(model), 1).saveAsObjectFile("/tmp/model")
val model = sc.objectFile[StandardScalerModel]("/tmp/model").first()
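The same round trip can be sketched without a SparkContext; a hedged illustration using plain java.io serialization, with arrays standing in for the model's real Vector fields:

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Illustrative sketch only: the same Java-serialization round trip
// Xiangrui suggests, shown on plain arrays standing in for the
// model's mean/variance Vectors.
val mean = Array(3.0, 30.0)
val variance = Array(8.0 / 3, 800.0 / 3)

val file = File.createTempFile("scaler-state", ".bin")
val out = new ObjectOutputStream(new FileOutputStream(file))
out.writeObject((mean, variance)) // Tuple2 and Array are both Serializable
out.close()

val in = new ObjectInputStream(new FileInputStream(file))
val (loadedMean, loadedVariance) =
  in.readObject().asInstanceOf[(Array[Double], Array[Double])]
in.close()
file.delete()
// loadedMean.sameElements(mean) == true
```

This keeps the persisted statistics usable across runs until a public constructor (or a save/load API) exists.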

-Xiangrui

