Posted to user@spark.apache.org by Yana Kadiyska <ya...@gmail.com> on 2014/09/04 02:10:16 UTC

[MLlib] How do you normalize features?

It seems the next release will add a nice org.apache.spark.mllib.feature
package, but what is the recommended way to normalize features in the
current release (1.0.2)? I'm hoping for a general pointer here.

At the moment I have an RDD[LabeledPoint] and I can get
a MultivariateStatisticalSummary for mean/variance. Is that about the right
way to proceed? I'm also not seeing an easy way to subtract vectors -- do I
need to do this element-wise?

thanks

Re: [MLlib] How do you normalize features?

Posted by Xiangrui Meng <me...@gmail.com>.
Maybe copy the implementation of StandardScaler from 1.1 and use it in
v1.0.x. -Xiangrui
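
For the element-wise part of the question, here is a rough sketch of the
arithmetic a standard scaler performs, written against plain Scala arrays so
it does not depend on any particular Spark version. The names FeatureScaling
and standardize are hypothetical, not MLlib API, and mapping zero-variance
features to 0.0 is an assumption made here to avoid dividing by zero.

```scala
// Sketch only: plain Scala arrays, no Spark dependency.
// `FeatureScaling` and `standardize` are hypothetical names, not MLlib API.
object FeatureScaling {

  /** Returns (x(i) - mean(i)) / sqrt(variance(i)) for each feature i.
    * Assumption: features with zero variance are mapped to 0.0. */
  def standardize(
      x: Array[Double],
      mean: Array[Double],
      variance: Array[Double]): Array[Double] = {
    require(
      x.length == mean.length && x.length == variance.length,
      "x, mean and variance must have the same length")
    Array.tabulate(x.length) { i =>
      val std = math.sqrt(variance(i))
      if (std != 0.0) (x(i) - mean(i)) / std else 0.0
    }
  }
}
```

Assuming summary is the MultivariateStatisticalSummary already computed, the
mean and variance arrays could come from summary.mean.toArray and
summary.variance.toArray, and each point could be rebuilt inside rdd.map as
LabeledPoint(p.label, Vectors.dense(FeatureScaling.standardize(p.features.toArray, mean, variance))).
Note that subtracting the mean turns sparse feature vectors into dense ones.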

