Posted to dev@spark.apache.org by RoyGaoVLIS <ro...@zju.edu.cn> on 2015/06/02 10:25:37 UTC

about Spark MLlib StandardScaler's Implementation

Hi,
	While adding a test case for ML's StandardScaler, I found that MLlib's
StandardScaler produces different output from R's scale function with the
params (withMean = false, withStd = true).
	The difference arises because R's scale function divides each column by
its root-mean-square rather than by its standard deviation.
	I'm confused about Spark MLlib's implementation.
	Can anybody give me a hand? Thanks.



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/about-Spark-MLlib-StandardScaler-s-Implementation-tp12554.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: about Spark MLlib StandardScaler's Implementation

Posted by Joseph Bradley <jo...@databricks.com>.

Your understanding is correct: when used without centering (withMean =
false), the two implementations differ:
* R: normalize by RMS
* MLlib: normalize by stddev
With centering, they are the same.
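
As a rough illustration of the difference, here is a minimal sketch in plain
Python (not MLlib or R code). The function names rms_scale, stddev_scale and
center are made up for this example, and the sketch assumes both sides use an
n - 1 denominator: R's scale documents its root-mean-square as
sqrt(sum(x^2) / (n - 1)), and the standard deviation here is the sample
standard deviation.

```python
import math
import statistics


def rms_scale(column):
    """Divide by the root-mean-square as R's scale defines it:
    sqrt(sum(x^2) / (n - 1)). No centering is applied here."""
    n = len(column)
    rms = math.sqrt(sum(x * x for x in column) / (n - 1))
    return [x / rms for x in column]


def stddev_scale(column):
    """Divide by the sample standard deviation (n - 1 denominator).
    The deviation is measured around the mean, but the values
    themselves are not centered."""
    sd = statistics.stdev(column)
    return [x / sd for x in column]


def center(column):
    """Subtract the column mean from every value."""
    mean = statistics.mean(column)
    return [x - mean for x in column]


column = [1.0, 2.0, 3.0, 4.0]

# Without centering, the two scalings disagree.
print("rms, uncentered:   ", rms_scale(column))
print("stddev, uncentered:", stddev_scale(column))

# After centering, the mean is zero, so RMS and stddev coincide.
centered = center(column)
print("rms, centered:     ", rms_scale(centered))
print("stddev, centered:  ", stddev_scale(centered))
```

Once the mean is zero, sum(x^2) equals the sum of squared deviations from the
mean, which is why the two centered results agree.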

It's hard to say a priori which one is better, but my guess is that most R
users center their data.  (Centering is nice to do, except on big data,
where it turns sparse vectors into dense ones.)  Note that R does allow you
to normalize by stddev without centering:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html

Joseph
