Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/08/16 10:38:21 UTC

[jira] [Assigned] (SPARK-17001) Enable standardScaler to standardize sparse vectors when withMean=True

     [ https://issues.apache.org/jira/browse/SPARK-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17001:
------------------------------------

    Assignee: Apache Spark

> Enable standardScaler to standardize sparse vectors when withMean=True
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17001
>                 URL: https://issues.apache.org/jira/browse/SPARK-17001
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Tobi Bosede
>            Assignee: Apache Spark
>            Priority: Minor
>
> When withMean = true, StandardScaler does not handle sparse vectors and instead throws an exception. This is presumably because subtracting the mean makes a sparse vector dense, which can be undesirable.
> However, VectorAssembler generates vectors that may be a mix of sparse and dense, even when the vectors are smallish, depending on their values. It's common to feed this output into StandardScaler, and with withMean = true it can then fail intermittently depending on the input, which is surprising.
> StandardScaler should go ahead and operate on sparse vectors and subtract the mean when explicitly asked to do so via withMean, on the theory that the user knows what he or she is doing, and there is otherwise no way to make this work.
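
A minimal sketch of the behavior described above and a possible workaround, using the PySpark 2.0 ml API. The example data, column names, and the to_dense helper are illustrative assumptions, not part of the issue:

    # Hypothetical example: a mix of sparse and dense rows, as
    # VectorAssembler can produce depending on the values.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.ml.feature import StandardScaler
    from pyspark.ml.linalg import Vectors, VectorUDT, DenseVector

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([
        (Vectors.sparse(3, [(0, 1.0)]),),
        (Vectors.dense([2.0, 3.0, 4.0]),),
    ], ["features"])

    scaler = StandardScaler(inputCol="features", outputCol="scaled",
                            withMean=True, withStd=True)

    # As of Spark 2.0, transforming the sparse row with withMean=True
    # throws an exception instead of centering it:
    # scaler.fit(df).transform(df).show()

    # Workaround until this improvement lands: densify the vectors first
    # (acceptable when the data is small enough to hold densely).
    to_dense = udf(lambda v: DenseVector(v.toArray()), VectorUDT())
    dense_df = df.withColumn("features", to_dense("features"))
    scaler.fit(dense_df).transform(dense_df).show(truncate=False)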



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org