Posted to issues@spark.apache.org by "Weichen Xu (JIRA)" <ji...@apache.org> on 2016/09/03 16:16:20 UTC

[jira] [Created] (SPARK-17390) optimize MultivariantOnlineSummerizer by making the summarized target configurable

Weichen Xu created SPARK-17390:
----------------------------------

             Summary: optimize MultivariantOnlineSummerizer by making the summarized target configurable
                 Key: SPARK-17390
                 URL: https://issues.apache.org/jira/browse/SPARK-17390
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
            Reporter: Weichen Xu


Optimize MultivariateOnlineSummarizer by making the summarized targets configurable.

For example, if we only need to summarize `mean` and `variance`,
we only need to accumulate the following vectors:
currMean, weightSum, and currM2n.

This avoids useless computation and serialization. The saving matters most when MultivariateOnlineSummarizer is used in RDD.aggregate: when the data dimension is large, the extra serialization cost is large as well.

Because MultivariateOnlineSummarizer is widely used, this optimization is worth doing.
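
To illustrate the idea, here is a minimal sketch in Python, not the Spark implementation. The class name `ConfigurableSummarizer`, the `metrics` argument, and the field names are hypothetical and chosen only for illustration. It handles dense samples only and treats weights as frequency weights when computing the variance. The point it shows is that a buffer is allocated, updated, and (in a distributed setting) serialized only if a requested metric depends on it.

```python
from typing import List, Optional, Sequence, Set


class ConfigurableSummarizer:
    """Hypothetical, simplified sketch; not Spark's MultivariateOnlineSummarizer."""

    SUPPORTED = frozenset({"mean", "variance", "max"})

    def __init__(self, dim: int, metrics: Set[str]) -> None:
        unknown = set(metrics) - self.SUPPORTED
        if unknown:
            raise ValueError(f"unsupported metrics: {sorted(unknown)}")
        self.weight_sum = 0.0
        # Allocate a buffer only if a requested metric depends on it.
        need_mean = bool({"mean", "variance"} & set(metrics))
        self.curr_mean: Optional[List[float]] = [0.0] * dim if need_mean else None
        self.curr_m2n: Optional[List[float]] = [0.0] * dim if "variance" in metrics else None
        self.curr_max: Optional[List[float]] = [float("-inf")] * dim if "max" in metrics else None

    def add(self, sample: Sequence[float], weight: float = 1.0) -> "ConfigurableSummarizer":
        if weight == 0.0:
            return self
        new_sum = self.weight_sum + weight
        for i, x in enumerate(sample):
            if self.curr_mean is not None:
                # Weighted Welford update of the running mean and M2n.
                delta = x - self.curr_mean[i]
                self.curr_mean[i] += weight * delta / new_sum
                if self.curr_m2n is not None:
                    self.curr_m2n[i] += weight * delta * (x - self.curr_mean[i])
            if self.curr_max is not None and x > self.curr_max[i]:
                self.curr_max[i] = x
        self.weight_sum = new_sum
        return self

    def merge(self, other: "ConfigurableSummarizer") -> "ConfigurableSummarizer":
        """Combine two partial summaries built with the same metrics."""
        total = self.weight_sum + other.weight_sum
        if total == 0.0:
            return self
        if self.curr_mean is not None and other.curr_mean is not None:
            for i in range(len(self.curr_mean)):
                delta = other.curr_mean[i] - self.curr_mean[i]
                if self.curr_m2n is not None and other.curr_m2n is not None:
                    self.curr_m2n[i] += (other.curr_m2n[i]
                                         + delta * delta * self.weight_sum * other.weight_sum / total)
                self.curr_mean[i] += delta * other.weight_sum / total
        if self.curr_max is not None and other.curr_max is not None:
            self.curr_max = [max(a, b) for a, b in zip(self.curr_max, other.curr_max)]
        self.weight_sum = total
        return self

    def mean(self) -> List[float]:
        if self.curr_mean is None:
            raise ValueError("mean was not requested")
        return list(self.curr_mean)

    def variance(self) -> List[float]:
        if self.curr_m2n is None:
            raise ValueError("variance was not requested")
        denom = self.weight_sum - 1.0  # assumes frequency weights
        return [m2n / denom if denom > 0.0 else 0.0 for m2n in self.curr_m2n]
```

With `metrics={"mean", "variance"}` the object carries only the mean vector, the M2n vector, and a scalar weight sum, so the state exchanged between partitions during an aggregate would shrink accordingly. Requesting a metric that was not configured raises an error instead of silently returning zeros.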



