Posted to issues@spark.apache.org by "Weichen Xu (JIRA)" <ji...@apache.org> on 2016/09/03 16:16:20 UTC
[jira] [Created] (SPARK-17390) Optimize MultivariateOnlineSummarizer by making the summarized target configurable
Weichen Xu created SPARK-17390:
----------------------------------
Summary: Optimize MultivariateOnlineSummarizer by making the summarized target configurable
Key: SPARK-17390
URL: https://issues.apache.org/jira/browse/SPARK-17390
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Reporter: Weichen Xu
Optimize MultivariateOnlineSummarizer by making the set of summarized statistics configurable.
For example, if we only need to summarize `mean` and `variance`,
we only need to accumulate the following vectors:
currMean, weightSum, currM2n.
This avoids useless computation and serialization. The saving matters most when MultivariateOnlineSummarizer is used in RDD.aggregate: every partial summarizer is serialized with all of its buffers, so when the data dimension is large, the extra serialization cost is large.
Because MultivariateOnlineSummarizer is widely used, this optimization is worth doing.
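To illustrate the idea, here is a minimal standalone sketch in Python. It is not Spark's implementation or API: the class name `ConfigurableSummarizer`, its `metrics` argument, and the `state_size` helper are hypothetical, samples are dense and unweighted (so a plain count stands in for weightSum), and the buffers are only named after the vectors listed above. It allocates and updates only the buffers that the requested statistics need.

```python
from typing import Iterable, List, Sequence


class ConfigurableSummarizer:
    """Hypothetical sketch, not Spark's API: keeps only the buffers that the
    requested metrics need, so unused statistics cost nothing to update or ship."""

    SUPPORTED = frozenset({"mean", "variance", "max", "min"})

    def __init__(self, dim: int, metrics: Iterable[str]) -> None:
        self.metrics = frozenset(metrics)
        unknown = self.metrics - self.SUPPORTED
        if unknown:
            raise ValueError(f"unsupported metrics: {sorted(unknown)}")
        self.dim = dim
        self.count = 0  # unweighted samples: the count plays the role of weightSum
        need_mean = bool(self.metrics & {"mean", "variance"})
        # Variance needs the running mean as well as the sum of squared deviations.
        self.curr_mean = [0.0] * dim if need_mean else None
        self.curr_m2n = [0.0] * dim if "variance" in self.metrics else None
        self.curr_max = [float("-inf")] * dim if "max" in self.metrics else None
        self.curr_min = [float("inf")] * dim if "min" in self.metrics else None

    def add(self, sample: Sequence[float]) -> "ConfigurableSummarizer":
        """Fold one dense sample into the requested statistics (Welford update)."""
        if len(sample) != self.dim:
            raise ValueError("dimension mismatch")
        self.count += 1
        for i, x in enumerate(sample):
            if self.curr_mean is not None:
                delta = x - self.curr_mean[i]
                self.curr_mean[i] += delta / self.count
                if self.curr_m2n is not None:
                    self.curr_m2n[i] += delta * (x - self.curr_mean[i])
            if self.curr_max is not None and x > self.curr_max[i]:
                self.curr_max[i] = x
            if self.curr_min is not None and x < self.curr_min[i]:
                self.curr_min[i] = x
        return self

    def merge(self, other: "ConfigurableSummarizer") -> "ConfigurableSummarizer":
        """Combine two partial summaries, as the combine step of an aggregate would."""
        if other.metrics != self.metrics or other.dim != self.dim:
            raise ValueError("summarizers are not compatible")
        total = self.count + other.count
        if total == 0:
            return self
        for i in range(self.dim):
            if self.curr_mean is not None:
                delta = other.curr_mean[i] - self.curr_mean[i]
                if self.curr_m2n is not None:
                    self.curr_m2n[i] += (
                        other.curr_m2n[i]
                        + delta * delta * self.count * other.count / total
                    )
                self.curr_mean[i] += delta * other.count / total
            if self.curr_max is not None:
                self.curr_max[i] = max(self.curr_max[i], other.curr_max[i])
            if self.curr_min is not None:
                self.curr_min[i] = min(self.curr_min[i], other.curr_min[i])
        self.count = total
        return self

    def mean(self) -> List[float]:
        if "mean" not in self.metrics:
            raise ValueError("mean was not requested")
        return list(self.curr_mean)

    def variance(self) -> List[float]:
        """Unbiased sample variance; zeros when fewer than two samples were seen."""
        if "variance" not in self.metrics:
            raise ValueError("variance was not requested")
        if self.count < 2:
            return [0.0] * self.dim
        return [m / (self.count - 1) for m in self.curr_m2n]

    def state_size(self) -> int:
        """Number of per-feature floats that would have to be serialized."""
        buffers = (self.curr_mean, self.curr_m2n, self.curr_max, self.curr_min)
        return sum(len(b) for b in buffers if b is not None)
```

With only `mean` and `variance` requested, this sketch carries two vectors of length `dim` instead of four, and the saving grows with every extra statistic a full summarizer would track. How such a configuration should be exposed in Spark itself is left open here.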