Posted to reviews@spark.apache.org by dedunumax <gi...@git.apache.org> on 2018/04/21 08:28:29 UTC
[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...
GitHub user dedunumax opened a pull request:
https://github.com/apache/spark/pull/21120
[SPARK-22448][ML] Added sum function to Summerizer and MultivariateOnlineSummarizer
## What changes were proposed in this pull request?
This adds a sum function to Summarizer and MultivariateOnlineSummarizer.
## How was this patch tested?
Added a unit test to make sure it works.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dedunumax/spark SPARK-22448
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21120.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21120
----
commit 8c34fc9cfed27a3b53ead302088ab6f59e3690d4
Author: Dedunu Dhananjaya <de...@...>
Date: 2018-04-21T08:24:19Z
[SPARK-22448][ML] Added sum function to Summerizer and MultivariateOnlineSummarizer
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:
https://github.com/apache/spark/pull/21120
Having sum as a basic statistic will make the API more user-friendly. I'm thinking about implementing other functions as well. Do you think this is not worth implementing?
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21120
Can one of the admins verify this patch?
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:
https://github.com/apache/spark/pull/21120
cc @rxin @cloud-fan @gatorsmile
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21120
**[Test build #89780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89780/testReport)** for PR 21120 at commit [`869f215`](https://github.com/apache/spark/commit/869f215393250e3cab9a226593a850fa15e82f5a).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21120
**[Test build #89780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89780/testReport)** for PR 21120 at commit [`869f215`](https://github.com/apache/spark/commit/869f215393250e3cab9a226593a850fa15e82f5a).
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21120
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...
Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax closed the pull request at:
https://github.com/apache/spark/pull/21120
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:
https://github.com/apache/spark/pull/21120
I see, I will change the code like that.
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21120
Why do you want to add this? Once we have mean, it's easy to compute sum.
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21120
**[Test build #89703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89703/testReport)** for PR 21120 at commit [`8c34fc9`](https://github.com/apache/spark/commit/8c34fc9cfed27a3b53ead302088ab6f59e3690d4).
---
[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...
Posted by mahmoudmahdi24 <gi...@git.apache.org>.
Github user mahmoudmahdi24 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21120#discussion_r199119223
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -562,6 +573,23 @@ private[ml] object SummaryBuilderImpl extends Logging {
Vectors.dense(currL1)
}
+
+ /**
+ * Sum of each dimension
+ */
+ def sum: Vector = {
+ require(requestedMetrics.contains(Sum))
+ require(totalWeightSum > 0, s"Nothing has been added to this summarizer.")
+
+ val realSum = Array.ofDim[Double](n)
+ var i = 0
+ val len = currMean.length
+ while (i < len) {
+ realSum(i) = currMean(i) * weightSum(i)
+ i += 1
--- End diff --
Please avoid mutable values here; you could use foldLeft, for example, instead of the index loop.
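A minimal sketch of what an immutable version of the loop in the diff might look like. It uses `zip` and `map` rather than `foldLeft`, since the operation is an element-wise product rather than a reduction. The object name `ImmutableSum`, the method name `elementwiseProduct`, and the sample values are hypothetical and not part of the patch; `mean` and `weight` stand in for `currMean` and `weightSum` from the diff.

```scala
object ImmutableSum {
  // Element-wise product of two equally sized arrays, written without
  // a `var` counter or an index-based while loop.
  def elementwiseProduct(mean: Array[Double], weight: Array[Double]): Array[Double] = {
    require(mean.length == weight.length, "arrays must have the same length")
    mean.zip(weight).map { case (m, w) => m * w }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-ins for currMean and weightSum.
    val currMean = Array(1.0, 2.0)
    val weightSum = Array(3.0, 4.0)
    println(elementwiseProduct(currMean, weightSum).mkString(", ")) // prints "3.0, 8.0"
  }
}
```

This form allocates an intermediate array of pairs, so it trades some speed for readability compared with the while loop.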
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21120
I suspect this will slow down summarizer performance, because you add sum statistics internally (and the sum value could overflow).
We can directly use `count * mean` to get the sum if we want to use it.
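As a rough illustration of the `count * mean` identity, the sketch below recovers per-dimension sums from per-dimension means for unweighted data. It uses plain Scala arrays and does not call the Spark API; the object name `SumFromMean`, the method name `sumFromMean`, and the sample rows are hypothetical.

```scala
object SumFromMean {
  // Recover per-dimension sums from per-dimension means and the row count.
  def sumFromMean(mean: Array[Double], count: Long): Array[Double] =
    mean.map(_ * count)

  def main(args: Array[String]): Unit = {
    // Three hypothetical two-dimensional rows.
    val rows = Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    val count = rows.length.toLong
    val dim = rows.head.length

    // Per-dimension mean, computed directly for the sake of the example.
    val mean = Array.tabulate(dim)(j => rows.map(_(j)).sum / count)

    println(sumFromMean(mean, count).mkString(", ")) // prints "9.0, 12.0"
  }
}
```

For weighted data, the same identity holds with the total weight in place of the count.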
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21120
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89780/
Test FAILed.
---
[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...
Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on a diff in the pull request:
https://github.com/apache/spark/pull/21120#discussion_r199135313
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -562,6 +573,23 @@ private[ml] object SummaryBuilderImpl extends Logging {
Vectors.dense(currL1)
}
+
+ /**
+ * Sum of each dimension
+ */
+ def sum: Vector = {
+ require(requestedMetrics.contains(Sum))
+ require(totalWeightSum > 0, s"Nothing has been added to this summarizer.")
+
+ val realSum = Array.ofDim[Double](n)
+ var i = 0
+ val len = currMean.length
+ while (i < len) {
+ realSum(i) = currMean(i) * weightSum(i)
+ i += 1
--- End diff --
I will change that.
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21120
Can one of the admins verify this patch?
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21120
ok to test
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21120
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89703/
Test FAILed.
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21120
**[Test build #89703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89703/testReport)** for PR 21120 at commit [`8c34fc9`](https://github.com/apache/spark/commit/8c34fc9cfed27a3b53ead302088ab6f59e3690d4).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21120
Can one of the admins verify this patch?
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:
https://github.com/apache/spark/pull/21120
cc @WeichenXu123 @dbtsai
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21120
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...
Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:
https://github.com/apache/spark/pull/21120
cc @WeichenXu123 @dbtsai
---