Posted to reviews@spark.apache.org by dedunumax <gi...@git.apache.org> on 2018/04/21 08:28:29 UTC

[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...

GitHub user dedunumax opened a pull request:

    https://github.com/apache/spark/pull/21120

    [SPARK-22448][ML] Added sum function to Summerizer and MultivariateOnlineSummarizer
    
    ## What changes were proposed in this pull request?
    
    This adds a sum function to Summarizer and MultivariateOnlineSummarizer.
    
    ## How was this patch tested?
    
    Added a unit test to verify the new function.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dedunumax/spark SPARK-22448

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21120
    
----
commit 8c34fc9cfed27a3b53ead302088ab6f59e3690d4
Author: Dedunu Dhananjaya <de...@...>
Date:   2018-04-21T08:24:19Z

    [SPARK-22448][ML] Added sum function to Summerizer and MultivariateOnlineSummarizer

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Having sum as a basic statistic would make the API more user-friendly. I'm thinking about implementing other functions as well. Do you think this is not worth implementing?


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    cc @rxin @cloud-fan @gatorsmile 


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    **[Test build #89780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89780/testReport)** for PR 21120 at commit [`869f215`](https://github.com/apache/spark/commit/869f215393250e3cab9a226593a850fa15e82f5a).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    **[Test build #89780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89780/testReport)** for PR 21120 at commit [`869f215`](https://github.com/apache/spark/commit/869f215393250e3cab9a226593a850fa15e82f5a).


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...

Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax closed the pull request at:

    https://github.com/apache/spark/pull/21120


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    I see, I will change the code like that.


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Why do you want to add this? Once we have mean, it's easy to compute sum.


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    **[Test build #89703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89703/testReport)** for PR 21120 at commit [`8c34fc9`](https://github.com/apache/spark/commit/8c34fc9cfed27a3b53ead302088ab6f59e3690d4).


---



[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...

Posted by mahmoudmahdi24 <gi...@git.apache.org>.
Github user mahmoudmahdi24 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21120#discussion_r199119223
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
    @@ -562,6 +573,23 @@ private[ml] object SummaryBuilderImpl extends Logging {
     
           Vectors.dense(currL1)
         }
    +
    +    /**
    +     * Sum of each dimension
    +     */
    +    def sum: Vector = {
    +      require(requestedMetrics.contains(Sum))
    +      require(totalWeightSum > 0, s"Nothing has been added to this summarizer.")
    +
    +      val realSum = Array.ofDim[Double](n)
    +      var i = 0
    +      val len = currMean.length
    +      while (i < len) {
    +        realSum(i) = currMean(i) * weightSum(i)
    +        i += 1
    --- End diff --
    
    Please avoid mutable values here; for example, use foldLeft instead of the index-based while loop.
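
The suggestion above can be sketched in plain Scala. This is only an illustration, not the code from the pull request: `ImmutableSumSketch`, `sumViaZip`, and `sumViaFold` are hypothetical names, and the two arrays stand in for the summarizer's internal `currMean` and `weightSum` fields shown in the diff.

```scala
// Hypothetical sketch: element-wise product of two arrays without a mutable index.
object ImmutableSumSketch {

  // Pair up the two arrays and multiply each pair.
  def sumViaZip(mean: Array[Double], weights: Array[Double]): Array[Double] = {
    require(mean.length == weights.length, "Arrays must have the same length.")
    mean.zip(weights).map { case (m, w) => m * w }
  }

  // The same computation written with foldLeft, as the reviewer proposed.
  // The accumulator is an immutable Vector that grows by one element per index.
  def sumViaFold(mean: Array[Double], weights: Array[Double]): Vector[Double] = {
    require(mean.length == weights.length, "Arrays must have the same length.")
    mean.indices.foldLeft(Vector.empty[Double]) { (acc, i) =>
      acc :+ (mean(i) * weights(i))
    }
  }
}
```

Both variants return the same values as the while loop in the diff. Whether the immutable form is preferable in a performance-sensitive aggregation path is a separate question that this sketch does not settle.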


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    I suspect this will slow down the summarizer because you add a sum statistic internally (and this sum value may overflow).
    If we want the sum, we can compute it directly as `count * mean`.
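
A minimal sketch of the `count * mean` idea, written in plain Scala so it runs without Spark. `SumFromMeanSketch`, `columnMeans`, and `sumFromMean` are hypothetical helpers for illustration only; they are not part of the Summarizer API. The sketch assumes unweighted rows; with weighted samples the multiplier would presumably be the total weight rather than the row count.

```scala
// Hypothetical sketch: recover per-column sums from per-column means and a row count.
object SumFromMeanSketch {

  // Per-column mean of a non-empty, rectangular data set.
  def columnMeans(rows: Seq[Seq[Double]]): Seq[Double] = {
    require(rows.nonEmpty, "Nothing has been added.")
    rows.transpose.map(col => col.sum / col.length)
  }

  // sum(i) = mean(i) * count, so no separate running sum has to be stored.
  def sumFromMean(mean: Seq[Double], count: Long): Seq[Double] =
    mean.map(_ * count)
}
```

For example, for the rows `(1.0, 2.0)` and `(3.0, 4.0)` the column means are `(2.0, 3.0)`, and multiplying by the count of 2 gives the column sums `(4.0, 6.0)`. In general the result can differ from a directly accumulated sum in the last few floating-point digits.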


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89780/
    Test FAILed.


---



[GitHub] spark pull request #21120: [SPARK-22448][ML] Added sum function to Summerize...

Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21120#discussion_r199135313
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
    @@ -562,6 +573,23 @@ private[ml] object SummaryBuilderImpl extends Logging {
     
           Vectors.dense(currL1)
         }
    +
    +    /**
    +     * Sum of each dimension
    +     */
    +    def sum: Vector = {
    +      require(requestedMetrics.contains(Sum))
    +      require(totalWeightSum > 0, s"Nothing has been added to this summarizer.")
    +
    +      val realSum = Array.ofDim[Double](n)
    +      var i = 0
    +      val len = currMean.length
    +      while (i < len) {
    +        realSum(i) = currMean(i) * weightSum(i)
    +        i += 1
    --- End diff --
    
    I will change that.


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    ok to test


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89703/
    Test FAILed.


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    **[Test build #89703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89703/testReport)** for PR 21120 at commit [`8c34fc9`](https://github.com/apache/spark/commit/8c34fc9cfed27a3b53ead302088ab6f59e3690d4).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    cc @WeichenXu123 @dbtsai 


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

Posted by dedunumax <gi...@git.apache.org>.
Github user dedunumax commented on the issue:

    https://github.com/apache/spark/pull/21120
  
    cc @WeichenXu123 @dbtsai 


---
