Posted to reviews@spark.apache.org by erikvanoosten <gi...@git.apache.org> on 2015/04/13 14:15:10 UTC

[GitHub] spark pull request: Fix for sum on empty RDD fails with exception ...

GitHub user erikvanoosten opened a pull request:

    https://github.com/apache/spark/pull/5489

    Fix for sum on empty RDD fails with exception (SPARK-6878)
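
For readers unfamiliar with this class of bug, here is a minimal sketch in plain Scala collections (no Spark). The thread does not show the patched Spark source, so it is an assumption that the failure comes from a reduce-style sum and that the fix is fold-style; `sumWithReduce` and `sumWithFold` are hypothetical names used only for illustration.

```scala
import scala.util.Try

object EmptySumSketch {
  // Hypothetical stand-ins for two ways to implement a sum; not code from the patch.

  // reduce has no starting value, so it throws on an empty collection.
  def sumWithReduce(xs: Seq[Double]): Double = xs.reduce(_ + _)

  // fold starts from the zero element, so an empty collection yields 0.0.
  def sumWithFold(xs: Seq[Double]): Double = xs.fold(0.0)(_ + _)

  def main(args: Array[String]): Unit = {
    val empty = Seq.empty[Double]
    println(Try(sumWithReduce(empty)).isFailure) // true
    println(sumWithFold(empty))                  // 0.0
  }
}
```

Starting from an explicit zero element gives the empty case a defined result without special-casing it.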

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/erikvanoosten/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5489.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5489
    
----
commit f1708c97533f6a89ade7fca897678d2ccff5ca36
Author: Erik van Oosten <ev...@ebay.com>
Date:   2015-04-13T12:13:37Z

    Fix for sum on empty RDD fails with exception (SPARK-6878)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Fix for sum on empty RDD fails with exception ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92332317
  
    ok to test


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5489#discussion_r28233582
  
    --- Diff: core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala ---
    @@ -18,10 +18,17 @@
     package org.apache.spark.rdd
     
     import org.scalatest.FunSuite
    +import org.scalatest.Matchers._
     
     import org.apache.spark._
     
     class DoubleRDDSuite extends FunSuite with SharedSparkContext {
    +  test("sum") {
    +    sc.parallelize(Seq.empty[Double]).sum() should be(0.0 +- 0.0001)
    --- End diff --
    
    Doubles aren't unstable; `0.0 == 0.0` always. Yes, I know what you mean, but in these cases we can expect the result to be exact to machine precision, always.
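
The claim above can be checked with a short sketch in plain Scala, without Spark. The input `Seq(1.0, 2.0)` is an assumption chosen to produce the `3.0` mentioned elsewhere in the review; the actual test inputs are not shown in this thread.

```scala
object ExactDoubleSketch {
  def main(args: Array[String]): Unit = {
    // Folding an empty sequence performs no arithmetic, so the result is
    // the zero element itself, bit for bit.
    assert(Seq.empty[Double].fold(0.0)(_ + _) == 0.0)

    // Small integers are exactly representable as doubles, so adding them
    // introduces no rounding error.
    assert(Seq(1.0, 2.0).fold(0.0)(_ + _) == 3.0)

    println("exact comparisons hold")
  }
}
```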


[GitHub] spark pull request: Fix for sum on empty RDD fails with exception ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92331916
  
    (Could you start the title with `SPARK-xxxx [CORE] ...`? It helps the merge messages and the PR system. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark)


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92360598
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30158/
    Test PASSed.


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92379601
  
      [Test build #30162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30162/consoleFull) for   PR 5489 at commit [`1c91954`](https://github.com/apache/spark/commit/1c9195415d5238a8ce067e27440193ff3f831706).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92360582
  
      [Test build #30158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30158/consoleFull) for   PR 5489 at commit [`f1708c9`](https://github.com/apache/spark/commit/f1708c97533f6a89ade7fca897678d2ccff5ca36).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class StringIndexer extends Estimator[StringIndexerModel] with StringIndexerBase `
      * `class VectorAssembler extends Transformer with HasInputCols with HasOutputCol `
      * `class VectorIndexer extends Estimator[VectorIndexerModel] with VectorIndexerParams `
    
     * This patch does not change any dependencies.


[GitHub] spark pull request: Fix for sum on empty RDD fails with exception ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5489#discussion_r28233099
  
    --- Diff: core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala ---
    @@ -18,10 +18,17 @@
     package org.apache.spark.rdd
     
     import org.scalatest.FunSuite
    +import org.scalatest.Matchers._
     
     import org.apache.spark._
     
     class DoubleRDDSuite extends FunSuite with SharedSparkContext {
    +  test("sum") {
    +    sc.parallelize(Seq.empty[Double]).sum() should be(0.0 +- 0.0001)
    --- End diff --
    
    I would just use `assert` here, as in the rest of the file. Also, I think you can assert exact equality in these cases. It really should be `3.0` in the last instance, precisely.


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by erikvanoosten <gi...@git.apache.org>.
Github user erikvanoosten commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5489#discussion_r28233485
  
    --- Diff: core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala ---
    @@ -18,10 +18,17 @@
     package org.apache.spark.rdd
     
     import org.scalatest.FunSuite
    +import org.scalatest.Matchers._
     
     import org.apache.spark._
     
     class DoubleRDDSuite extends FunSuite with SharedSparkContext {
    +  test("sum") {
    +    sc.parallelize(Seq.empty[Double]).sum() should be(0.0 +- 0.0001)
    --- End diff --
    
    Experience taught me that exact equality with doubles is unreliable. Now I err on the safe side and always use inexact matchers in unit tests.
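
As a sketch of the kind of case that motivates this habit, the plain-Scala example below shows an exact comparison failing where a tolerance-based one succeeds. `approxEqual` is a hypothetical helper written for this illustration; it is not part of ScalaTest or Spark.

```scala
object InexactDoubleSketch {
  // Hypothetical helper: true when a and b differ by at most eps.
  def approxEqual(a: Double, b: Double, eps: Double): Boolean =
    math.abs(a - b) <= eps

  def main(args: Array[String]): Unit = {
    // Neither 0.1 nor 0.2 is exactly representable in binary floating point,
    // so their sum is not the double closest to 0.3.
    val total = 0.1 + 0.2
    println(total == 0.3)                  // false
    println(approxEqual(total, 0.3, 1e-9)) // true
  }
}
```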


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by erikvanoosten <gi...@git.apache.org>.
Github user erikvanoosten commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5489#discussion_r28235308
  
    --- Diff: core/src/test/scala/org/apache/spark/rdd/DoubleRDDSuite.scala ---
    @@ -18,10 +18,17 @@
     package org.apache.spark.rdd
     
     import org.scalatest.FunSuite
    +import org.scalatest.Matchers._
     
     import org.apache.spark._
     
     class DoubleRDDSuite extends FunSuite with SharedSparkContext {
    +  test("sum") {
    +    sc.parallelize(Seq.empty[Double]).sum() should be(0.0 +- 0.0001)
    --- End diff --
    
    I didn't say unstable :)
    Anyway, changed as requested.


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92343353
  
      [Test build #30162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30162/consoleFull) for   PR 5489 at commit [`1c91954`](https://github.com/apache/spark/commit/1c9195415d5238a8ce067e27440193ff3f831706).


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92343292
  
    LGTM pending tests, thank you.


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92333283
  
      [Test build #30158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30158/consoleFull) for   PR 5489 at commit [`f1708c9`](https://github.com/apache/spark/commit/f1708c97533f6a89ade7fca897678d2ccff5ca36).


[GitHub] spark pull request: Fix for sum on empty RDD fails with exception ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92332047
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5489


[GitHub] spark pull request: SPARK-6878 [CORE] Fix for sum on empty RDD fai...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5489#issuecomment-92379643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30162/
    Test PASSed.

