You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2014/10/24 15:49:41 UTC

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/2928

    SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt dependency (LGPL) with commons-math

    This change replaces usages of colt with commons-math3 equivalents, and makes some minor necessary adjustments to related code and tests to match.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-4022

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2928.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2928
    
----
commit 8246dbd39be7ff162392c59c28dee74a1419e236
Author: Sean Owen <so...@cloudera.com>
Date:   2014-10-22T12:23:03Z

    Replace colt with commons-math3. Some tests do not pass yet.

commit cc477bad4f776a0da113d416ff08b0cbd91646c8
Author: Sean Owen <so...@cloudera.com>
Date:   2014-10-24T13:48:22Z

    Additional test fixes from review

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19362101
  
    --- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
    @@ -55,9 +55,10 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
           val sumStdev = math.sqrt(sumVar)
           val confFactor = {
             if (counter.count > 100) {
    -          Probability.normalInverse(1 - (1 - confidence) / 2)
    +          new NormalDistribution().inverseCumulativeProbability(1 - (1 - confidence) / 2)
             } else {
    -          Probability.studentTInverse(1 - confidence, (counter.count - 1).toInt)
    +          val degreesOfFreedom = (counter.count - 1).toInt
    +          new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 - (1 - confidence) / 2)
    --- End diff --
    
    ditto: It was one-side before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60447180
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22155/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60506206
  
    @srowen Could you check `JavaAPISuite.sample`? We need to update that test as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60397795
  
      [Test build #22136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22136/consoleFull) for   PR 2928 at commit [`cc477ba`](https://github.com/apache/spark/commit/cc477bad4f776a0da113d416ff08b0cbd91646c8).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19368910
  
    --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala ---
    @@ -87,15 +87,19 @@ class BernoulliSampler[T](lb: Double, ub: Double, complement: Boolean = false)
     @DeveloperApi
     class PoissonSampler[T](mean: Double) extends RandomSampler[T, T] {
     
    -  private[random] var rng = new Poisson(mean, new DRand)
    +  private[random] var rng = new PoissonDistribution(mean)
     
       override def setSeed(seed: Long) {
    -    rng = new Poisson(mean, new DRand(seed.toInt))
    +    rng = new PoissonDistribution(
    +      new Well19937c(seed),
    --- End diff --
    
    Ah of course. Will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19362087
  
    --- Diff: LICENSE ---
    @@ -1,4 +1,3 @@
    -
    --- End diff --
    
    The license file in Hadoop does have this empty line: https://github.com/apache/hadoop/blob/trunk/LICENSE.txt


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60529854
  
    @mengxr Oops I missed that this failed when I saw the SQL tests failed. Should be OK. I rebased too for good measure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19368695
  
    --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala ---
    @@ -87,15 +87,19 @@ class BernoulliSampler[T](lb: Double, ub: Double, complement: Boolean = false)
     @DeveloperApi
     class PoissonSampler[T](mean: Double) extends RandomSampler[T, T] {
     
    -  private[random] var rng = new Poisson(mean, new DRand)
    +  private[random] var rng = new PoissonDistribution(mean)
     
       override def setSeed(seed: Long) {
    -    rng = new Poisson(mean, new DRand(seed.toInt))
    +    rng = new PoissonDistribution(
    +      new Well19937c(seed),
    --- End diff --
    
    Well19937c is the default. We can construct PoissonDistribution with just `mean` and then call `reseedRandomGenerator`:
    
    http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/distribution/AbstractRealDistribution.html#reseedRandomGenerator(long)
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19363636
  
    --- Diff: LICENSE ---
    @@ -1,4 +1,3 @@
    -
    --- End diff --
    
    Oops, didn't mean to change that. I don't know why that happened and I'll undo it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2928


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19368747
  
    --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala ---
    @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double) {
         } else {
           val size = sampleSize.toInt
           if (cache(size) < 0) {
    -        cache(size) = Probability.studentTInverse(1 - confidence, size - 1)
    +        val tDist = new TDistribution(size - 1)
    +        cache(size) = tDist.inverseCumulativeProbability(1 - (1 - confidence) / 2)
    --- End diff --
    
    Thanks for the comment! This is helpful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60505131
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60459439
  
      [Test build #22176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22176/consoleFull) for   PR 2928 at commit [`5b87a28`](https://github.com/apache/spark/commit/5b87a28f99df1138cc53063c2fb535184ae64613).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60460049
  
      [Test build #22173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22173/consoleFull) for   PR 2928 at commit [`7e5c902`](https://github.com/apache/spark/commit/7e5c9028ddd37de4913fd5cc9bab161b737ea5b3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60462702
  
      [Test build #22176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22176/consoleFull) for   PR 2928 at commit [`5b87a28`](https://github.com/apache/spark/commit/5b87a28f99df1138cc53063c2fb535184ae64613).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60447175
  
      [Test build #22155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22155/consoleFull) for   PR 2928 at commit [`19716c6`](https://github.com/apache/spark/commit/19716c6112bfa5ccec075554e1eab70e5860944a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60506015
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22229/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60532479
  
      [Test build #22253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22253/consoleFull) for   PR 2928 at commit [`61a232f`](https://github.com/apache/spark/commit/61a232ffd0378d56e7e4a3d2c2e8f9f8a9660a80).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60438162
  
      [Test build #22155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22155/consoleFull) for   PR 2928 at commit [`19716c6`](https://github.com/apache/spark/commit/19716c6112bfa5ccec075554e1eab70e5860944a).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60389665
  
      [Test build #22136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22136/consoleFull) for   PR 2928 at commit [`cc477ba`](https://github.com/apache/spark/commit/cc477bad4f776a0da113d416ff08b0cbd91646c8).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60460053
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22173/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19364277
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/SampledRDD.scala ---
    @@ -53,9 +53,14 @@ private[spark] class SampledRDD[T: ClassTag](
         if (withReplacement) {
           // For large datasets, the expected number of occurrences of each element in a sample with
           // replacement is Poisson(frac). We use that to get a count for each element.
    -      val poisson = new Poisson(frac, new DRand(split.seed))
    +      val poisson = new PoissonDistribution(
    +        new MersenneTwister(split.seed),
    --- End diff --
    
    Good point, especially on being the default. I will switch to Well19937c.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60638201
  
    LGTM. Verified that `commons.math3` is shaded in the assembly jar. Merged into master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60506012
  
      [Test build #22229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22229/consoleFull) for   PR 2928 at commit [`5b87a28`](https://github.com/apache/spark/commit/5b87a28f99df1138cc53063c2fb535184ae64613).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60397800
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22136/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60455864
  
      [Test build #22173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22173/consoleFull) for   PR 2928 at commit [`7e5c902`](https://github.com/apache/spark/commit/7e5c9028ddd37de4913fd5cc9bab161b737ea5b3).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19362108
  
    --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala ---
    @@ -245,9 +245,9 @@ private[spark] object StratifiedSamplingUtils extends Logging {
               // Must use the same invoke pattern on the rng as in getSeqOp for with replacement
               // in order to generate the same sequence of random numbers when creating the sample
               val copiesAccepted = if (acceptBound == 0) 0L else rng.nextPoisson(acceptBound)
    -          val copiesWailisted = rng.nextPoisson(finalResult(key).waitListBound)
    +          val copiesWaitlisted = rng.nextPoisson(finalResult(key).waitListBound)
    --- End diff --
    
    Good catch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60532482
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22253/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19362102
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/SampledRDD.scala ---
    @@ -53,9 +53,14 @@ private[spark] class SampledRDD[T: ClassTag](
         if (withReplacement) {
           // For large datasets, the expected number of occurrences of each element in a sample with
           // replacement is Poisson(frac). We use that to get a count for each element.
    -      val poisson = new Poisson(frac, new DRand(split.seed))
    +      val poisson = new PoissonDistribution(
    +        new MersenneTwister(split.seed),
    --- End diff --
    
    The default RNG is Well19937c. From the doc, it is better but slower than MT. I'm leaning towards the default because sampling computation is usually not the bottleneck for Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60529896
  
      [Test build #22253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22253/consoleFull) for   PR 2928 at commit [`61a232f`](https://github.com/apache/spark/commit/61a232ffd0378d56e7e4a3d2c2e8f9f8a9660a80).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60505236
  
      [Test build #22229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22229/consoleFull) for   PR 2928 at commit [`5b87a28`](https://github.com/apache/spark/commit/5b87a28f99df1138cc53063c2fb535184ae64613).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2928#issuecomment-60462708
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22176/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19364103
  
    --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala ---
    @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double) {
         } else {
           val size = sampleSize.toInt
           if (cache(size) < 0) {
    -        cache(size) = Probability.studentTInverse(1 - confidence, size - 1)
    +        val tDist = new TDistribution(size - 1)
    +        cache(size) = tDist.inverseCumulativeProbability(1 - (1 - confidence) / 2)
    --- End diff --
    
    It should still be one-sided. The argument to the Colt method (http://acs.lbl.gov/ACSSoftware/colt/api/cern/jet/stat/Probability.html#studentTInverse(double, int)) was `alpha` such that the cumulative probability is `1-alpha/2`. For the Commons Math3 method, the argument is just the cumulative probability (http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/distribution/AbstractRealDistribution.html#inverseCumulativeProbability(double)). So I translated as `1 - (1 - confidence) / 2`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2928#discussion_r19362095
  
    --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala ---
    @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double) {
         } else {
           val size = sampleSize.toInt
           if (cache(size) < 0) {
    -        cache(size) = Probability.studentTInverse(1 - confidence, size - 1)
    +        val tDist = new TDistribution(size - 1)
    +        cache(size) = tDist.inverseCumulativeProbability(1 - (1 - confidence) / 2)
    --- End diff --
    
    It was one-side before. Could you comment on the change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org