Posted to reviews@spark.apache.org by Lewuathe <gi...@git.apache.org> on 2015/10/20 15:51:07 UTC

[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

GitHub user Lewuathe opened a pull request:

    https://github.com/apache/spark/pull/9180

    [SPARK-11207][ML] Add test cases for solver selection of LinearRegression as followup.

    This is the follow-up work to SPARK-10668.
    
    * Fix minor style issues.
    * Add test case for checking whether solver is selected properly.
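
A minimal sketch of the kind of rule such a test would exercise. The 4096 threshold is the figure quoted in the test diff later in this thread; the direction of the rule (normal equation at or below the threshold, L-BFGS above it) and the elastic-net condition are assumptions, and `effectiveSolver` is a hypothetical helper, not Spark's API.

```scala
// Hypothetical sketch, not the Spark source: names and structure are illustrative.
object SolverSelectionSketch {
  // Feature-count threshold quoted in the test diff in this thread.
  val MaxNumFeatures = 4096

  // Returns the solver that would actually run for the requested setting.
  def effectiveSolver(solver: String, numFeatures: Int, elasticNetParam: Double): String =
    solver match {
      case "normal" => "normal"
      case "l-bfgs" => "l-bfgs"
      case "auto" =>
        // Assumption: "auto" picks the normal equation only when there is no
        // L1 penalty (elasticNetParam == 0.0) and the feature count is at most
        // the threshold; otherwise it falls back to L-BFGS.
        if (elasticNetParam == 0.0 && numFeatures <= MaxNumFeatures) "normal" else "l-bfgs"
      case other =>
        throw new IllegalArgumentException(s"Unknown solver: $other")
    }
}
```

Under these assumptions, a dataset with 2 features and one with 4100 features would land on different branches of the "auto" case, which is why a test needs both sizes.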

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Lewuathe/spark SPARK-11207

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9180
    
----
commit 11cd9c13b78a7c1d9ecfb2950242e0525c3bf303
Author: Lewuathe <le...@me.com>
Date:   2015-10-20T13:50:23Z

    [SPARK-11207][ML] Add test cases for solver selection of LinearRegression as followup.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152457451
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150742199
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152460958
  
    LGTM except for the small styling issues.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43423030
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -51,14 +52,27 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
         super.beforeAll()
         dataset = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 6.3, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
    --- End diff --
    
    `seed = seed` is not necessary; it's self-explanatory.
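
The point can be illustrated with a small standalone sketch. `describe` is a hypothetical stand-in for a generator with several numeric parameters, not the real `generateLinearInput`. Named arguments help when the values are bare literals, but add nothing when the variable already carries the parameter's name.

```scala
object NamedArgsSketch {
  // Hypothetical stand-in for a function with several numeric parameters.
  def describe(intercept: Double, nPoints: Int, seed: Int, eps: Double): String =
    s"intercept=$intercept nPoints=$nPoints seed=$seed eps=$eps"

  val seed = 42

  // Fully named: the literals are labeled, but `seed = seed` repeats itself.
  val verbose: String = describe(intercept = 6.3, nPoints = 10000, seed = seed, eps = 0.1)

  // Same call with `seed` passed positionally. The named arguments before it
  // are all in declaration order, so the mix of named and positional is accepted.
  val concise: String = describe(intercept = 6.3, nPoints = 10000, seed, eps = 0.1)
}
```

Both calls bind the same values; only the second avoids the redundant label.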




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149939754
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149924728
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150733094
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152450697
  
    **[Test build #44669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44669/consoleFull)** for PR 9180 at commit [`74de81e`](https://github.com/apache/spark/commit/74de81ee4439c121437510b9b8e176a4e7df0724).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150070673
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150900670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/
    Test FAILed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42564078
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -34,6 +34,7 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
       private val seed: Int = 42
       @transient var dataset: DataFrame = _
       @transient var datasetWithoutIntercept: DataFrame = _
    +  @transient var datasetWithBigFeature: DataFrame = _
    --- End diff --
    
    `WithBigFeature` -> `WithManyFeatures` or `WithLargeFeatureSize`




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43481079
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -49,16 +50,29 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
        */
       override def beforeAll(): Unit = {
         super.beforeAll()
    -    dataset = sqlContext.createDataFrame(
    +    datasetWithDenseFeature = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 6.3, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
         /*
            datasetWithoutIntercept is not needed for correctness testing but is useful for illustrating
            training model without intercept
          */
    -    datasetWithoutIntercept = sqlContext.createDataFrame(
    +    datasetWithDenseFeatureWithoutIntercept = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        0.0, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 0.0, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
    +
    +    val r = new Random(seed)
    +    // When feature size is larger than 4096, normal optimizer is choosed
    +    // as the solver of linear regression in the case of "auto" mode.
    +    val featureSize = 4100
    +    datasetWithSparseFeature = sqlContext.createDataFrame(
    +      sc.parallelize(LinearDataGenerator.generateLinearInput(
    +        intercept = 0.0, weights = Seq.fill(featureSize)(r.nextDouble).toArray,
    +        xMean = Seq.fill(featureSize)(r.nextDouble).toArray,
    +        xVariance = Seq.fill(featureSize)(r.nextDouble).toArray, nPoints = 200,
    +        seed = seed, eps = 0.1, sparsity = 0.7), 2))
    --- End diff --
    
    Change `seed = seed` to `seed`.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42923827
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -125,6 +124,58 @@ object LinearDataGenerator {
       }
     
       /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @return Seq of LabeledPoint includes sparse vectors..
    +   */
    +  @Since("1.6.0")
    +  def generateLinearSparseInput(
    +      intercept: Double,
    +      weights: Array[Double],
    +      xMean: Array[Double],
    +      xVariance: Array[Double],
    +      nPoints: Int,
    +      seed: Int,
    +      eps: Double): Seq[LabeledPoint] = {
    +    val rnd = new Random(seed)
    +    val x = Array.fill[Array[Double]](nPoints)(
    +      Array.fill[Double](weights.length)(rnd.nextDouble()))
    +
    +    x.foreach { v =>
    --- End diff --
    
    You can also add variance to the sparsity so that the number of nonzeros is not constant.
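
One way to get a varying number of nonzeros, sketched under the assumption that each entry is dropped independently with probability `sparsity`. `sparsify` is a hypothetical helper for illustration, not code from this PR.

```scala
import scala.util.Random

object SparsitySketch {
  // Zero out each entry independently with probability `sparsity`, so the
  // number of nonzeros differs from row to row instead of being fixed.
  def sparsify(row: Array[Double], sparsity: Double, rnd: Random): Array[Double] =
    row.map(v => if (rnd.nextDouble() < sparsity) 0.0 else v)

  def main(args: Array[String]): Unit = {
    val rnd = new Random(42)
    // Inputs are shifted by 1.0 so every entry starts out nonzero.
    val rows = Array.fill(5)(Array.fill(20)(rnd.nextDouble() + 1.0))
    rows.map(sparsify(_, 0.7, rnd)).foreach { r =>
      println(s"nonzeros = ${r.count(_ != 0.0)}")
    }
  }
}
```

With this scheme the expected fraction of zeros is `sparsity`, while the count in any single row fluctuates around that value.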




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43481035
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -49,16 +50,29 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
        */
       override def beforeAll(): Unit = {
         super.beforeAll()
    -    dataset = sqlContext.createDataFrame(
    +    datasetWithDenseFeature = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 6.3, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
         /*
            datasetWithoutIntercept is not needed for correctness testing but is useful for illustrating
            training model without intercept
          */
    -    datasetWithoutIntercept = sqlContext.createDataFrame(
    +    datasetWithDenseFeatureWithoutIntercept = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        0.0, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 0.0, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
    +
    +    val r = new Random(seed)
    --- End diff --
    
    Why do you need this random generator?




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43419153
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -112,8 +139,13 @@ object LinearDataGenerator {
         x.foreach { v =>
           var i = 0
           val len = v.length
    +      val sparceRnd = new Random(seed)
    --- End diff --
    
    Since you seed `rnd` and `sparceRnd` with the same seed, both will generate the same sequence of random numbers, which is not what you want. You should be able to use a single random number generator, which will give you uncorrelated random numbers for both creating the features and choosing which columns to zero out.
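
The effect described here can be reproduced with `scala.util.Random` alone. This is a standalone sketch; `draws` is a hypothetical helper and the seed value is arbitrary.

```scala
import scala.util.Random

object SeedSketch {
  // Draw `n` doubles from the given generator.
  def draws(rnd: Random, n: Int): Seq[Double] = Seq.fill(n)(rnd.nextDouble())

  def main(args: Array[String]): Unit = {
    val seed = 42

    // Two generators built from the same seed replay the same stream.
    val fromA = draws(new Random(seed), 5)
    val fromB = draws(new Random(seed), 5)
    println(fromA == fromB) // prints "true"

    // One shared generator: the second batch continues the stream instead of
    // restarting it, so it is not a copy of the first batch.
    val rnd = new Random(seed)
    val features = draws(rnd, 5)
    val zeroOutDraws = draws(rnd, 5)
    println(s"features:     $features")
    println(s"zeroOutDraws: $zeroOutDraws")
  }
}
```

In the diff above the second generator is also constructed inside `x.foreach`, so it would restart from the same point for every row as well.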




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149925442
  
    **[Test build #44067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44067/consoleFull)** for PR 9180 at commit [`22ba64e`](https://github.com/apache/spark/commit/22ba64ed7ec8121bea2e92edbfeaf1a1913f61d4).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by Lewuathe <gi...@git.apache.org>.
Github user Lewuathe commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43474383
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -112,8 +139,13 @@ object LinearDataGenerator {
         x.foreach { v =>
           var i = 0
           val len = v.length
    +      val sparceRnd = new Random(seed)
    --- End diff --
    
    If we use the same random generator for both creating features and choosing which columns to zero out, `x` will differ from the current values. This causes unit test failures. Can we change the assertion tolerance or the target values written in `LinearRegressionSuite`?
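
For reference, a relative-tolerance check of the kind being discussed can be sketched as below. `approxEqual` is a hypothetical helper written for illustration; it is not the assertion utility used by the suite.

```scala
object ToleranceSketch {
  // Relative-tolerance comparison: true when |actual - expected| is within
  // relTol times the larger magnitude of the two values.
  def approxEqual(actual: Double, expected: Double, relTol: Double): Boolean = {
    val scale = math.max(math.abs(actual), math.abs(expected))
    math.abs(actual - expected) <= relTol * scale
  }
}
```

Loosening `relTol` hides small shifts caused by a different random stream, whereas updating the target values keeps the check tight but ties the test to the new data.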




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152476370
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44673/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42701567
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -125,6 +125,59 @@ object LinearDataGenerator {
       }
     
       /**
    +   *
    --- End diff --
    
    extra line.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43422739
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -121,7 +153,21 @@ object LinearDataGenerator {
         val y = x.map { xi =>
           blas.ddot(weights.length, xi, 1, weights, 1) + intercept + eps * rnd.nextGaussian()
         }
    -    y.zip(x).map(p => LabeledPoint(p._1, Vectors.dense(p._2)))
    +
    --- End diff --
    
    To simplify the following code, do
    
    ```scala
    y.zip(x).map { p => 
      if (sparsity == 0.0) {
        LabeledPoint(p._1, Vectors.dense(p._2))
      } else {
        LabeledPoint(p._1, Vectors.dense(p._2).toSparse)
      }
    }
    ```




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149574964
  
    **[Test build #43981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43981/consoleFull)** for PR 9180 at commit [`11cd9c1`](https://github.com/apache/spark/commit/11cd9c13b78a7c1d9ecfb2950242e0525c3bf303).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by Lewuathe <gi...@git.apache.org>.
Github user Lewuathe commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152362069
  
    @dbtsai Thank you so much for reviewing even though you must be busy at Spark Summit. I'll update.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149574420
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by Lewuathe <gi...@git.apache.org>.
Github user Lewuathe commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150410049
  
    @dbtsai Could you check again please?




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43417114
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -83,7 +83,6 @@ object LinearDataGenerator {
           nPoints, seed, eps)}
    --- End diff --
    
    The formatting of the previous method
    
    ```scala
      def generateLinearInput(
          intercept: Double,
          weights: Array[Double],
          nPoints: Int,
          seed: Int,
          eps: Double = 0.1): Seq[LabeledPoint] = {
        generateLinearInput(intercept, weights,
          Array.fill[Double](weights.length)(0.0),
          Array.fill[Double](weights.length)(1.0 / 3.0),
          nPoints, seed, eps)}
    ```
    
    looks weird to me. Can you fix it in this PR? Thanks.





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150911153
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149924762
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152446197
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44667/
    Test FAILed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43418378
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -104,7 +103,35 @@ object LinearDataGenerator {
           nPoints: Int,
           seed: Int,
           eps: Double): Seq[LabeledPoint] = {
    +    generateLinearInputInternal(intercept, weights, xMean, xVariance, nPoints, seed, eps, 0.0)
    +  }
     
    +
    +  /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @param sparcity The ratio of zero elements. If it is 0.0, LabeledPoints with
    +   *                 DenseVector is returned.
    +   * @return Seq of input.
    +   */
    +  @Since("1.6.0")
    +  def generateLinearInputInternal(
    +      intercept: Double,
    +      weights: Array[Double],
    +      xMean: Array[Double],
    +      xVariance: Array[Double],
    +      nPoints: Int,
    +      seed: Int,
    +      eps: Double,
    +      sparcity: Double): Seq[LabeledPoint] = {
    +    require(sparcity <= 1.0)
    --- End diff --
    
    I think this should be `require(0.0 < sparsity && sparsity < 1.0)`
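
One thing worth checking against the suggestion above: the dense overload in the quoted diff passes `0.0`, and the doc comment says `0.0` yields dense vectors, so a strict lower bound would reject that call. A minimal sketch of a range check with inclusive bounds follows; the helper name and the choice of inclusive bounds are assumptions, not something the thread settles.

```scala
object SparsityCheckSketch {
  // Hypothetical helper: validates the fraction of zero-valued features.
  // Both bounds are inclusive (an assumption): 0.0 is the value the dense
  // overload in the diff passes, and 1.0 would mean every feature is zero.
  def checkSparsity(sparsity: Double): Double = {
    require(0.0 <= sparsity && sparsity <= 1.0,
      s"sparsity must be in [0.0, 1.0], got $sparsity")
    sparsity
  }
}
```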




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149596680
  
    **[Test build #43983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43983/consoleFull)** for PR 9180 at commit [`28427d2`](https://github.com/apache/spark/commit/28427d29e8c398f25f9aac10f86074da084a933f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42564084
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -59,6 +60,15 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
         datasetWithoutIntercept = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
             0.0, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +
    +    val r = new Random(seed)
    +    val featureSize = 4100
    +    datasetWithBigFeature = sqlContext.createDataFrame(
    +      sc.parallelize(LinearDataGenerator.generateLinearInput(
    +        0.0, Seq.fill(featureSize)(r.nextDouble).toArray,
    --- End diff --
    
    It would be nice to use keyword arguments; it is hard to guess what `0.0`, `200`, and `0.1` mean.
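
As a small illustration of the point about keyword arguments, the sketch below calls the same function positionally and with named arguments. `describeInput` is a hypothetical stand-in that only borrows parameter names from the generator in the diff; the literal values are made up for the example.

```scala
object NamedArgsSketch {
  // Hypothetical stand-in with the same parameter names as the generator in the diff.
  def describeInput(
      intercept: Double,
      weights: Array[Double],
      nPoints: Int,
      seed: Int,
      eps: Double): String =
    s"intercept=$intercept, features=${weights.length}, nPoints=$nPoints, seed=$seed, eps=$eps"

  // Positional call: the reader has to look up what each literal means.
  val positional: String = describeInput(0.0, Array(4.7, 7.2), 200, 42, 0.1)

  // Named call: the same values, self-describing at the call site.
  val named: String = describeInput(
    intercept = 0.0, weights = Array(4.7, 7.2), nPoints = 200, seed = 42, eps = 0.1)
}
```

Both calls produce the same result; only the readability of the call site differs.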




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42564081
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -59,6 +60,15 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
         datasetWithoutIntercept = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
             0.0, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +
    +    val r = new Random(seed)
    +    val featureSize = 4100
    --- End diff --
    
    leave a comment about this value `4100`
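
The thread does not say why `4100` was chosen. One plausible reading, stated here as an assumption, is that it sits just above a 4096-feature limit on the normal-equation solver, so that an "auto" setting falls back to L-BFGS. The sketch below models that idea with a hypothetical `chooseSolver` function; it is not Spark's selection code and ignores any other conditions the real implementation may check.

```scala
object SolverChoiceSketch {
  // Assumed limit on the number of features the normal-equation solver accepts.
  val MaxNormalFeatures: Int = 4096

  // Hypothetical selection rule: "auto" falls back to L-BFGS above the limit;
  // an explicitly requested solver is returned unchanged.
  def chooseSolver(requested: String, numFeatures: Int): String = requested match {
    case "auto" => if (numFeatures <= MaxNormalFeatures) "normal" else "l-bfgs"
    case other => other
  }
}
```

Under that assumption, a dataset with 4100 features exercises the fallback branch, while the two-feature datasets in the suite exercise the other one.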




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152478405
  
    Thanks. Merged into master.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150872920
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150742201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44279/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149939452
  
    **[Test build #44067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44067/consoleFull)** for PR 9180 at commit [`22ba64e`](https://github.com/apache/spark/commit/22ba64ed7ec8121bea2e92edbfeaf1a1913f61d4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150084247
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150733090
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150900664
  
    **[Test build #44313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/consoleFull)** for PR 9180 at commit [`0a43033`](https://github.com/apache/spark/commit/0a4303356455f28ca3b87ffd446cb5ef5f25d0e2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by Lewuathe <gi...@git.apache.org>.
Github user Lewuathe commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42927682
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -125,6 +124,58 @@ object LinearDataGenerator {
       }
     
       /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @return Seq of LabeledPoint includes sparse vectors..
    +   */
    --- End diff --
    
    Yes, I also thought it was a good idea. But `LinearDataGenerator` is used as a static object, so we would have to pass `sparsity` as a parameter to `generateLinearInput`. This method seems to be used in a lot of suites, so many call sites would need to change.
    Therefore it might be better to do this in a separate JIRA. What do you think?
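
One way to avoid touching existing call sites, and the shape the later revision of the diff in this thread appears to take, is to keep the old signature and have it delegate to a new method with `sparsity = 0.0`. The sketch below is a simplified, hypothetical version that returns raw feature rows instead of `LabeledPoint`s.

```scala
object DelegationSketch {
  // Hypothetical simplified generator: new entry point with the extra parameter.
  def generateWithSparsity(
      nPoints: Int,
      nFeatures: Int,
      seed: Int,
      sparsity: Double): Seq[Array[Double]] = {
    val rnd = new scala.util.Random(seed)
    Seq.fill(nPoints) {
      Array.fill(nFeatures) {
        val x = rnd.nextGaussian()
        // Zero out each feature with probability `sparsity`.
        if (rnd.nextDouble() < sparsity) 0.0 else x
      }
    }
  }

  // Existing signature stays as it was and delegates with sparsity = 0.0,
  // so callers in other suites do not need to change.
  def generate(nPoints: Int, nFeatures: Int, seed: Int): Seq[Array[Double]] =
    generateWithSparsity(nPoints, nFeatures, seed, 0.0)
}
```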




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150741936
  
    **[Test build #44279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44279/consoleFull)** for PR 9180 at commit [`2082d47`](https://github.com/apache/spark/commit/2082d4781eeb009c3a0c45d4e92b546960b5a7ff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150899252
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150736219
  
    **[Test build #44279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44279/consoleFull)** for PR 9180 at commit [`2082d47`](https://github.com/apache/spark/commit/2082d4781eeb009c3a0c45d4e92b546960b5a7ff).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9180




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152466106
  
    **[Test build #44673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44673/consoleFull)** for PR 9180 at commit [`241ec72`](https://github.com/apache/spark/commit/241ec7293607d670c93293b5872b21ed0c9f411a).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150872969
  
    **[Test build #44308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44308/consoleFull)** for PR 9180 at commit [`003d3bd`](https://github.com/apache/spark/commit/003d3bd87f3936c4fd6ee0dc77ca81f3811bcbd7).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43417781
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -104,7 +103,35 @@ object LinearDataGenerator {
           nPoints: Int,
           seed: Int,
           eps: Double): Seq[LabeledPoint] = {
    +    generateLinearInputInternal(intercept, weights, xMean, xVariance, nPoints, seed, eps, 0.0)
    +  }
     
    +
    +  /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @param sparcity The ratio of zero elements. If it is 0.0, LabeledPoints with
    +   *                 DenseVector is returned.
    +   * @return Seq of input.
    +   */
    +  @Since("1.6.0")
    +  def generateLinearInputInternal(
    +      intercept: Double,
    +      weights: Array[Double],
    +      xMean: Array[Double],
    +      xVariance: Array[Double],
    +      nPoints: Int,
    +      seed: Int,
    +      eps: Double,
    +      sparcity: Double): Seq[LabeledPoint] = {
    --- End diff --
    
    ditto. typo.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150904896
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152274582
  
    Sorry for the delay. The current implementation of creating sparse features is not efficient, since we need to create the dense features first. Let's keep it as it is for now. But if you are interested, let's create another JIRA so that the sparse features can be generated without creating the dense ones first. Thanks.
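
A rough sketch of what such a follow-up could look like: decide which indices are non-zero first, then draw values only for those indices, so no dense array of the full feature size is allocated. The function name and the sampling scheme are assumptions made for illustration, not the approach of any actual follow-up patch.

```scala
import scala.util.Random

object SparseFeatureSketch {
  // Returns (indices, values) of the non-zero entries of one feature vector,
  // without materializing a dense array of length `size` first.
  def sparseFeatures(
      size: Int,
      sparsity: Double,
      rnd: Random): (Array[Int], Array[Double]) = {
    require(0.0 <= sparsity && sparsity <= 1.0, "sparsity must be in [0.0, 1.0]")
    // Keep each index independently with probability 1 - sparsity.
    val indices = (0 until size).filter(_ => rnd.nextDouble() >= sparsity).toArray
    // Draw a value only for the indices that were kept.
    val values = indices.map(_ => rnd.nextGaussian())
    (indices, values)
  }
}
```

The returned pair has the shape that a sparse vector constructor such as `Vectors.sparse(size, indices, values)` expects. Note that this sketch still loops over every index once; it only avoids allocating and filling the dense values.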




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150874928
  
    **[Test build #44308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44308/consoleFull)** for PR 9180 at commit [`003d3bd`](https://github.com/apache/spark/commit/003d3bd87f3936c4fd6ee0dc77ca81f3811bcbd7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149597508
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43983/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149578797
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150899440
  
    **[Test build #44313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/consoleFull)** for PR 9180 at commit [`0a43033`](https://github.com/apache/spark/commit/0a4303356455f28ca3b87ffd446cb5ef5f25d0e2).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150900669
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150897857
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43480991
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -49,16 +50,29 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
        */
       override def beforeAll(): Unit = {
         super.beforeAll()
    -    dataset = sqlContext.createDataFrame(
    +    datasetWithDenseFeature = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 6.3, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
    --- End diff --
    
    make `seed = seed` into just `seed`


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149939755
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44067/
    Test PASSed.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152445468
  
    Merged build started.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149722643
  
    @Lewuathe The added test took 30 seconds to run, which might be too long. Shall we try to reduce the number of iterations?
    
    ~~~
    [info] - linear regression model with l-bfgs with big feature datasets (29 seconds, 82 milliseconds)
    ~~~


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150084004
  
    **[Test build #44116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44116/consoleFull)** for PR 9180 at commit [`f6b2256`](https://github.com/apache/spark/commit/f6b2256fd669585f7e3b082730a63d0dbda631aa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149574456
  
    Merged build started.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43417974
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -34,6 +34,7 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
       private val seed: Int = 42
       @transient var dataset: DataFrame = _
       @transient var datasetWithoutIntercept: DataFrame = _
    +  @transient var datasetWithManyFeature: DataFrame = _
     
    --- End diff --
    
    Let's call it `datasetWithSparseFeature`.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149581922
  
    **[Test build #43983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43983/consoleFull)** for PR 9180 at commit [`28427d2`](https://github.com/apache/spark/commit/28427d29e8c398f25f9aac10f86074da084a933f).


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42930781
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -125,6 +124,58 @@ object LinearDataGenerator {
       }
     
       /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @return Seq of LabeledPoint includes sparse vectors..
    +   */
    --- End diff --
    
    Let's modify the JIRA and do it here. Basically, you can keep a method in `LinearDataGenerator` with the old signature that calls the new API, for compatibility.
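
The compatibility pattern suggested here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: `GeneratorSketch`, its simplified `generate` signature, and the fixed feature count of 3 are all hypothetical stand-ins for the real `LinearDataGenerator` API.

```scala
import scala.util.Random

// Hypothetical stand-in for a data generator; not Spark's LinearDataGenerator.
object GeneratorSketch {
  // New API (assumed shape): the extra `sparsity` parameter comes last.
  // Each value is zeroed with probability `sparsity`.
  def generate(nPoints: Int, seed: Int, sparsity: Double): Seq[Array[Double]] = {
    require(sparsity >= 0.0 && sparsity <= 1.0, "sparsity must be in [0, 1]")
    val rnd = new Random(seed)
    Seq.fill(nPoints) {
      Array.fill(3) {
        val value = rnd.nextDouble()
        if (rnd.nextDouble() < sparsity) 0.0 else value
      }
    }
  }

  // Old signature kept for source compatibility: it only delegates,
  // passing sparsity = 0.0 so existing callers still get dense data.
  def generate(nPoints: Int, seed: Int): Seq[Array[Double]] =
    generate(nPoints, seed, 0.0)
}
```

Existing callers keep compiling against the two-argument overload, while new tests can pass a `sparsity` value explicitly.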


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43417193
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -104,7 +103,35 @@ object LinearDataGenerator {
           nPoints: Int,
           seed: Int,
           eps: Double): Seq[LabeledPoint] = {
    +    generateLinearInputInternal(intercept, weights, xMean, xVariance, nPoints, seed, eps, 0.0)
    +  }
     
    +
    +  /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @param sparcity The ratio of zero elements. If it is 0.0, LabeledPoints with
    +   *                 DenseVector is returned.
    +   * @return Seq of input.
    +   */
    +  @Since("1.6.0")
    +  def generateLinearInputInternal(
    --- End diff --
    
    Just call it `generateLinearInput` without `Internal`.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149728408
  
    +1 on @dbtsai 's suggestion


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150904892
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149575641
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42564092
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -693,4 +693,18 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
           assert(model4a0.weights ~== model4b.weights absTol 1E-3)
         }
       }
    +
    +  test("linear regression model with l-bfgs with big feature datasets") {
    +    val trainer = new LinearRegression().setSolver("auto")
    +    val model = trainer.fit(datasetWithBigFeature)
    --- End diff --
    
    how long does this test take?


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42923796
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -125,6 +124,58 @@ object LinearDataGenerator {
       }
     
       /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @return Seq of LabeledPoint includes sparse vectors..
    +   */
    +  @Since("1.6.0")
    +  def generateLinearSparseInput(
    +      intercept: Double,
    +      weights: Array[Double],
    +      xMean: Array[Double],
    +      xVariance: Array[Double],
    +      nPoints: Int,
    +      seed: Int,
    +      eps: Double): Seq[LabeledPoint] = {
    +    val rnd = new Random(seed)
    +    val x = Array.fill[Array[Double]](nPoints)(
    +      Array.fill[Double](weights.length)(rnd.nextDouble()))
    +
    +    x.foreach { v =>
    --- End diff --
    
    Once you have `sparsity`, randomly choose `n = numFeatures * (1 - sparsity)` as non-zero features, and zero the rest out.
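
A minimal sketch of this suggestion, assuming `sparsity` is the fraction of features to zero out in each row. The helper `sparsify` and the object name are hypothetical, and the sketch works on plain arrays rather than Spark's `LabeledPoint`, so it is not the code that was merged.

```scala
import scala.util.Random

object SparsifySketch {
  // Hypothetical helper: keep n = numFeatures * (1 - sparsity) randomly chosen
  // features and set the remaining ones to zero. Returns a new array.
  def sparsify(features: Array[Double], sparsity: Double, rnd: Random): Array[Double] = {
    require(sparsity >= 0.0 && sparsity <= 1.0, "sparsity must be in [0, 1]")
    val numFeatures = features.length
    // Round to the nearest integer so floating-point error does not drop a feature.
    val n = math.round(numFeatures * (1.0 - sparsity)).toInt
    // Shuffle the indices and keep the first n as the non-zero positions.
    val keep = rnd.shuffle(features.indices.toList).take(n).toSet
    Array.tabulate(numFeatures)(i => if (keep(i)) features(i) else 0.0)
  }
}
```

With this reading, `sparsity = 0.0` leaves every feature untouched and `sparsity = 1.0` zeroes the whole row.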


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152445431
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149575632
  
    **[Test build #43981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43981/consoleFull)** for PR 9180 at commit [`11cd9c1`](https://github.com/apache/spark/commit/11cd9c13b78a7c1d9ecfb2950242e0525c3bf303).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152464747
  
    Merged build started.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149727573
  
    Or you can make them sparse by randomly setting most of the features to zero.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43417836
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -104,7 +103,35 @@ object LinearDataGenerator {
           nPoints: Int,
           seed: Int,
           eps: Double): Seq[LabeledPoint] = {
    +    generateLinearInputInternal(intercept, weights, xMean, xVariance, nPoints, seed, eps, 0.0)
    +  }
     
    +
    +  /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @param sparcity The ratio of zero elements. If it is 0.0, LabeledPoints with
    +   *                 DenseVector is returned.
    +   * @return Seq of input.
    +   */
    +  @Since("1.6.0")
    +  def generateLinearInputInternal(
    +      intercept: Double,
    +      weights: Array[Double],
    +      xMean: Array[Double],
    +      xVariance: Array[Double],
    +      nPoints: Int,
    +      seed: Int,
    +      eps: Double,
    +      sparcity: Double): Seq[LabeledPoint] = {
    +    require(sparcity <= 1.0)
    --- End diff --
    
    What does `sparsity == 0.0` mean? All zeros?


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150874949
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44308/
    Test FAILed.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152476369
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150911155
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/
    Test PASSed.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152457453
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44669/
    Test PASSed.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by Lewuathe <gi...@git.apache.org>.
Github user Lewuathe commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150062888
  
    ```
    - linear regression model with l-bfgs with big feature datasets (14 seconds, 524 milliseconds)
    ```
    
    It now takes about half the time of the initial run. Could you review this again? @dbtsai @mengxr


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43418162
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -34,6 +34,7 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
       private val seed: Int = 42
       @transient var dataset: DataFrame = _
       @transient var datasetWithoutIntercept: DataFrame = _
    +  @transient var datasetWithManyFeature: DataFrame = _
     
    --- End diff --
    
    Also, change `dataset` to `datasetWithDenseFeature`, and `datasetWithoutIntercept` to `datasetWithDenseFeatureWithoutIntercept`.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150071490
  
    **[Test build #44116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44116/consoleFull)** for PR 9180 at commit [`f6b2256`](https://github.com/apache/spark/commit/f6b2256fd669585f7e3b082730a63d0dbda631aa).


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150084248
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44116/
    Test PASSed.


[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150874948
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150872915
  
    Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152449029
  
    Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152445929
  
    **[Test build #44667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44667/consoleFull)** for PR 9180 at commit [`97c76c9`](https://github.com/apache/spark/commit/97c76c93c8b1b93661ec6a3b88a1ecc3e9980197).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150911125
  
    **[Test build #44317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/consoleFull)** for PR 9180 at commit [`59383fd`](https://github.com/apache/spark/commit/59383fd41f1d6b96274c564eb2fb7c96f5ab07e0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r42923610
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -125,6 +124,58 @@ object LinearDataGenerator {
       }
     
       /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @return Seq of LabeledPoint includes sparse vectors..
    +   */
    --- End diff --
    
    How about consolidating this with `LinearDataGenerator` and adding `sparsity = 1.0` as a parameter to control whether the features are sparse?
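
The consolidation suggested here could look roughly like the sketch below. It is only an illustration in plain Scala without Spark: the object name `SparsityGeneratorSketch`, the method `generate`, and the tuple return type are hypothetical stand-ins, `xMean` and `xVariance` are omitted for brevity, and `sparsity` is assumed to mean the probability that a feature is zeroed (so 0.0 yields fully dense features).

```scala
import scala.util.Random

// Hypothetical stand-in for a consolidated generator; not Spark's actual API.
object SparsityGeneratorSketch {

  /** Returns (label, features) pairs; each feature is zeroed with probability `sparsity`. */
  def generate(
      intercept: Double,
      weights: Array[Double],
      nPoints: Int,
      seed: Int,
      eps: Double,
      sparsity: Double = 0.0): Seq[(Double, Array[Double])] = {
    val rnd = new Random(seed)
    Seq.fill(nPoints) {
      // One feature per weight; nextDouble() is in [0, 1), so sparsity = 0.0
      // never zeroes a feature and sparsity = 1.0 always does.
      val features = weights.map { _ =>
        val x = rnd.nextDouble()
        if (rnd.nextDouble() < sparsity) 0.0 else x
      }
      // Linear label plus Gaussian noise scaled by eps.
      val dot = weights.zip(features).map { case (w, x) => w * x }.sum
      val label = intercept + dot + eps * rnd.nextGaussian()
      (label, features)
    }
  }
}
```

With a single entry point like this, the dense case is just the default argument, so existing callers would not need to change, while a test for sparse input would pass an explicit `sparsity` value.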




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by Lewuathe <gi...@git.apache.org>.
Github user Lewuathe commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150917384
  
    @dbtsai Sorry to bother you so many times, but could you check again, please?




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152449043
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150905267
  
    **[Test build #44317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/consoleFull)** for PR 9180 at commit [`59383fd`](https://github.com/apache/spark/commit/59383fd41f1d6b96274c564eb2fb7c96f5ab07e0).




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152457370
  
    **[Test build #44669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44669/consoleFull)** for PR 9180 at commit [`74de81e`](https://github.com/apache/spark/commit/74de81ee4439c121437510b9b8e176a4e7df0724).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149597507
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43422079
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -104,7 +103,35 @@ object LinearDataGenerator {
           nPoints: Int,
           seed: Int,
           eps: Double): Seq[LabeledPoint] = {
    +    generateLinearInputInternal(intercept, weights, xMean, xVariance, nPoints, seed, eps, 0.0)
    +  }
     
    +
    +  /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @param sparcity The ratio of zero elements. If it is 0.0, LabeledPoints with
    +   *                 DenseVector is returned.
    +   * @return Seq of input.
    +   */
    +  @Since("1.6.0")
    +  def generateLinearInputInternal(
    +      intercept: Double,
    +      weights: Array[Double],
    +      xMean: Array[Double],
    +      xVariance: Array[Double],
    +      nPoints: Int,
    +      seed: Int,
    +      eps: Double,
    +      sparcity: Double): Seq[LabeledPoint] = {
    +    require(sparcity <= 1.0)
    --- End diff --
    
    Okay, I think it's fine to allow sparsity == 1.0; that just makes everything zero.
    
    `require(0.0 <= sparsity && sparsity <= 1.0)`
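
The closed-interval check proposed above can be tried in isolation with a small sketch. The wrapper object `SparsityCheckSketch` and the helper `checkSparsity` are hypothetical names used only for illustration; the `require` condition itself is the one from the comment.

```scala
// Hypothetical helper wrapping the range check proposed in the review comment.
object SparsityCheckSketch {

  /** Returns `sparsity` unchanged if it lies in [0.0, 1.0]; throws IllegalArgumentException otherwise. */
  def checkSparsity(sparsity: Double): Double = {
    require(0.0 <= sparsity && sparsity <= 1.0,
      s"sparsity must be in [0.0, 1.0], got $sparsity")
    sparsity
  }
}
```

Compared with the original `require(sparcity <= 1.0)`, the two-sided condition also rejects negative values, and because any comparison with `NaN` is false, it rejects `NaN` as well. Both endpoints remain valid, including the all-zeros case at 1.0.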




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152446195
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43417727
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala ---
    @@ -104,7 +103,35 @@ object LinearDataGenerator {
           nPoints: Int,
           seed: Int,
           eps: Double): Seq[LabeledPoint] = {
    +    generateLinearInputInternal(intercept, weights, xMean, xVariance, nPoints, seed, eps, 0.0)
    +  }
     
    +
    +  /**
    +   * @param intercept Data intercept
    +   * @param weights  Weights to be applied.
    +   * @param xMean the mean of the generated features. Lots of time, if the features are not properly
    +   *              standardized, the algorithm with poor implementation will have difficulty
    +   *              to converge.
    +   * @param xVariance the variance of the generated features.
    +   * @param nPoints Number of points in sample.
    +   * @param seed Random seed
    +   * @param eps Epsilon scaling factor.
    +   * @param sparcity The ratio of zero elements. If it is 0.0, LabeledPoints with
    --- End diff --
    
    Typo: `sparsity`




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9180#discussion_r43481002
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -49,16 +50,29 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
        */
       override def beforeAll(): Unit = {
         super.beforeAll()
    -    dataset = sqlContext.createDataFrame(
    +    datasetWithDenseFeature = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 6.3, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
         /*
            datasetWithoutIntercept is not needed for correctness testing but is useful for illustrating
            training model without intercept
          */
    -    datasetWithoutIntercept = sqlContext.createDataFrame(
    +    datasetWithDenseFeatureWithoutIntercept = sqlContext.createDataFrame(
           sc.parallelize(LinearDataGenerator.generateLinearInput(
    -        0.0, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 10000, seed, 0.1), 2))
    +        intercept = 0.0, weights = Array(4.7, 7.2), xMean = Array(0.9, -1.3),
    +        xVariance = Array(0.7, 1.2), nPoints = 10000, seed = seed, eps = 0.1), 2))
    --- End diff --
    
    ditto




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152464693
  
    Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-150070663
  
    Merged build triggered.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152476226
  
    **[Test build #44673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44673/consoleFull)** for PR 9180 at commit [`241ec72`](https://github.com/apache/spark/commit/241ec7293607d670c93293b5872b21ed0c9f411a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-152446191
  
    **[Test build #44667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44667/consoleFull)** for PR 9180 at commit [`97c76c9`](https://github.com/apache/spark/commit/97c76c93c8b1b93661ec6a3b88a1ecc3e9980197).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149578865
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9180#issuecomment-149575643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43981/
    Test FAILed.

