Posted to reviews@spark.apache.org by rotationsymmetry <gi...@git.apache.org> on 2015/09/06 22:30:51 UTC

[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

GitHub user rotationsymmetry opened a pull request:

    https://github.com/apache/spark/pull/8631

    [SPARK-9642] [ML] [WIP] LinearRegression should support weighted data

    In many modeling applications, data points are not necessarily sampled with equal probability. Linear regression should support instance weights that account for over- or under-sampling.
    
    Work in progress.
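
For reference, the textbook weighted least squares objective that this kind of weighting corresponds to can be written as below. This is a sketch under assumptions: the symbols $w_i$ (instance weight), $x_i$ (feature vector), $y_i$ (label) and $\beta$ (coefficients) are notation introduced here, and the intercept, regularization and feature standardization handled by the actual patch are left out.

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \left( y_i - x_i^{\top} \beta \right)^2
```

With $w_i = 1$ for all $i$ this reduces to ordinary least squares, and an integer weight $w_i = k$ contributes the same loss as $k$ identical copies of row $i$.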


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rotationsymmetry/spark SPARK-9642

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8631
    
----
commit e9093cbea2554fbc124899a58e3cbfdade5ea795
Author: Meihua Wu <me...@umich.edu>
Date:   2015-09-06T15:15:55Z

    [WIP] Add support for weighted sample and associated test.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141301251
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141533658
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-142060320
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-138123811
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140972488
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141893861
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141217897
  
      [Test build #42611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42611/console) for   PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937365
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
           .zip(testSummary.residuals.select("residuals").collect())
           .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
       }
    +
    +  test("linear regression with weighted samples"){
    +    val (data, weightedData) = {
    +      val activeData = LinearDataGenerator.generateLinearInput(
    +        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
    +
    +      val rnd = new Random(8392)
    +      val signedData = activeData map { case p: LabeledPoint =>
    +        (rnd.nextGaussian() > 0.0, p)
    +      }
    +
    +      val data1 = signedData flatMap {
    +        case (true, p) => Iterator(p, p)
    +        case (false, p) => Iterator(p)
    +      }
    +
    +      val weightedSignedData = signedData flatMap {
    --- End diff --
    
    ditto
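
The idea behind the test in this diff (a row that is duplicated should fit the same model as a single row with weight 2) can be sketched without Spark. The following is a minimal illustration, not the PR's test: `WeightedOlsSketch`, `Obs`, `fit` and the sample points are hypothetical names and data introduced here, and the fit is a closed-form single-feature weighted least squares rather than Spark's optimizer.

```scala
// Hypothetical helper, not part of Spark: closed-form weighted least squares
// for a single feature with an intercept.
object WeightedOlsSketch {
  final case class Obs(label: Double, weight: Double, feature: Double)

  /** Returns (intercept, slope) minimizing sum_i w_i * (y_i - a - b * x_i)^2. */
  def fit(data: Seq[Obs]): (Double, Double) = {
    val wSum = data.map(_.weight).sum
    val xMean = data.map(o => o.weight * o.feature).sum / wSum
    val yMean = data.map(o => o.weight * o.label).sum / wSum
    val sxy = data.map(o => o.weight * (o.feature - xMean) * (o.label - yMean)).sum
    val sxx = data.map(o => o.weight * (o.feature - xMean) * (o.feature - xMean)).sum
    val slope = sxy / sxx
    (yMean - slope * xMean, slope)
  }

  // Four made-up (label, feature, overSampled) points; two are over-sampled.
  val base: Seq[(Double, Double, Boolean)] = Seq(
    (1.0, 0.0, true), (2.9, 1.0, false), (5.2, 2.0, true), (6.8, 3.0, false))

  // Variant A: physically duplicate the over-sampled rows, all weights 1.0.
  val duplicated: Seq[Obs] = base.flatMap {
    case (y, x, true) => Seq(Obs(y, 1.0, x), Obs(y, 1.0, x))
    case (y, x, false) => Seq(Obs(y, 1.0, x))
  }

  // Variant B: keep one row per point and carry the multiplicity as a weight.
  val weighted: Seq[Obs] = base.map {
    case (y, x, dup) => Obs(y, if (dup) 2.0 else 1.0, x)
  }
}
```

Calling `WeightedOlsSketch.fit` on `duplicated` and on `weighted` should give the same intercept and slope up to floating-point rounding, which is the property a weighted-sample test can assert.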




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139481653
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-142080769
  
    Thanks. Merged into master.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141851806
  
    Can you merge master to resolve the conflicts? Also, add a warning in the training summary that it currently ignores the training weights (except for the objective trace).
    
    Other than those small items, LGTM. You may remove WIP.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141301253
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42621/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937291
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -598,17 +629,14 @@ private class LeastSquaresCostFun(
         featuresMean: Array[Double],
         effectiveL2regParam: Double) extends DiffFunction[BDV[Double]] {
     
    -  override def calculate(weights: BDV[Double]): (Double, BDV[Double]) = {
    -    val w = Vectors.fromBreeze(weights)
    +  override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
    +    val coeff = Vectors.fromBreeze(coefficients)
     
    -    val leastSquaresAggregator = data.treeAggregate(new LeastSquaresAggregator(w, labelStd,
    +    val leastSquaresAggregator = data.treeAggregate(new LeastSquaresAggregator(coeff, labelStd,
           labelMean, fitIntercept, featuresStd, featuresMean))(
    -        seqOp = (c, v) => (c, v) match {
    -          case (aggregator, (label, features)) => aggregator.add(label, features)
    -        },
    -        combOp = (c1, c2) => (c1, c2) match {
    -          case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
    -        })
    +        seqOp = (aggregator, instance) => aggregator.add(instance),
    +        combOp = (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
    +        )
     
    --- End diff --
    
    Move `)` to the end of the `combOp` line.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39812153
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -123,30 +132,41 @@ class LinearRegression(override val uid: String)
       def setTol(value: Double): this.type = set(tol, value)
       setDefault(tol -> 1E-6)
     
    +  /**
    +   * Whether to over-/under-sample training instances according to the given weights in weightCol.
    +   * If empty, all instances are treated equally (weight 1.0).
    +   * Default is empty, so all instances have weight one.
    +   * @group setParam
    +   */
    +  def setWeightCol(value: String): this.type = set(weightCol, value)
    +  setDefault(weightCol -> "")
    +
       override protected def train(dataset: DataFrame): LinearRegressionModel = {
         // Extract columns from data.  If dataset is persisted, do not persist instances.
    -    val instances = extractLabeledPoints(dataset).map {
    -      case LabeledPoint(label: Double, features: Vector) => (label, features)
    +    val w = if ($(weightCol).isEmpty) lit(1.0) else col($(weightCol))
    +    val instances: RDD[Instance] = dataset.select(col($(labelCol)), w, col($(featuresCol))).map {
    +      case Row(label: Double, weight: Double, features: Vector) =>
    +        Instance(label, weight, features)
         }
    +
         val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE
         if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
     
    -    val (summarizer, statCounter) = instances.treeAggregate(
    -      (new MultivariateOnlineSummarizer, new StatCounter))(
    -        seqOp = (c, v) => (c, v) match {
    -          case ((summarizer: MultivariateOnlineSummarizer, statCounter: StatCounter),
    -          (label: Double, features: Vector)) =>
    -            (summarizer.add(features), statCounter.merge(label))
    -      },
    -        combOp = (c1, c2) => (c1, c2) match {
    -          case ((summarizer1: MultivariateOnlineSummarizer, statCounter1: StatCounter),
    -          (summarizer2: MultivariateOnlineSummarizer, statCounter2: StatCounter)) =>
    -            (summarizer1.merge(summarizer2), statCounter1.merge(statCounter2))
    -      })
    -
    -    val numFeatures = summarizer.mean.size
    -    val yMean = statCounter.mean
    -    val yStd = math.sqrt(statCounter.variance)
    +    val (featuresSummarizer, ySummarizer) = {
    +      val seqOp = (c: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
    +                   instance: Instance) =>
    +        (c._1.add(instance.features, instance.weight),
    +          c._2.add(Vectors.dense(instance.label), instance.weight))
    +      val combOp = (c1: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
    +                    c2: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer)) =>
    +        (c1._1.merge(c2._1), c1._2.merge(c2._2))
    --- End diff --
    
    ditto
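
The `seqOp`/`combOp` pair in this diff folds weighted instances into a summarizer per partition and then merges the partial results. A minimal local sketch of that pattern follows, assuming a scalar value instead of a feature vector. `WeightedStatsSketch`, `WeightedStats` and `summarize` are hypothetical names introduced here. This is not Spark's `MultivariateOnlineSummarizer`: it is immutable and reports the population variance, so its results need not match Spark's exactly.

```scala
// Hypothetical sketch, not Spark's MultivariateOnlineSummarizer: a mergeable
// accumulator for the weighted mean and (population) variance of a scalar.
object WeightedStatsSketch {
  final case class WeightedStats(weightSum: Double = 0.0, mean: Double = 0.0, m2: Double = 0.0) {

    /** seqOp: fold one (value, weight) observation into the running state. */
    def add(value: Double, weight: Double): WeightedStats = {
      if (weight == 0.0) this
      else {
        val newSum = weightSum + weight
        val delta = value - mean
        val newMean = mean + delta * weight / newSum
        WeightedStats(newSum, newMean, m2 + weight * delta * (value - newMean))
      }
    }

    /** combOp: combine two partial states computed on disjoint partitions. */
    def merge(other: WeightedStats): WeightedStats = {
      if (other.weightSum == 0.0) this
      else if (weightSum == 0.0) other
      else {
        val total = weightSum + other.weightSum
        val delta = other.mean - mean
        WeightedStats(
          total,
          mean + delta * other.weightSum / total,
          m2 + other.m2 + delta * delta * weightSum * other.weightSum / total)
      }
    }

    /** Weighted population variance: sum_i w_i (x_i - mean)^2 / sum_i w_i. */
    def variance: Double = if (weightSum > 0.0) m2 / weightSum else 0.0
  }

  /** Mimics treeAggregate on local data: fold each partition, then merge. */
  def summarize(partitions: Seq[Seq[(Double, Double)]]): WeightedStats =
    partitions
      .map(_.foldLeft(WeightedStats()) { case (acc, (v, w)) => acc.add(v, w) })
      .foldLeft(WeightedStats())(_ merge _)
}
```

For example, `WeightedStatsSketch.summarize(Seq(Seq((1.0, 2.0)), Seq((3.0, 1.0), (5.0, 1.0))))` treats the value 1.0 as if it had been observed twice, whatever the partitioning of the input.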




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141893867
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42746/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141258078
  
      [Test build #42621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42621/consoleFull) for   PR 8631 at commit [`2afa2a1`](https://github.com/apache/spark/commit/2afa2a190368adb99ec398c64744fc7dafc98bed).




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140973032
  
      [Test build #42579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42579/consoleFull) for   PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141901114
  
      [Test build #42747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42747/consoleFull) for   PR 8631 at commit [`b0144ce`](https://github.com/apache/spark/commit/b0144cef37986c97329d7416d53ff9da75d94350).




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by rotationsymmetry <gi...@git.apache.org>.
Github user rotationsymmetry commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141136871
  
    retest this please.
    
    "org.apache.spark.HeartbeatReceiverSuite.reregister if heartbeat from removed executor" failed, which should be unrelated to this patch.
    





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140996512
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141932980
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139481021
  
      [Test build #42318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42318/consoleFull) for   PR 8631 at commit [`e9093cb`](https://github.com/apache/spark/commit/e9093cbea2554fbc124899a58e3cbfdade5ea795).




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937155
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -493,26 +515,28 @@ private class LeastSquaresAggregator(
         featuresMean: Array[Double]) extends Serializable {
     
       private var totalCnt: Long = 0L
    +  private var weightSum: Double = 0
    --- End diff --
    
    `private var weightSum: Double = 0.0`




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-142015084
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937180
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -31,21 +31,30 @@ import org.apache.spark.ml.util.Identifiable
     import org.apache.spark.mllib.evaluation.RegressionMetrics
     import org.apache.spark.mllib.linalg.{Vector, Vectors}
     import org.apache.spark.mllib.linalg.BLAS._
    -import org.apache.spark.mllib.regression.LabeledPoint
     import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
     import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.{DataFrame, Row}
    -import org.apache.spark.sql.functions.{col, udf}
    -import org.apache.spark.sql.types.StructField
    +import org.apache.spark.sql.functions.{col, udf, lit}
     import org.apache.spark.storage.StorageLevel
    -import org.apache.spark.util.StatCounter
    +
    --- End diff --
    
    remove extra line




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141344297
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141893851
  
      [Test build #42746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42746/console) for   PR 8631 at commit [`57c57f1`](https://github.com/apache/spark/commit/57c57f102ae3d55149c8d3fc3cd7d4c95531f9b3).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Sort(`





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141160787
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141344287
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141892223
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141374317
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39995497
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -520,30 +544,33 @@ private class LeastSquaresAggregator(
        * Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
        * of the objective function.
        *
    -   * @param label The label for this data point.
    -   * @param data The features for one data point in dense/sparse vector format to be added
    -   *             into this aggregator.
    +   * @param instance  The data point instance to be added.
        * @return This LeastSquaresAggregator object.
        */
    -  def add(label: Double, data: Vector): this.type = {
    -    require(dim == data.size, s"Dimensions mismatch when adding new sample." +
    -      s" Expecting $dim but got ${data.size}.")
    +  def add(instance: Instance): this.type =
    --- End diff --
    
    Could you wrap the method body in a `{}` block here? Thanks. After this, it's good to go.
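
A minimal sketch of the brace-wrapped form requested above, assuming a simplified aggregator. `SquaredLossAggregator` and its fields are hypothetical, features are a plain `Array[Double]` rather than Spark's `Vector`, and dividing the loss by the weight sum is an assumption. Only the field order of `Instance` (label, weight, features) follows the diffs in this thread.

```scala
// Field order (label, weight, features) follows the diff; Array[Double] is a stand-in for Vector.
final case class Instance(label: Double, weight: Double, features: Array[Double])

// Hypothetical aggregator, not the LeastSquaresAggregator from the patch.
final class SquaredLossAggregator(coefficients: Array[Double]) {
  private val dim = coefficients.length
  private var lossSum = 0.0
  private var weightSum = 0.0
  private val gradientSum = Array.fill(dim)(0.0)

  // The body is wrapped in an explicit {} block, with the match nested inside it.
  def add(instance: Instance): this.type = {
    instance match {
      case Instance(label, weight, features) =>
        require(dim == features.length, s"Dimensions mismatch when adding new sample." +
          s" Expecting $dim but got ${features.length}.")
        // Instances with zero weight contribute nothing.
        if (weight > 0.0) {
          var prediction = 0.0
          var i = 0
          while (i < dim) { prediction += coefficients(i) * features(i); i += 1 }
          val diff = prediction - label
          i = 0
          while (i < dim) { gradientSum(i) += weight * diff * features(i); i += 1 }
          lossSum += weight * diff * diff / 2.0
          weightSum += weight
        }
    }
    this
  }

  // Both are undefined (NaN) until at least one positively weighted instance is added.
  def loss: Double = lossSum / weightSum
  def gradient: Array[Double] = gradientSum.map(_ / weightSum)
}
```

With the braces in place, the trailing `this` is visibly the return value of `add`, and the `match` no longer doubles as the whole method body.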




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141374315
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by rotationsymmetry <gi...@git.apache.org>.
Github user rotationsymmetry commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39983379
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -520,29 +544,32 @@ private class LeastSquaresAggregator(
        * Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
        * of the objective function.
        *
    -   * @param label The label for this data point.
    -   * @param data The features for one data point in dense/sparse vector format to be added
    -   *             into this aggregator.
    +   * @param instance  The data point instance to be added.
        * @return This LeastSquaresAggregator object.
        */
    -  def add(label: Double, data: Vector): this.type = {
    -    require(dim == data.size, s"Dimensions mismatch when adding new sample." +
    -      s" Expecting $dim but got ${data.size}.")
    +  def add(instance: Instance): this.type = instance match {
    --- End diff --
    
    Good point. I will revise it. Thanks!




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39580848
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -572,7 +591,7 @@ private class LeastSquaresAggregator(
         this
       }
     
    -  def count: Long = totalCnt
    +  def count: Double = totalCnt
     
    --- End diff --
    
    We decided to keep `count` as is, and add `weightSum`. 
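
A minimal sketch of that split, assuming a simplified summarizer: `count` stays a `Long` row count and a separate `weightSum` carries the `Double` total of the weights. The class `LabelSummary` and the `weightedMean` helper are hypothetical and are not taken from the patch.

```scala
final case class Instance(label: Double, weight: Double, features: Array[Double])

// Hypothetical summarizer that tracks the row count and the weight sum separately.
final class LabelSummary {
  private var n: Long = 0L            // number of rows seen, independent of weights
  private var wSum: Double = 0.0      // sum of instance weights
  private var wLabelSum: Double = 0.0 // sum of weight * label

  def add(instance: Instance): this.type = {
    require(instance.weight >= 0.0,
      s"Weight must be non-negative but got ${instance.weight}.")
    n += 1
    wSum += instance.weight
    wLabelSum += instance.weight * instance.label
    this
  }

  // Combines two partial summaries, as a tree aggregation would.
  def merge(other: LabelSummary): this.type = {
    n += other.n
    wSum += other.wSum
    wLabelSum += other.wLabelSum
    this
  }

  def count: Long = n
  def weightSum: Double = wSum
  def weightedMean: Double = if (wSum == 0.0) 0.0 else wLabelSum / wSum
}
```

Keeping `count` integral means existing callers that expect a number of rows are unaffected, while weighted statistics normalize by `weightSum`.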




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141217999
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937357
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
           .zip(testSummary.residuals.select("residuals").collect())
           .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
       }
    +
    +  test("linear regression with weighted samples"){
    +    val (data, weightedData) = {
    +      val activeData = LinearDataGenerator.generateLinearInput(
    +        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
    +
    +      val rnd = new Random(8392)
    +      val signedData = activeData map { case p: LabeledPoint =>
    --- End diff --
    
    Please use `activeData.map`
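
One common way to check weighted regression is to compare a fit on weighted data with a fit on data where each point is repeated according to its integer weight. Whether the test in this patch is built exactly this way is not visible in the truncated diff, so the following is only a sketch under that assumption. `WeightedFit`, `fit`, and `replicate` are hypothetical names, and `Instance` here carries a single scalar feature to keep the closed form short.

```scala
object WeightedFit {
  // Single-feature stand-in for the Instance type; not the class from the patch.
  final case class Instance(label: Double, weight: Double, feature: Double)

  /** Closed-form weighted least squares for y = intercept + slope * x. */
  def fit(data: Seq[Instance]): (Double, Double) = {
    val wSum = data.map(_.weight).sum
    val xMean = data.map(p => p.weight * p.feature).sum / wSum
    val yMean = data.map(p => p.weight * p.label).sum / wSum
    val sxy = data.map(p => p.weight * (p.feature - xMean) * (p.label - yMean)).sum
    val sxx = data.map(p => p.weight * (p.feature - xMean) * (p.feature - xMean)).sum
    val slope = sxy / sxx
    (yMean - slope * xMean, slope)
  }

  /** Replaces each instance of integer weight k with k copies of weight 1.0. */
  def replicate(data: Seq[Instance]): Seq[Instance] =
    data.flatMap(p => Seq.fill(p.weight.toInt)(p.copy(weight = 1.0)))
}
```

For integer weights the two fits agree up to floating-point rounding, which is the sense in which a weight over- or under-samples an instance.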




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139481651
  
      [Test build #42318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42318/console) for   PR 8631 at commit [`e9093cb`](https://github.com/apache/spark/commit/e9093cbea2554fbc124899a58e3cbfdade5ea795).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class WeightedLabeledPoint(label: Double, features: Vector, weight: Double)`





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139479926
  
    Jenkins, add to whitelist




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-142060325
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42757/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39812144
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -123,30 +132,41 @@ class LinearRegression(override val uid: String)
       def setTol(value: Double): this.type = set(tol, value)
       setDefault(tol -> 1E-6)
     
    +  /**
    +   * Whether to over-/under-sample training instances according to the given weights in weightCol.
    +   * If empty, all instances are treated equally (weight 1.0).
    +   * Default is empty, so all instances have weight one.
    +   * @group setParam
    +   */
    +  def setWeightCol(value: String): this.type = set(weightCol, value)
    +  setDefault(weightCol -> "")
    +
       override protected def train(dataset: DataFrame): LinearRegressionModel = {
         // Extract columns from data.  If dataset is persisted, do not persist instances.
    -    val instances = extractLabeledPoints(dataset).map {
    -      case LabeledPoint(label: Double, features: Vector) => (label, features)
    +    val w = if ($(weightCol).isEmpty) lit(1.0) else col($(weightCol))
    +    val instances: RDD[Instance] = dataset.select(col($(labelCol)), w, col($(featuresCol))).map {
    +      case Row(label: Double, weight: Double, features: Vector) =>
    +        Instance(label, weight, features)
         }
    +
         val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE
         if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
     
    -    val (summarizer, statCounter) = instances.treeAggregate(
    -      (new MultivariateOnlineSummarizer, new StatCounter))(
    -        seqOp = (c, v) => (c, v) match {
    -          case ((summarizer: MultivariateOnlineSummarizer, statCounter: StatCounter),
    -          (label: Double, features: Vector)) =>
    -            (summarizer.add(features), statCounter.merge(label))
    -      },
    -        combOp = (c1, c2) => (c1, c2) match {
    -          case ((summarizer1: MultivariateOnlineSummarizer, statCounter1: StatCounter),
    -          (summarizer2: MultivariateOnlineSummarizer, statCounter2: StatCounter)) =>
    -            (summarizer1.merge(summarizer2), statCounter1.merge(statCounter2))
    -      })
    -
    -    val numFeatures = summarizer.mean.size
    -    val yMean = statCounter.mean
    -    val yStd = math.sqrt(statCounter.variance)
    +    val (featuresSummarizer, ySummarizer) = {
    +      val seqOp = (c: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
    +                   instance: Instance) =>
    --- End diff --
    
    Indentation. See LoR for an example.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by rotationsymmetry <gi...@git.apache.org>.
Github user rotationsymmetry commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140972365
  
    @dbtsai Thank you for your comments. I have revised the patch. Please test.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139481656
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42318/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39580880
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -589,7 +608,7 @@ private class LeastSquaresAggregator(
      * It's used in Breeze's convex optimization routines.
      */
     private class LeastSquaresCostFun(
    -    data: RDD[(Double, Vector)],
    +    data: RDD[(Double, Vector, Double)],
    --- End diff --
    
    Refactor the `Instance` case class out of LoR and use it here for code readability. 
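
A small sketch of why the case class reads better than the tuple. It assumes the tuple in the diff is laid out as (label, features, weight), which the diff does not state explicitly. `InstanceRefactor` and its helpers are hypothetical, and `Array[Double]` stands in for Spark's `Vector`.

```scala
object InstanceRefactor {
  final case class Instance(label: Double, weight: Double, features: Array[Double])

  // Before: positional access. Label and weight are both Double,
  // so using _1 where _3 was meant still compiles.
  def weightSumTuple(data: Seq[(Double, Array[Double], Double)]): Double =
    data.map(_._3).sum

  // After: the field name states which Double is meant.
  def weightSum(data: Seq[Instance]): Double =
    data.map(_.weight).sum

  // Converts the assumed (label, features, weight) layout to the case class.
  def toInstance(t: (Double, Array[Double], Double)): Instance =
    Instance(label = t._1, weight = t._3, features = t._2)
}
```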


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by rotationsymmetry <gi...@git.apache.org>.
Github user rotationsymmetry commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141344261
  
    @dbtsai Thanks for the comment on indentation. I have fixed it in the patch.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141492966
  
      [Test build #42670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42670/consoleFull) for   PR 8631 at commit [`854d0bb`](https://github.com/apache/spark/commit/854d0bb58d0a6b43135ce9e750e4f9df36a65003).




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141345895
  
      [Test build #42640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/consoleFull) for   PR 8631 at commit [`1f731c2`](https://github.com/apache/spark/commit/1f731c28ad8a59f3bf432435253dc7b0984f46b4).




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139479960
  
    ok to test




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141163296
  
      [Test build #42611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42611/consoleFull) for   PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141892991
  
      [Test build #42746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42746/consoleFull) for   PR 8631 at commit [`57c57f1`](https://github.com/apache/spark/commit/57c57f102ae3d55149c8d3fc3cd7d4c95531f9b3).




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-142060094
  
      [Test build #42757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42757/console) for   PR 8631 at commit [`b3fae99`](https://github.com/apache/spark/commit/b3fae9954d24d9d88b5bbd016e8f285cae1825fe).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Sort(`





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39580975
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -123,30 +123,48 @@ class LinearRegression(override val uid: String)
       def setTol(value: Double): this.type = set(tol, value)
       setDefault(tol -> 1E-6)
     
    +  /**
    +   * Whether to over-/undersamples each of training instance according to the given
    --- End diff --
    
    The doc has changed in LoR. Please sync with it. 




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141900330
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8631




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-142015115
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140996513
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42579/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140996440
  
      [Test build #42579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42579/console) for   PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937392
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
           .zip(testSummary.residuals.select("residuals").collect())
           .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
       }
    +
    +  test("linear regression with weighted samples"){
    +    val (data, weightedData) = {
    +      val activeData = LinearDataGenerator.generateLinearInput(
    +        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
    +
    +      val rnd = new Random(8392)
    +      val signedData = activeData map { case p: LabeledPoint =>
    +        (rnd.nextGaussian() > 0.0, p)
    +      }
    +
    +      val data1 = signedData flatMap {
    +        case (true, p) => Iterator(p, p)
    +        case (false, p) => Iterator(p)
    +      }
    +
    +      val weightedSignedData = signedData flatMap {
    +        case (true, LabeledPoint(label, features)) =>
    +          Iterator(
    +            Instance(label, 1.2, features),
    +            Instance(label, 0.8, features)
    +          )
    +        case (false, LabeledPoint(label, features)) =>
    +          Iterator(
    +            Instance(label, 0.3, features),
    +            Instance(label, 0.1, features),
    +            Instance(label, 0.6, features)
    +          )
    +      }
    +
    +      val noiseData = LinearDataGenerator.generateLinearInput(
    +        2, Array(1, 3), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
    +      val weightedNoiseData = noiseData map {
    +        case LabeledPoint(label, features) => Instance(label, 0, features)
    --- End diff --
    
    Change this to `case LabeledPoint(label, features) => Instance(label, weight = 0.0, features)` for readability.
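
As a minimal sketch of the readability point, assuming a hypothetical three-field `Instance(label, weight, features)` case class that only stands in for the one used in the patch, naming the argument makes it obvious which of the two adjacent `Double` fields is the weight. The object name `NamedArgSketch` and the sample values are illustrative assumptions.

```scala
object NamedArgSketch {
  // Hypothetical stand-in for the patch's Instance class; the field order
  // (label, weight, features) is assumed from the calls shown in the diff.
  final case class Instance(label: Double, weight: Double, features: Vector[Double])

  val features: Vector[Double] = Vector(0.5, -1.0)

  // Positional form: the reader must remember that the second field is the weight.
  val positional: Instance = Instance(2.0, 0.0, features)

  // Named form: the zero weight documents itself at the call site.
  val named: Instance = Instance(2.0, weight = 0.0, features)
}
```

Both forms build the same value; only the call site changes.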




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141892209
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140972473
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139479994
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141218002
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42611/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141374237
  
      [Test build #42640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/console) for   PR 8631 at commit [`1f731c2`](https://github.com/apache/spark/commit/1f731c28ad8a59f3bf432435253dc7b0984f46b4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: String)`
      * `  require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint must be 1.0 or 0.0")`





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141489727
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141533529
  
      [Test build #42670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42670/console) for   PR 8631 at commit [`854d0bb`](https://github.com/apache/spark/commit/854d0bb58d0a6b43135ce9e750e4f9df36a65003).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39580918
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -123,30 +123,48 @@ class LinearRegression(override val uid: String)
       def setTol(value: Double): this.type = set(tol, value)
       setDefault(tol -> 1E-6)
     
    +  /**
    +   * Whether to over-/undersamples each of training instance according to the given
    +   * weight in `weightCol`. If empty, all samples are supposed to have weights as 1.0.
    +   * Default is empty, so all samples have weight one.
    +   * @group setParam
    +   */
    +  def setWeightCol(value: String): this.type = set(weightCol, value)
    +  setDefault(weightCol -> "")
    +
       override protected def train(dataset: DataFrame): LinearRegressionModel = {
         // Extract columns from data.  If dataset is persisted, do not persist instances.
    -    val instances = extractLabeledPoints(dataset).map {
    --- End diff --
    
    Use `lit` and `col` to simplify the code. See the example in LoR (LogisticRegression).




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141900346
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141489800
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by rotationsymmetry <gi...@git.apache.org>.
Github user rotationsymmetry commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139587297
  
    @dbtsai Thank you for OKing the test. My patch depends on the `MultivariateOnlineSummarizer` in your PR for applying weights to logistic regression ([link](https://github.com/apache/spark/pull/7884)). My patch should be OK to test after your PR is merged.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141160754
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937145
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -520,28 +544,28 @@ private class LeastSquaresAggregator(
        * Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
        * of the objective function.
        *
    -   * @param label The label for this data point.
    -   * @param data The features for one data point in dense/sparse vector format to be added
    -   *             into this aggregator.
    +   * @param data  The data point to be added.
        * @return This LeastSquaresAggregator object.
    --- End diff --
    
    Rename `data` to `instance`.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937401
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
           .zip(testSummary.residuals.select("residuals").collect())
           .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
       }
    +
    +  test("linear regression with weighted samples"){
    +    val (data, weightedData) = {
    +      val activeData = LinearDataGenerator.generateLinearInput(
    +        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
    +
    +      val rnd = new Random(8392)
    +      val signedData = activeData map { case p: LabeledPoint =>
    +        (rnd.nextGaussian() > 0.0, p)
    +      }
    +
    +      val data1 = signedData flatMap {
    +        case (true, p) => Iterator(p, p)
    +        case (false, p) => Iterator(p)
    +      }
    +
    +      val weightedSignedData = signedData flatMap {
    +        case (true, LabeledPoint(label, features)) =>
    +          Iterator(
    +            Instance(label, 1.2, features),
    +            Instance(label, 0.8, features)
    +          )
    +        case (false, LabeledPoint(label, features)) =>
    +          Iterator(
    +            Instance(label, 0.3, features),
    +            Instance(label, 0.1, features),
    +            Instance(label, 0.6, features)
    +          )
    +      }
    +
    +      val noiseData = LinearDataGenerator.generateLinearInput(
    +        2, Array(1, 3), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
    +      val weightedNoiseData = noiseData map {
    +        case LabeledPoint(label, features) => Instance(label, 0, features)
    +      }
    +      val data2 = weightedSignedData ++ weightedNoiseData
    +
    +      (sqlContext.createDataFrame(sc.parallelize(data1, 4)),
    +        sqlContext.createDataFrame(sc.parallelize(data2, 4)))
    +    }
    +
    +    val trainer1a = (new LinearRegression).setFitIntercept(true)
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
    +    val trainer1b = (new LinearRegression).setFitIntercept(true).setWeightCol("weight")
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
    +    val model1a0 = trainer1a.fit(data)
    +    val model1a1 = trainer1a.fit(weightedData)
    +    val model1b = trainer1b.fit(weightedData)
    +    assert(model1a0.weights !~= model1a1.weights absTol 1E-3)
    +    assert(model1a0.intercept !~= model1a1.intercept absTol 1E-3)
    +    assert(model1a0.weights ~== model1b.weights absTol 1E-3)
    +    assert(model1a0.intercept ~== model1b.intercept absTol 1E-3)
    +
    +    val trainer2a = (new LinearRegression).setFitIntercept(true)
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
    +    val trainer2b = (new LinearRegression).setFitIntercept(true).setWeightCol("weight")
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
    +    val model2a0 = trainer2a.fit(data)
    +    val model2a1 = trainer2a.fit(weightedData)
    +    val model2b = trainer2b.fit(weightedData)
    +    assert(model2a0.weights !~= model2a1.weights absTol 1E-3)
    +    assert(model2a0.intercept !~= model2a1.intercept absTol 1E-3)
    +    assert(model2a0.weights ~== model2b.weights absTol 1E-3)
    +    assert(model2a0.intercept ~== model2b.intercept absTol 1E-3)
    +
    +    val trainer3a = (new LinearRegression).setFitIntercept(false)
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
    +    val trainer3b = (new LinearRegression).setFitIntercept(false).setWeightCol("weight")
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
    +    val model3a0 = trainer3a.fit(data)
    +    val model3a1 = trainer3a.fit(weightedData)
    +    val model3b = trainer3b.fit(weightedData)
    +    assert(model3a0.weights !~= model3a1.weights absTol 1E-3)
    +    assert(model3a0.weights ~== model3b.weights absTol 1E-3)
    +
    +    val trainer4a = (new LinearRegression).setFitIntercept(false)
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
    +    val trainer4b = (new LinearRegression).setFitIntercept(false).setWeightCol("weight")
    +      .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
    +    val model4a0 = trainer4a.fit(data)
    +    val model4a1 = trainer4a.fit(weightedData)
    +    val model4b = trainer4b.fit(weightedData)
    +    assert(model4a0.weights !~= model4a1.weights absTol 1E-3)
    +    assert(model4a0.weights ~== model4b.weights absTol 1E-3)
    +
    --- End diff --
    
    Remove this extra blank line.
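
The test in the diff above appears to rest on one idea: a point whose weights sum to 2.0 should count like a point that appears twice, and a zero-weight point should not count at all. The following is a minimal sketch of that idea in plain Scala, not the Spark test itself. It assumes a single feature, no regularization, and a closed-form weighted least-squares fit; `Sample`, `fit`, and the data values are hypothetical names and numbers chosen for illustration.

```scala
object WeightedFitSketch {
  // Hypothetical one-feature sample; this is not Spark's Instance class.
  final case class Sample(x: Double, y: Double, weight: Double)

  // Closed-form weighted least squares for y = intercept + slope * x.
  // Returns (intercept, slope).
  def fit(samples: Seq[Sample]): (Double, Double) = {
    val wSum = samples.map(_.weight).sum
    val xBar = samples.map(s => s.weight * s.x).sum / wSum
    val yBar = samples.map(s => s.weight * s.y).sum / wSum
    val sxy = samples.map(s => s.weight * (s.x - xBar) * (s.y - yBar)).sum
    val sxx = samples.map(s => s.weight * (s.x - xBar) * (s.x - xBar)).sum
    val slope = sxy / sxx
    (yBar - slope * xBar, slope)
  }

  val points: Seq[(Double, Double)] = Seq((0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8))

  // Duplicated view: every other point appears twice, each copy with weight 1.0.
  val duplicated: Seq[Sample] = points.zipWithIndex.flatMap { case ((x, y), i) =>
    if (i % 2 == 0) Seq(Sample(x, y, 1.0), Sample(x, y, 1.0)) else Seq(Sample(x, y, 1.0))
  }

  // Weighted view: the same points split into weights summing to 2.0 or 1.0,
  // plus unrelated noise points that carry zero weight.
  val weighted: Seq[Sample] = points.zipWithIndex.flatMap { case ((x, y), i) =>
    if (i % 2 == 0) Seq(Sample(x, y, 1.2), Sample(x, y, 0.8))
    else Seq(Sample(x, y, 0.3), Sample(x, y, 0.1), Sample(x, y, 0.6))
  } ++ Seq(Sample(10.0, -50.0, 0.0), Sample(-7.0, 40.0, 0.0))

  val (interceptDup, slopeDup) = fit(duplicated)
  val (interceptWeighted, slopeWeighted) = fit(weighted)

  // Ignoring the weight column (every row treated as weight 1.0) lets the noise in.
  val (interceptIgnored, slopeIgnored) = fit(weighted.map(_.copy(weight = 1.0)))
}
```

Under these assumptions the duplicated and weighted fits should agree up to floating-point rounding, while the fit that ignores the weights should differ. That mirrors the `~==` and `!~=` assertions in the test.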




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141533663
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42670/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141932981
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42747/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141160556
  
    Jenkins, retest this please




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39947613
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -520,29 +544,32 @@ private class LeastSquaresAggregator(
        * Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
        * of the objective function.
        *
    -   * @param label The label for this data point.
    -   * @param data The features for one data point in dense/sparse vector format to be added
    -   *             into this aggregator.
    +   * @param instance  The data point instance to be added.
        * @return This LeastSquaresAggregator object.
        */
    -  def add(label: Double, data: Vector): this.type = {
    -    require(dim == data.size, s"Dimensions mismatch when adding new sample." +
    -      s" Expecting $dim but got ${data.size}.")
    +  def add(instance: Instance): this.type = instance match {
    --- End diff --
    
    Since you have already moved `case Instance(label, weight, features) =>` to a new line, let's do
    
    ```scala
    def add(instance: Instance): this.type = {
      instance match { case Instance(label, weight, features) =>
      ...
      }
    }
    ```




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937140
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -520,28 +544,28 @@ private class LeastSquaresAggregator(
        * Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
        * of the objective function.
        *
    -   * @param label The label for this data point.
    -   * @param data The features for one data point in dense/sparse vector format to be added
    -   *             into this aggregator.
    +   * @param data  The data point to be added.
        * @return This LeastSquaresAggregator object.
        */
    -  def add(label: Double, data: Vector): this.type = {
    -    require(dim == data.size, s"Dimensions mismatch when adding new sample." +
    -      s" Expecting $dim but got ${data.size}.")
    +  def add(data: Instance): this.type = data match { case Instance(label, weight, features) =>
    +    require(dim == features.size, s"Dimensions mismatch when adding new sample." +
    +      s" Expecting $dim but got ${features.size}.")
    +    require(weight >= 0.0, s"instance weight, ${weight} has to be >= 0.0")
     
    --- End diff --
    
    Please add `if (weight == 0) return this`.
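
A minimal sketch of what the early return could look like, assuming a much-simplified aggregator that only tracks the weighted squared error of a fixed linear model. `LossAggregator`, its fields, and this `Instance` case class are hypothetical stand-ins, not the `LeastSquaresAggregator` from the patch.

```scala
object ZeroWeightSketch {
  // Hypothetical stand-in for the patch's Instance class.
  final case class Instance(label: Double, weight: Double, features: Array[Double])

  // Simplified aggregator: accumulates the weighted squared error of a fixed
  // linear model. It is not the LeastSquaresAggregator from the patch.
  final class LossAggregator(coefficients: Array[Double]) {
    private val dim = coefficients.length
    private var weightSum = 0.0
    private var lossSum = 0.0

    def add(instance: Instance): this.type = {
      instance match { case Instance(label, weight, features) =>
        require(dim == features.length, "Dimensions mismatch when adding new sample." +
          s" Expecting $dim but got ${features.length}.")
        require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")

        // A zero-weight instance cannot change either sum, so skip the dot product.
        if (weight == 0.0) return this

        var prediction = 0.0
        var i = 0
        while (i < dim) {
          prediction += coefficients(i) * features(i)
          i += 1
        }
        val diff = prediction - label
        weightSum += weight
        lossSum += weight * diff * diff / 2.0
        this
      }
    }

    // Total weight seen so far and the weighted mean loss.
    def count: Double = weightSum
    def loss: Double = if (weightSum == 0.0) 0.0 else lossSum / weightSum
  }
}
```

The validation runs before the early return, so a malformed zero-weight instance is still rejected. Whether the real aggregator should check in that order is a design choice left to the patch.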




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141301161
  
      [Test build #42621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42621/console) for   PR 8631 at commit [`2afa2a1`](https://github.com/apache/spark/commit/2afa2a190368adb99ec398c64744fc7dafc98bed).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class Interaction(override val uid: String) extends Transformer`





[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141932856
  
      [Test build #42747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42747/console) for   PR 8631 at commit [`b0144ce`](https://github.com/apache/spark/commit/b0144cef37986c97329d7416d53ff9da75d94350).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8631#discussion_r39937361
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
           .zip(testSummary.residuals.select("residuals").collect())
           .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
       }
    +
    +  test("linear regression with weighted samples"){
    +    val (data, weightedData) = {
    +      val activeData = LinearDataGenerator.generateLinearInput(
    +        6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
    +
    +      val rnd = new Random(8392)
    +      val signedData = activeData map { case p: LabeledPoint =>
    +        (rnd.nextGaussian() > 0.0, p)
    +      }
    +
    +      val data1 = signedData flatMap {
    --- End diff --
    
    ditto




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-139480026
  
    Merged build started.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141257022
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-140583259
  
    Hello, the weighted `MultivariateOnlineSummarizer` has been merged, which unblocks you. Thanks.




[GitHub] spark pull request: [SPARK-9642] [ML] LinearRegression should supp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-142015309
  
      [Test build #42757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42757/consoleFull) for PR 8631 at commit [`b3fae99`](https://github.com/apache/spark/commit/b3fae9954d24d9d88b5bbd016e8f285cae1825fe).




[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8631#issuecomment-141257039
  
    Merged build started.

