Posted to reviews@spark.apache.org by actuaryzhang <gi...@git.apache.org> on 2017/01/30 08:12:59 UTC

[GitHub] spark pull request #16740: [SPARK-19400] Allow GLM to handle intercept only ...

GitHub user actuaryzhang opened a pull request:

    https://github.com/apache/spark/pull/16740

    [SPARK-19400] Allow GLM to handle intercept only model

    ## What changes were proposed in this pull request?
    Intercept-only GLMs fail for non-Gaussian families because IWLS reduces an empty array. The line `val maxTolOfCoefficients = oldCoefficients.toArray.reduce { (x, y) => math.max(math.abs(x), math.abs(y)) }` throws an `UnsupportedOperationException` in the intercept-only model because `oldCoefficients` is empty. This PR fixes the issue.
    
    @yanboliang @srowen @imatiach-msft @zhengruifeng 
    
    ## How was this patch tested?
    Added a new test for the intercept-only model.
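The failure and the fix described above can be reproduced in a minimal plain-Scala sketch, with no Spark needed. The object and method names are hypothetical. `reduce` has no starting value, so it throws `UnsupportedOperationException` on the empty coefficient array of an intercept-only model. `foldLeft(0.0)` returns `0.0` instead:

```scala
object EmptyReduceDemo {
  // Convergence measure written with reduce, as in the code before this PR.
  // Array.reduce throws UnsupportedOperationException when the array is empty.
  def maxAbsReduce(diff: Array[Double]): Double =
    diff.reduce { (x, y) => math.max(math.abs(x), math.abs(y)) }

  // Same measure written with foldLeft(0.0), as in this PR: an empty array yields 0.0.
  def maxAbsFold(diff: Array[Double]): Double =
    diff.foldLeft(0.0) { (x, y) => math.max(math.abs(x), math.abs(y)) }

  def main(args: Array[String]): Unit = {
    val empty = Array.empty[Double]
    println(maxAbsFold(empty)) // prints 0.0
    val threw =
      try { maxAbsReduce(empty); false }
      catch { case _: UnsupportedOperationException => true }
    println(threw) // prints true
  }
}
```

There is also a side effect on one-element arrays. There, `reduce` returns the element unchanged without calling the function, so a single negative difference never goes through `math.abs`. The `foldLeft(0.0)` version returns its absolute value.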


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark interceptOnly

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16740
    
----
commit 69d96319abaa7f4aa98c04ce32556a2cb2c065d2
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-01-30T07:59:11Z

    fix error in reducing empty array

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)** for PR 16740 at commit [`3a0a2af`](https://github.com/apache/spark/commit/3a0a2aff5a7b09cb0e1db7ec2e756e55b561eace).


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72185/testReport)** for PR 16740 at commit [`0b3c085`](https://github.com/apache/spark/commit/0b3c085e6171737065a1ca1f07c60f33f8c3bf3c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    I don't really expect that we'll be changing things so often that this becomes a hassle. I think there is value in getting known results: in the current test the IRLS solver takes 3 iterations to converge on four data points, and it's not clear to me how this will extend to other links, families, or larger datasets.
    
    Regardless, I think we should test more families and links than just Poisson. And instead of using R results, we can just compute the result analytically. What do you think?
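The analytic result for the Poisson test case is short. This minimal sketch assumes a log link and no offset, and the object and method names are hypothetical. With a constant `mu`, the score equation `sum(w * (y - mu)) = 0` makes `mu` the weighted mean of `y`, so the intercept is `log(mu)`. For the test data this gives `log(22)` and `log(24)`, which match the R values `3.091042` and `3.178054` quoted in the test:

```scala
object InterceptOnlyPoisson {
  // Closed-form intercept for an intercept-only Poisson GLM with log link and no offset.
  // The score equation sum_i w_i * (y_i - mu) = 0 gives mu = weighted mean of y,
  // so the intercept is log(mu).
  def intercept(y: Array[Double], w: Array[Double]): Double = {
    require(y.length == w.length && y.nonEmpty, "y and w must be non-empty and the same length")
    val sumWY = y.zip(w).map { case (yi, wi) => wi * yi }.sum
    val sumW = w.sum
    math.log(sumWY / sumW)
  }

  def main(args: Array[String]): Unit = {
    val y = Array(17.0, 19.0, 23.0, 29.0)
    println(intercept(y, Array.fill(4)(1.0)))         // log(22), about 3.091042
    println(intercept(y, Array(1.0, 2.0, 3.0, 4.0)))  // log(24), about 3.178054
  }
}
```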


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)** for PR 16740 at commit [`b57af08`](https://github.com/apache/spark/commit/b57af08f792a59438452a3cef070e16ef51316b5).


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    I agree having a special case is unsatisfying from an engineering perspective. In Spark it's a bit different from R, since every iteration of IRLS launches a Spark job that makes a pass over the data, so the cost of extra iterations is much higher. We have special-cased other algorithms for this reason. 
    
    It's entirely possible I'm missing something since I do not know the GLM code quite so well, and I did not thoroughly check it, but this code seemed to do the trick:
    
    ````scala
    if (numFeatures == 0 && getFitIntercept) {
      // w: weight expression, e.g. lit(1.0) when no weight column is set
      val agg = dataset.agg(sum(w * col(getLabelCol)), sum(w)).first()
      // With no features, the fitted mean is the weighted mean of the label.
      val mu = agg.getDouble(0) / agg.getDouble(1)
      // (A^T W A)^{-1} for a column of ones with IRLS working weights w / (V(mu) * g'(mu)^2)
      val diagInvAtA = familyAndLink.family.variance(mu) *
        math.pow(familyAndLink.link.deriv(mu), 2.0) / agg.getDouble(1)
      val model = copyValues(new GeneralizedLinearRegressionModel(uid, Vectors.zeros(0),
        familyAndLink.link.link(mu)).setParent(this))
      val trainingSummary = new GeneralizedLinearRegressionTrainingSummary(dataset, model,
        Array(diagInvAtA), 1, getSolver)
      return model.setSummary(Some(trainingSummary))
    }
    ````
    
    The best answer here may depend on the use cases - do we expect users to be training "intercept-only" models often? If yes, then the savings on the iteration time may be worth it. If not, it _is_ a clunky solution. We can see what others think. 
    
    Also, I got some strange failures when training with no features and `fitIntercept == false`. We should just throw an error in this case and add a test for it.
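A closed-form special case like the one proposed above also has to supply the summary with the single entry of (A^T W A)^{-1}. The sketch below computes that entry under two assumptions: A is a column of ones, and the IRLS working weights are `w / (V(mu) * g'(mu)^2)` at the converged `mu`. The variance and link-derivative functions are passed in as plain functions. They stand in for the family and link objects and are not Spark's API, and the names are hypothetical:

```scala
object InterceptOnlyStdErr {
  // Entry of (A^T W A)^{-1} for an intercept-only GLM, where A is a column of ones
  // and every row's IRLS working weight is w_i / (V(mu) * g'(mu)^2).
  def diagInvAtA(mu: Double, sumW: Double,
                 variance: Double => Double, linkDeriv: Double => Double): Double =
    variance(mu) * math.pow(linkDeriv(mu), 2) / sumW

  def main(args: Array[String]): Unit = {
    val y = Array(17.0, 19.0, 23.0, 29.0)
    val sumW = y.length.toDouble // unit weights
    val mu = y.sum / sumW        // fitted mean = 22.0
    // Poisson family with log link: V(mu) = mu, g'(mu) = 1 / mu.
    val d = diagInvAtA(mu, sumW, m => m, m => 1.0 / m)
    // With dispersion fixed at 1 (Poisson), the standard error is sqrt(d) = 1 / sqrt(sum(y)).
    println(math.sqrt(d))
  }
}
```

For Poisson with log link this reduces to `1 / sum(w * y)`. For other families the `g'(mu)^2` factor matters.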


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72148/


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @srowen Thanks for the suggestion; I've included the simplification. Please let me know if anything else is needed for this PR. Thanks! 


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r98578921
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +743,54 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    +        .setLinkPredictionCol("linkPrediction")
    +      if (useWeight) trainer.setWeightCol("weight")
    +      val model = trainer.fit(dataset)
    +      val actual = model.intercept
    +      assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
    +        s"useWeight = $useWeight.")
    --- End diff --
    
    `assert(model.coefficients === new DenseVector(Array.empty[Double]))`


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @actuaryzhang the changes look good to me. I had some nitpicks that you marked as "won't fix", and I'm OK with that. Thank you for fixing this issue! Maybe a committer can review this: @jkbradley @srowen @yanboliang, thanks!


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    OK, let's go with this fix, then. It seems both R and statsmodels run the iterative fit to compute the null model. Thanks for following up on that!


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99413491
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala ---
    @@ -89,7 +89,7 @@ private[ml] class IterativelyReweightedLeastSquares(
           val oldCoefficients = oldModel.coefficients
           val coefficients = model.coefficients
           BLAS.axpy(-1.0, coefficients, oldCoefficients)
    -      val maxTolOfCoefficients = oldCoefficients.toArray.reduce { (x, y) =>
    +      val maxTolOfCoefficients = oldCoefficients.toArray.foldLeft(0.0) { (x, y) =>
    --- End diff --
    
    nit: this could be 
    ````scala
    val maxTol = oldCoefficients.toArray.foldLeft(math.abs(oldModel.intercept - model.intercept)) { (x, y) => 
      math.max(math.abs(x), math.abs(y))
    }
    ````
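Here is a standalone sketch of this suggestion. It uses plain Scala arrays instead of Spark vectors and compares the old and new coefficients directly instead of a difference computed in place, and `maxAbsChange` is a hypothetical name. Seeding the fold with the absolute intercept change puts the intercept and coefficient comparisons in one expression. The result stays well defined when the coefficient arrays are empty, as in an intercept-only model:

```scala
object ConvergenceCheck {
  // Largest absolute change across the intercept and all coefficients between two
  // IRLS iterations. The fold starts from the intercept change, so empty
  // coefficient arrays (intercept-only model) simply return that change.
  def maxAbsChange(oldCoef: Array[Double], newCoef: Array[Double],
                   oldIntercept: Double, newIntercept: Double): Double = {
    require(oldCoef.length == newCoef.length, "coefficient arrays must have equal length")
    oldCoef.indices.foldLeft(math.abs(oldIntercept - newIntercept)) { (acc, i) =>
      math.max(acc, math.abs(oldCoef(i) - newCoef(i)))
    }
  }
}
```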


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah Your formula for the offset case does not seem to be general, and I'm not sure an analytical formula exists, particularly when the link function is neither identity nor log. For a canonical link, as in the example below, the estimating equation for the intercept is sum(w * (y - mu)) = 0. With an offset, mu = link.inv(intcpt + offset). Taking the inverse link as an example, we get sum(w * (y - 1 / (intcpt + offset))) = 0. This is nonlinear in intcpt and has to be solved iteratively, right? The following simple R example shows that the formula you provided does not recover the coefficient. 
    
    ```
    set.seed(11)
    off <- rlnorm(200)
    a <- 0.52
    mu <- 1/(a + off)
    y <- rgamma(200, 1, scale = mu)
    f <- glm(y~1, offset = off, family = Gamma())
    coef(f)
    1/mean(y - off)
    ```
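The same point can be shown in a small Scala sketch. The names are hypothetical, and the data is made up and non-random rather than the R example's draws. With an inverse link and an offset, the intercept equation `sum(w * (y - 1 / (intcpt + offset))) = 0` is nonlinear in the intercept, so the sketch solves it with Newton's method. The left-hand side is increasing and concave in the intercept. Starting from a point where it is negative and where `intcpt + offset > 0` therefore keeps the iterates increasing toward the root:

```scala
object GammaInverseOffset {
  // Solves sum_i w_i * (y_i - 1 / (b + off_i)) = 0 for the intercept b,
  // i.e. an intercept-only Gamma GLM with inverse link and an offset.
  def solve(y: Array[Double], off: Array[Double], w: Array[Double],
            start: Double, tol: Double = 1e-12, maxIter: Int = 100): Double = {
    require(y.length == off.length && y.length == w.length, "inputs must have equal length")
    // Left-hand side of the estimating equation and its derivative in b.
    def score(b: Double): Double =
      y.indices.map(i => w(i) * (y(i) - 1.0 / (b + off(i)))).sum
    def scoreDeriv(b: Double): Double =
      y.indices.map(i => w(i) / math.pow(b + off(i), 2)).sum
    var b = start
    var iter = 0
    var done = false
    while (!done && iter < maxIter) {
      val step = score(b) / scoreDeriv(b) // Newton step
      b -= step
      iter += 1
      done = math.abs(step) < tol
    }
    b
  }

  def main(args: Array[String]): Unit = {
    val y = Array(1.0, 0.5, 0.8, 0.4)
    val off = Array(0.5, 1.0, 0.7, 1.5)
    val w = Array.fill(y.length)(1.0)
    println(solve(y, off, w, start = 0.0))
  }
}
```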


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)** for PR 16740 at commit [`37c41aa`](https://github.com/apache/spark/commit/37c41aa598bd0c2e3cf9e42f217233498ffdac23).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99269075
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +743,55 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    +        .setLinkPredictionCol("linkPrediction")
    +      if (useWeight) trainer.setWeightCol("weight")
    +      val model = trainer.fit(dataset)
    +      val actual = model.intercept
    +      assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
    +        s"useWeight = $useWeight.")
    +      assert(model.coefficients === new DenseVector(Array.empty[Double]))
    +
    +      val familyLink = FamilyAndLink(trainer)
    +      model.transform(dataset).select("features", "prediction", "linkPrediction").collect()
    +        .foreach {
    +          case Row(features: DenseVector, prediction1: Double, linkPrediction1: Double) =>
    +            val eta = BLAS.dot(features, model.coefficients) + model.intercept
    +            val prediction2 = familyLink.fitted(eta)
    --- End diff --
    
    That was fast! :)


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99268773
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +743,55 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    +        .setLinkPredictionCol("linkPrediction")
    +      if (useWeight) trainer.setWeightCol("weight")
    +      val model = trainer.fit(dataset)
    +      val actual = model.intercept
    +      assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
    +        s"useWeight = $useWeight.")
    +      assert(model.coefficients === new DenseVector(Array.empty[Double]))
    +
    +      val familyLink = FamilyAndLink(trainer)
    +      model.transform(dataset).select("features", "prediction", "linkPrediction").collect()
    +        .foreach {
    +          case Row(features: DenseVector, prediction1: Double, linkPrediction1: Double) =>
    +            val eta = BLAS.dot(features, model.coefficients) + model.intercept
    +            val prediction2 = familyLink.fitted(eta)
    --- End diff --
    
    I don't think we need to test this. This is essentially checking the correctness of the prediction mechanism, regardless of the "intercept-only" part. The prediction mechanism is tested elsewhere. 


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r98488033
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +743,54 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    +        .setLinkPredictionCol("linkPrediction")
    +      if (useWeight) trainer.setWeightCol("weight")
    +      val model = trainer.fit(dataset)
    +      val actual = model.intercept
    +      assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
    +        s"useWeight = $useWeight.")
    +
    +      val familyLink = FamilyAndLink(trainer)
    +      model.transform(dataset).select("features", "prediction", "linkPrediction").collect()
    +        .foreach {
    +          case Row(features: DenseVector, prediction1: Double, linkPrediction1: Double) =>
    --- End diff --
    
    Maybe rename these to `expectedPrediction` and `expectedLinkPrediction`. Having numbers at the end of variable names is confusing.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72299/


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah Thanks for your input. I can add more tests, but they would not add much, since the algorithm is already covered by other tests. 
    
    The analytical approach does not integrate well with the summary method. One would have to derive the general formula for the standard error of the intercept and then change the code substantially to make it work with the summary. This is not an optimal solution, IMO. 
    
    BTW, R also fits the intercept-only model using IWLS with multiple iterations. It seems odd to have a special implementation for this case that does not integrate with the current setup. 
    
    @srowen @yanboliang Please advise. Thanks. 
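To make the comparison with R concrete, here is a sketch of plain IRLS for an intercept-only Poisson model with log link. The names are hypothetical, and the starting values `mu = y + 0.1` are an assumption of this sketch. With a single column of ones, each weighted least-squares step reduces to a weighted mean of the working response. The iterations converge to the closed-form `log` of the weighted mean of `y`:

```scala
object InterceptOnlyIrls {
  // Plain IRLS for an intercept-only Poisson GLM with log link.
  // Returns (intercept, number of iterations run).
  def fit(y: Array[Double], w: Array[Double],
          tol: Double = 1e-10, maxIter: Int = 25): (Double, Int) = {
    require(y.length == w.length && y.nonEmpty, "y and w must be non-empty and the same length")
    var mu = y.map(_ + 0.1) // assumed starting values
    var intercept = Double.NaN
    var iter = 0
    var converged = false
    while (!converged && iter < maxIter) {
      // Working response z_i = eta_i + (y_i - mu_i) * g'(mu_i), with g'(mu) = 1 / mu.
      val z = y.indices.map(i => math.log(mu(i)) + (y(i) - mu(i)) / mu(i))
      // Working weight w_i / (V(mu_i) * g'(mu_i)^2), which is w_i * mu_i for Poisson/log.
      val ww = y.indices.map(i => w(i) * mu(i))
      // Weighted least squares on a column of ones is a weighted mean.
      val newIntercept = y.indices.map(i => ww(i) * z(i)).sum / ww.sum
      converged = !intercept.isNaN && math.abs(newIntercept - intercept) < tol
      intercept = newIntercept
      mu = Array.fill(y.length)(math.exp(intercept))
      iter += 1
    }
    (intercept, iter)
  }
}
```

Each loop iteration here is cheap. As noted earlier in the thread, though, every IRLS iteration in Spark launches a job that makes a pass over the data.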


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99977340
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -335,6 +335,9 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
           throw new SparkException(msg)
         }
     
    +    require(numFeatures > 0 || $(fitIntercept),
    +      "Specified model is empty with neither intercept nor feature.")
    --- End diff --
    
    The message is a bit cryptic.  How about "GeneralizedLinearRegression was given data with 0 features, and with Param fitIntercept set to false.  To fit a model with 0 features, fitIntercept must be set to true."
    
    Other than this, your PR looks good to me.  Thanks!
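Here is a minimal sketch of the validation with the suggested wording, in plain Scala. `ModelSpecCheck.validate` is a hypothetical name. Scala's `require` throws `IllegalArgumentException` with the message prefixed by `requirement failed: `, and a test can intercept that exception:

```scala
object ModelSpecCheck {
  // Rejects a model with neither features nor an intercept, mirroring the
  // require(...) added in the PR, with the error message suggested in review.
  def validate(numFeatures: Int, fitIntercept: Boolean): Unit =
    require(numFeatures > 0 || fitIntercept,
      "GeneralizedLinearRegression was given data with 0 features, and with Param " +
        "fitIntercept set to false. To fit a model with 0 features, fitIntercept " +
        "must be set to true.")
}
```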


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99460883
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala ---
    @@ -89,7 +89,7 @@ private[ml] class IterativelyReweightedLeastSquares(
           val oldCoefficients = oldModel.coefficients
           val coefficients = model.coefficients
           BLAS.axpy(-1.0, coefficients, oldCoefficients)
    -      val maxTolOfCoefficients = oldCoefficients.toArray.reduce { (x, y) =>
    +      val maxTolOfCoefficients = oldCoefficients.toArray.foldLeft(0.0) { (x, y) =>
    --- End diff --
    
    Will not change this. I think the current version is clearer. 
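A small sketch of why the original line fails, assuming plain Scala arrays in place of Spark vectors (`MaxAbs` is a hypothetical name). `reduce` has no starting value, so it throws on an empty array. It also never calls the lambda on a one-element array, so that single value comes back without `math.abs` applied. Folding from `0.0` avoids both cases.

```scala
object MaxAbs {
  // Mirrors the original convergence code: throws UnsupportedOperationException
  // on an empty array, and returns a lone element unchanged (sign included).
  def viaReduce(xs: Array[Double]): Double =
    xs.reduce((x, y) => math.max(math.abs(x), math.abs(y)))

  // Starts from 0.0, so an empty array yields 0.0 and every element is abs'd.
  def viaFoldLeft(xs: Array[Double]): Double =
    xs.foldLeft(0.0)((acc, y) => math.max(acc, math.abs(y)))
}
```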




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah Thanks for the clarification and for providing an implementation. So the pro is some speed improvement, and the con is increased complexity (we would have three cases: one for intercept-only models, one for Gaussian with identity link, and one for all the others). Let's get other committers' opinions. 
    
    Yes, I will throw an error for the special case of no intercept and no features. 




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @imatiach-msft Thanks for the comments! This test is based on existing tests in GLM. I can try to improve the style and streamline **all** tests in another PR, but it would be odd to change only this test. 




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72161/testReport)** for PR 16740 at commit [`6e0b6bf`](https://github.com/apache/spark/commit/6e0b6bf26ff3a0599ffa8aadf0f5d2e55f01b82c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72559/testReport)** for PR 16740 at commit [`5a0ff27`](https://github.com/apache/spark/commit/5a0ff273ea4e22933a94b33b6d8dc4eb9c8148e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)** for PR 16740 at commit [`931f7ec`](https://github.com/apache/spark/commit/931f7ecceff7a0cb0c1870af7e69d38454078c52).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah Thanks for the review. I made the changes you suggested (except for the nit). I added more tests, although I don't think they are really necessary. The analytical approach takes a different path from IRWLS, so I agree that if we use it we should test it thoroughly. But the current fix just allows the existing algorithm to work in a special case, and that algorithm is well tested in the more general cases. Anyway, I hope we can close this PR now. 




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)** for PR 16740 at commit [`931f7ec`](https://github.com/apache/spark/commit/931f7ecceff7a0cb0c1870af7e69d38454078c52).




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72360/testReport)** for PR 16740 at commit [`95b7a10`](https://github.com/apache/spark/commit/95b7a1032896dcb65698b2bed355cbd7305a1eaa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r98485335
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala ---
    @@ -86,13 +86,11 @@ private[ml] class IterativelyReweightedLeastSquares(
             standardizeFeatures = false, standardizeLabel = false).fit(newInstances)
     
           // Check convergence
    -      val oldCoefficients = oldModel.coefficients
    -      val coefficients = model.coefficients
    -      BLAS.axpy(-1.0, coefficients, oldCoefficients)
    -      val maxTolOfCoefficients = oldCoefficients.toArray.reduce { (x, y) =>
    -        math.max(math.abs(x), math.abs(y))
    +      val oldCoefficients = oldModel.coefficients.toArray :+ oldModel.intercept
    +      val coefficients = model.coefficients.toArray :+ model.intercept
    +      val maxTol = oldCoefficients.zip(coefficients).map(x => x._1 - x._2).reduce {
    --- End diff --
    
    Is there some way to verify that this change does not impact execution time?  The BLAS.axpy operation on coefficients/oldcoefficients means that only oldcoefficients needs to be converted to an array, and the BLAS.axpy operation is probably faster than the map with subtraction, but I could be wrong.  
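One way to see the behavior of the proposed check in isolation, assuming plain arrays in place of `oldModel` and `model` (the object and method names are hypothetical). Because the intercept is always appended, the array being reduced has at least one element, even with 0 features.

```scala
object IrlsConvergence {
  // Hypothetical helper mirroring the proposed diff: append each model's intercept
  // to its coefficients, then take the largest absolute change across all entries.
  def maxAbsChange(
      oldCoefficients: Array[Double], oldIntercept: Double,
      coefficients: Array[Double], intercept: Double): Double = {
    require(oldCoefficients.length == coefficients.length,
      "coefficient arrays must have the same length")
    val oldAll = oldCoefficients :+ oldIntercept
    val newAll = coefficients :+ intercept
    // Never empty: the appended intercept guarantees at least one element.
    oldAll.zip(newAll).map { case (o, n) => math.abs(o - n) }.reduce((a, b) => math.max(a, b))
  }
}
```

The extra cost should be one array copy and one pass over `numFeatures + 1` values per iteration, which is small next to the weighted least squares fit that produces each model.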




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @srowen would you please take a look and merge this if all is good? Thanks.




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Added a few nit-pick comments, otherwise the changes LGTM!




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72148/testReport)** for PR 16740 at commit [`69d9631`](https://github.com/apache/spark/commit/69d96319abaa7f4aa98c04ce32556a2cb2c065d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99407439
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -335,6 +335,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
           throw new SparkException(msg)
         }
     
    +    if (numFeatures == 0 && !$(fitIntercept)) {
    +      val msg = "Specified model is empty with neither intercept nor feature."
    +      throw new SparkException(msg)
    --- End diff --
    
    I think `require`, which throws `IllegalArgumentException` is more appropriate here




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72559/testReport)** for PR 16740 at commit [`5a0ff27`](https://github.com/apache/spark/commit/5a0ff273ea4e22933a94b33b6d8dc4eb9c8148e6).




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah @imatiach-msft Could you take another look and let me know if there are any additional changes needed on this PR? Thanks!




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)** for PR 16740 at commit [`37c41aa`](https://github.com/apache/spark/commit/37c41aa598bd0c2e3cf9e42f217233498ffdac23).




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72185/




[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16740




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @seth @imatiach-msft Let me know if any other changes are needed. Thanks very much for your review! 




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Allowing offset will only require a small change to the intercept calculation, won't it?
    
    ````scala
     val agg = data.agg(sum(w * (col("label") - col("label"))), sum(w)).first()
    link.link(agg.getDouble(0) / agg.getDouble(1))
    ````
    I'm still in favor of fixing the IRLS bug, but we should be able to return an analytic result without too much trouble, unless I'm missing something.
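For the Poisson family with its canonical log link specifically, an offset still leaves a closed form: setting the score to zero, sum_i w_i (y_i - exp(b + o_i)) = 0, gives b = log(sum_i w_i y_i / sum_i w_i exp(o_i)). A minimal sketch under that assumption, with a hypothetical object name and plain arrays for labels, weights, and offsets; other family/link combinations generally do not reduce this way.

```scala
object PoissonInterceptWithOffset {
  // Assumes the Poisson family with the log link. Solving the score equation
  // sum_i w_i * (y_i - exp(b + o_i)) = 0 for b gives
  // b = log(sum_i w_i * y_i / sum_i w_i * exp(o_i)).
  def intercept(labels: Array[Double], weights: Array[Double], offsets: Array[Double]): Double = {
    require(labels.length == weights.length && labels.length == offsets.length,
      "labels, weights and offsets must have the same length")
    val weightedLabelSum = labels.indices.map(i => weights(i) * labels(i)).sum
    val weightedExpOffsetSum = labels.indices.map(i => weights(i) * math.exp(offsets(i))).sum
    math.log(weightedLabelSum / weightedExpOffsetSum)
  }
}
```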




[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99269006
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +743,55 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    +        .setLinkPredictionCol("linkPrediction")
    +      if (useWeight) trainer.setWeightCol("weight")
    +      val model = trainer.fit(dataset)
    +      val actual = model.intercept
    +      assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
    +        s"useWeight = $useWeight.")
    +      assert(model.coefficients === new DenseVector(Array.empty[Double]))
    +
    +      val familyLink = FamilyAndLink(trainer)
    +      model.transform(dataset).select("features", "prediction", "linkPrediction").collect()
    +        .foreach {
    +          case Row(features: DenseVector, prediction1: Double, linkPrediction1: Double) =>
    +            val eta = BLAS.dot(features, model.coefficients) + model.intercept
    +            val prediction2 = familyLink.fitted(eta)
    --- End diff --
    
    @sethah Agree. Removed this. Thanks!




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72161/




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Can one of the admins merge this PR since we have two approvals now? Thanks. 
    
    @srowen @jkbradley @felixcheung @yanboliang 




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r98493237
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala ---
    @@ -86,13 +86,11 @@ private[ml] class IterativelyReweightedLeastSquares(
             standardizeFeatures = false, standardizeLabel = false).fit(newInstances)
     
           // Check convergence
    -      val oldCoefficients = oldModel.coefficients
    -      val coefficients = model.coefficients
    -      BLAS.axpy(-1.0, coefficients, oldCoefficients)
    -      val maxTolOfCoefficients = oldCoefficients.toArray.reduce { (x, y) =>
    -        math.max(math.abs(x), math.abs(y))
    +      val oldCoefficients = oldModel.coefficients.toArray :+ oldModel.intercept
    +      val coefficients = model.coefficients.toArray :+ model.intercept
    +      val maxTol = oldCoefficients.zip(coefficients).map(x => x._1 - x._2).reduce {
    --- End diff --
    
    @imatiach-msft The performance impact is minor: the coefficient vectors are not high-dimensional, and these simple operations take far less time than the actual model fitting.  




[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Since we already compute the number of features in the `train` method, why don't we just check whether `numFeatures == 0` and, if so, compute the intercept as the link function applied to the weighted average of the labels and return it? Then we don't have to run IRLS, which doesn't seem to converge in a single iteration anyway. We have typically logged a warning in cases like these; here we might say that the model will be equivalent to the null model.
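    The closed form suggested here can be sketched as follows and checked against the R values quoted in the PR's test (3.091042 unweighted, 3.178054 weighted). The sketch assumes the Poisson family with its canonical log link, so the intercept is `log` of the weighted label mean. `poissonInterceptOnly` is a hypothetical name and is not part of the Spark API.

    ```scala
    // Hypothetical closed-form fit for an intercept-only Poisson GLM with log link:
    // the fitted mean is the weighted average of the labels, and the intercept is
    // the link function (log) applied to that mean.
    def poissonInterceptOnly(labels: Seq[Double], weights: Seq[Double]): Double = {
      require(labels.length == weights.length, "labels and weights must have the same length")
      val weightedSum = labels.zip(weights).map { case (y, w) => y * w }.sum
      val weightedMean = weightedSum / weights.sum
      math.log(weightedMean)
    }

    // Data from the R snippet in the PR's test.
    val y = Seq(17.0, 19.0, 23.0, 29.0)
    val w = Seq(1.0, 2.0, 3.0, 4.0)
    val unweighted = poissonInterceptOnly(y, Seq.fill(y.length)(1.0)) // log(22.0)
    val weighted = poissonInterceptOnly(y, w)                        // log(24.0)
    ```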


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99276472
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +744,48 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    +      if (useWeight) trainer.setWeightCol("weight")
    +      val model = trainer.fit(dataset)
    +      val actual = model.intercept
    +      assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
    +        s"useWeight = $useWeight.")
    +      assert(model.coefficients === new DenseVector(Array.empty[Double]))
    +
    +      idx += 1
    +    }
    +
    +    // throw exception for empty model
    +    val trainer = new GeneralizedLinearRegression().setFitIntercept(false)
    +    intercept[SparkException] {
    --- End diff --
    
    Thank you for adding the test. Could you also please wrap it in withClue to verify the message contents? E.g.:
    
        withClue("Specified model is empty with neither intercept nor feature") {
          intercept[SparkException] {
            trainer.fit(dataset)
          }
        }
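    ScalaTest's `withClue` prepends the clue to the failure message when the wrapped block fails, but it does not inspect the exception text itself. ScalaTest's `intercept` returns the caught exception, so the message can be asserted on directly. The dependency-free sketch below shows the same idea using `scala.util.Try`. `validateModel` is a hypothetical stand-in for the check added in this PR. It throws `IllegalArgumentException` rather than `SparkException` only so the sketch runs without Spark.

    ```scala
    import scala.util.{Failure, Success, Try}

    // Hypothetical stand-in for the validation added in this PR; Spark throws
    // SparkException here, IllegalArgumentException keeps the sketch dependency-free.
    def validateModel(numFeatures: Int, fitIntercept: Boolean): Unit = {
      if (numFeatures == 0 && !fitIntercept) {
        throw new IllegalArgumentException(
          "Specified model is empty with neither intercept nor feature.")
      }
    }

    // Capture the exception (as ScalaTest's intercept would) and return its message.
    def emptyModelMessage(numFeatures: Int, fitIntercept: Boolean): Option[String] =
      Try(validateModel(numFeatures, fitIntercept)) match {
        case Failure(e: IllegalArgumentException) => Some(e.getMessage)
        case Failure(e) => throw e
        case Success(_) => None
      }

    val emptyMsg = emptyModelMessage(0, fitIntercept = false)
    val interceptOnlyMsg = emptyModelMessage(0, fitIntercept = true)
    ```

    In the suite itself, this corresponds to binding the result of `intercept[SparkException] { trainer.fit(dataset) }` and asserting on its `getMessage`. The message text used here is the one quoted in the diff; a later commit in this PR updated the error message.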


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72148/testReport)** for PR 16740 at commit [`69d9631`](https://github.com/apache/spark/commit/69d96319abaa7f4aa98c04ce32556a2cb2c065d2).


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72559/


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99276315
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -335,6 +335,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
           throw new SparkException(msg)
         }
     
    +    if (numFeatures == 0 && !$(fitIntercept)) {
    +      val msg = "Specified model is empty with neither intercept nor feature."
    +      throw new SparkException(msg)
    --- End diff --
    
    @imatiach-msft Test added. Thanks!


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah Yes, we can compute the intercept directly. But I'm concerned that such special handling may not integrate well with other features or future changes. For example, we would also need to compute the standard error analytically, which is not difficult; the point is that every time a new feature is added, one would have to modify the intercept calculation to handle it. This does not seem efficient. Thoughts?
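    The following sketch illustrates the kind of analytic special-casing under discussion. It assumes an intercept-only Poisson model with log link and dispersion fixed at 1. Under those assumptions, the Fisher information for the intercept is the weighted sum of the fitted means. At the MLE this equals the weighted sum of the labels, so the standard error is 1/sqrt(sum of w*y). `poissonInterceptStdError` is a hypothetical name. Other families and links, or features such as an offset, would each need their own formula.

    ```scala
    // Hypothetical analytic standard error of the intercept for an intercept-only
    // Poisson GLM with log link and dispersion fixed at 1.
    def poissonInterceptStdError(labels: Seq[Double], weights: Seq[Double]): Double = {
      require(labels.length == weights.length, "labels and weights must have the same length")
      // Fisher information = sum_i w_i * mu, with mu the weighted label mean,
      // which simplifies to sum_i w_i * y_i.
      val information = labels.zip(weights).map { case (y, w) => w * y }.sum
      1.0 / math.sqrt(information)
    }

    val labels = Seq(17.0, 19.0, 23.0, 29.0)
    val seUnweighted = poissonInterceptStdError(labels, Seq(1.0, 1.0, 1.0, 1.0))
    val seWeighted = poissonInterceptStdError(labels, Seq(1.0, 2.0, 3.0, 4.0))
    ```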


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah Thanks for your review. Yes, using `foldLeft` would be the simplest fix. I have included both your suggested changes in the new commit. 
    
    Yes, we could handle the special case of the intercept-only model outside IWLS. But I think the current fix is better: a) it allows IWLS to handle the edge case, and b) it is general and less likely to be affected by future structural changes to GLM (e.g., allowing an offset). Does this make sense?
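    Below is a sketch of the `foldLeft` variant mentioned here. The exact committed code is not quoted in this thread, so `maxAbsDiff` is a hypothetical helper over plain arrays. Starting the fold at `0.0` makes the maximum absolute difference of an empty array well defined.

    ```scala
    // Hypothetical helper: largest absolute element-wise difference between two
    // coefficient arrays. foldLeft starts from 0.0, so an empty array (an
    // intercept-only model) yields 0.0 instead of throwing as reduce does.
    def maxAbsDiff(oldCoefficients: Array[Double], coefficients: Array[Double]): Double = {
      require(oldCoefficients.length == coefficients.length, "arrays must have the same length")
      oldCoefficients.zip(coefficients).foldLeft(0.0) { case (acc, (o, n)) =>
        math.max(acc, math.abs(o - n))
      }
    }

    val emptyCase = maxAbsDiff(Array.empty[Double], Array.empty[Double])
    val nonEmptyCase = maxAbsDiff(Array(1.0, -2.0), Array(1.5, -2.25))
    ```

    Suppose only the feature coefficients are folded. Then an intercept-only model reports a change of 0.0 after the first iteration. Whether the intercept is also included in the comparison therefore affects when IRLS stops.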


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99408646
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +744,50 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    --- End diff --
    
    I'm still in favor of checking other families. We had been discussing using a formula that ended up working with some families but not others; testing all the families would have caught this.


[GitHub] spark issue #16740: [SPARK-19400] Allow GLM to handle intercept only model

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    The following is a simple example to illustrate the issue. 
    
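    ```
    import org.apache.spark.ml.feature.RFormula
    import org.apache.spark.ml.regression.GeneralizedLinearRegression
    import spark.implicits._  // already in scope in spark-shell
    
    val dataset = Seq(
      (1.0, 1.0, 2.0, 0.0, 5.0),
      (0.5, 2.0, 1.0, 1.0, 2.0),
      (1.0, 3.0, 0.5, 2.0, 1.0),
      (2.0, 4.0, 1.5, 3.0, 3.0)
    ).toDF("y", "w", "off", "x1", "x2")
    
    // "y ~ 1" produces an empty features vector, i.e. an intercept-only model
    val formula = new RFormula().setFormula("y ~ 1")
    val output = formula.fit(dataset).transform(dataset)
    val glr = new GeneralizedLinearRegression().setFamily("poisson")
    val model = glr.fit(output)
    ```
    
    The above fails with the following error: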
    ```
    java.lang.UnsupportedOperationException: empty.reduceLeft
      at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:180)
      at scala.collection.mutable.ArrayOps$ofDouble.scala$collection$IndexedSeqOptimized$$super$reduceLeft(ArrayOps.scala:270)
      at scala.collection.IndexedSeqOptimized$class.reduceLeft(IndexedSeqOptimized.scala:74)
    ```
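    The stack trace comes down to Scala's behaviour for `reduce` on an empty collection. Here is a minimal, Spark-free reproduction. It assumes, as the PR description states, that an intercept-only model leaves the coefficient array empty.

    ```scala
    import scala.util.Try

    // An intercept-only model has no feature coefficients, so the array is empty.
    val emptyCoefficients = Array.empty[Double]

    // The original convergence check called reduce, which has no starting value
    // and therefore throws UnsupportedOperationException on an empty array.
    val reduceResult = Try(emptyCoefficients.reduce { (x, y) =>
      math.max(math.abs(x), math.abs(y))
    })
    ```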


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72296/


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99277921
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +744,48 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    +
    +    var idx = 0
    +    for (useWeight <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression().setFamily("poisson")
    +      if (useWeight) trainer.setWeightCol("weight")
    +      val model = trainer.fit(dataset)
    +      val actual = model.intercept
    +      assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
    +        s"useWeight = $useWeight.")
    +      assert(model.coefficients === new DenseVector(Array.empty[Double]))
    +
    +      idx += 1
    +    }
    +
    +    // throw exception for empty model
    +    val trainer = new GeneralizedLinearRegression().setFitIntercept(false)
    +    intercept[SparkException] {
    --- End diff --
    
    Done


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r98486973
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -743,6 +743,54 @@ class GeneralizedLinearRegressionSuite
         }
       }
     
    +  test("generalized linear regression: intercept only") {
    +    /*
    +      R code:
    +      y <- c(17, 19, 23, 29)
    +      w <- c(1, 2, 3, 4)
    +      model1 <- glm(y ~ 1, family = poisson)
    +      model2 <- glm(y ~ 1, family = poisson, weights = w)
    +      as.vector(c(coef(model1), coef(model2)))
    +      [1] 3.091042 3.178054
    +     */
    +
    +    val dataset = Seq(
    +      Instance(17.0, 1.0, Vectors.zeros(0)),
    +      Instance(19.0, 2.0, Vectors.zeros(0)),
    +      Instance(23.0, 3.0, Vectors.zeros(0)),
    +      Instance(29.0, 4.0, Vectors.zeros(0))
    +    ).toDF()
    +
    +    val expected = Seq(3.091, 3.178)
    +
    +    import GeneralizedLinearRegression._
    --- End diff --
    
    It seems odd to me that several tests do `import GeneralizedLinearRegression._` individually rather than having a single import at the top, but that looks unrelated to your changes.


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r99273642
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -335,6 +335,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
           throw new SparkException(msg)
         }
     
    +    if (numFeatures == 0 && !$(fitIntercept)) {
    +      val msg = "Specified model is empty with neither intercept nor feature."
    +      throw new SparkException(msg)
    --- End diff --
    
    Suggestion: please add a test to validate this case.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)** for PR 16740 at commit [`3a0a2af`](https://github.com/apache/spark/commit/3a0a2aff5a7b09cb0e1db7ec2e756e55b561eace).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72298/


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72161/testReport)** for PR 16740 at commit [`6e0b6bf`](https://github.com/apache/spark/commit/6e0b6bf26ff3a0599ffa8aadf0f5d2e55f01b82c).


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72487/


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)** for PR 16740 at commit [`b57af08`](https://github.com/apache/spark/commit/b57af08f792a59438452a3cef070e16ef51316b5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72360/


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @jkbradley Thanks much for the review and suggestion. I updated the error message. Please let me know if there's anything else needed for this PR. Thanks.


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    @sethah Thanks for the comments. 
    
    OK, I added more tests to cover all families. It's not possible to test every family and link combination, if that's what you mean, because the tweedie family supports a whole family of links. The links now tested include identity, log, inverse, logit and mu^0.4. This should be enough to prevent any non-general change from accidentally passing the tests.
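    Under the standard GLM likelihood with prior weights, an intercept-only model fits a constant mean equal to the weighted label average, whatever the family or link. The expected intercept is therefore that link applied to the average. The sketch below uses this rule to compute cross-check values for several of the links listed above on the PR's Poisson test data. `links` and `weightedMean` are hypothetical names. Logit is left out because it needs labels in [0, 1]. This is an illustration, not the test that was committed, and it does not replace the R reference values.

    ```scala
    // Hypothetical table of link functions g(mu), keyed by a descriptive name.
    val links: Map[String, Double => Double] = Map(
      "identity" -> ((m: Double) => m),
      "log" -> ((m: Double) => math.log(m)),
      "inverse" -> ((m: Double) => 1.0 / m),
      "mu^0.4" -> ((m: Double) => math.pow(m, 0.4))
    )

    // Weighted label mean: the fitted constant mean of an intercept-only GLM.
    def weightedMean(labels: Seq[Double], weights: Seq[Double]): Double =
      labels.zip(weights).map { case (y, w) => y * w }.sum / weights.sum

    val mu = weightedMean(Seq(17.0, 19.0, 23.0, 29.0), Seq(1.0, 2.0, 3.0, 4.0))
    // Expected intercept for each link: the link applied to the fitted mean.
    val expectedIntercepts: Map[String, Double] = links.map { case (name, g) => name -> g(mu) }
    ```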


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72360/testReport)** for PR 16740 at commit [`95b7a10`](https://github.com/apache/spark/commit/95b7a1032896dcb65698b2bed355cbd7305a1eaa).


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Regarding the tests: I don't think the tests should change _depending on_ the implementation. I don't think it's valid to say that we don't need to test this thoroughly because we know it's just calling IRLS under the hood, especially since someone in the future might come along and change it to, say, an analytical approach. Then the less rigorous tests might still pass even though the implementation is wrong. And since we already know of a reasonable situation where some families will pass and others will not, I think it's a good idea to test them all. Sorry, not trying to be a pain :)


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    LGTM
    Merging with master
    Thank you + @sethah for reviewing!


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r98579939
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala ---
    @@ -86,13 +86,9 @@ private[ml] class IterativelyReweightedLeastSquares(
             standardizeFeatures = false, standardizeLabel = false).fit(newInstances)
     
           // Check convergence
    -      val oldCoefficients = oldModel.coefficients
    -      val coefficients = model.coefficients
    -      BLAS.axpy(-1.0, coefficients, oldCoefficients)
    -      val maxTolOfCoefficients = oldCoefficients.toArray.reduce { (x, y) =>
    --- End diff --
    
    You can just change `reduce` to `foldLeft` here.
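To illustrate the suggestion: `reduce` has no seed value and throws on an empty collection, while `foldLeft(0.0)` simply returns its seed. That seed is the natural "no change" value when the coefficient vector of an intercept-only model is empty. This is a minimal sketch; the object and method names are hypothetical and this is not the code that was merged.

```scala
import scala.util.Try

// Hypothetical sketch contrasting reduce and foldLeft on an empty coefficient array.
object EmptyReduceDemo {

  // Largest absolute element-wise difference, seeded with 0.0 so that an empty
  // coefficient vector (intercept-only model) yields 0.0 instead of throwing.
  def maxAbsDiff(oldCoef: Array[Double], newCoef: Array[Double]): Double = {
    require(oldCoef.length == newCoef.length, "coefficient vectors must have equal length")
    oldCoef.zip(newCoef).foldLeft(0.0) { case (acc, (a, b)) =>
      math.max(acc, math.abs(a - b))
    }
  }

  def main(args: Array[String]): Unit = {
    val empty = Array.empty[Double]
    // reduce has no seed, so it throws UnsupportedOperationException on an empty array.
    val viaReduce = Try(empty.reduce((x, y) => math.max(math.abs(x), math.abs(y))))
    println(viaReduce.isFailure)                           // true
    println(maxAbsDiff(empty, empty))                      // 0.0
    println(maxAbsDiff(Array(1.0, -2.0), Array(1.5, 1.0))) // 3.0
  }
}
```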


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16740#discussion_r98415092
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala ---
    @@ -86,13 +86,11 @@ private[ml] class IterativelyReweightedLeastSquares(
             standardizeFeatures = false, standardizeLabel = false).fit(newInstances)
     
           // Check convergence
    -      val oldCoefficients = oldModel.coefficients
    -      val coefficients = model.coefficients
    -      BLAS.axpy(-1.0, coefficients, oldCoefficients)
    -      val maxTolOfCoefficients = oldCoefficients.toArray.reduce { (x, y) =>
    -        math.max(math.abs(x), math.abs(y))
    +      val oldCoefficients = oldModel.coefficients.toArray :+ oldModel.intercept
    +      val coefficients = model.coefficients.toArray :+ model.intercept
    +      val maxTol = oldCoefficients.zip(coefficients).map(x => x._1 - x._2).reduce {
    --- End diff --
    
    What about simply `oldCoefficients.zip(coefficients).map(x => math.abs(x._1 - x._2)).max`? (I think that exists; if not, you get the idea.)
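A sketch of the suggested one-liner in the context of the diff above. Because the intercept is appended, the zipped array always has at least one element. This matters because `.max` on an empty `Array` throws, just like `reduce`. The names are hypothetical, and the sketch assumes that an intercept which is not fitted stays at 0.0 in both models, so its entry only contributes a zero difference.

```scala
// Hypothetical sketch of the convergence measure discussed in the review.
object ConvergenceCheckSketch {

  // Largest absolute change across coefficients and intercept between two iterations.
  def maxTol(oldCoef: Array[Double], oldIntercept: Double,
             newCoef: Array[Double], newIntercept: Double): Double = {
    require(oldCoef.length == newCoef.length, "coefficient vectors must have equal length")
    // Appending the intercept guarantees a non-empty array, so .max cannot throw.
    val oldAll = oldCoef :+ oldIntercept
    val newAll = newCoef :+ newIntercept
    oldAll.zip(newAll).map { case (a, b) => math.abs(a - b) }.max
  }

  def main(args: Array[String]): Unit = {
    // Intercept-only model: no coefficients, only the intercept moves.
    println(maxTol(Array.empty[Double], 0.2, Array.empty[Double], 0.5))
    // Model with features: the largest change is in the second coefficient.
    println(maxTol(Array(1.0, 2.0), 0.0, Array(1.25, 1.0), 0.5)) // 1.0
  }
}
```

Compared with seeding a `foldLeft`, this version relies on the intercept to keep the collection non-empty, so it reads closer to the math but depends on that invariant.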


[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16740
  
    **[Test build #72185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72185/testReport)** for PR 16740 at commit [`0b3c085`](https://github.com/apache/spark/commit/0b3c085e6171737065a1ca1f07c60f33f8c3bf3c).

