Posted to reviews@spark.apache.org by actuaryzhang <gi...@git.apache.org> on 2017/01/18 07:16:43 UTC

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

GitHub user actuaryzhang opened a pull request:

    https://github.com/apache/spark/pull/16630

    [SPARK-19270][ML] Add summary table to GLM summary

    ## What changes were proposed in this pull request?
    
    Add an R-like summary table to the GLM summary, which includes the feature name (if it exists), parameter estimate, standard error, t-statistic and p-value. This allows Scala users to easily gather these commonly used inference results.
    
    @srowen @yanboliang 
    
    ## How was this patch tested?
    New tests: one for the feature names and one for the summary table.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark glmTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16630
    

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73439/testReport)** for PR 16630 at commit [`8e1c086`](https://github.com/apache/spark/commit/8e1c086cdc953fbe5e0bf6986b1775d07f7cbc6a).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101159105
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -915,6 +917,22 @@ class GeneralizedLinearRegressionSummary private[regression] (
       /** Number of instances in DataFrame predictions. */
       private[regression] lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to "V1", "V2", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureName: Array[String] = {
    +    val featureAttrs = AttributeGroup.fromStructField(
    +      dataset.schema(model.getFeaturesCol)).attributes
    +    if (featureAttrs == None) {
    --- End diff --
    
    @imatiach-msft This makes sense. I have now changed the code to mirror the same logic. When attributes are missing, the default name is the features column name with the suffix "_0", "_1", etc.
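
A minimal sketch of the fallback naming described in this comment, written in plain Scala with no Spark dependency. `DefaultFeatureNames` and `resolve` are hypothetical names used only for illustration, and the `Option` argument is an assumed stand-in for the names that would be read from the features column's attributes.

```scala
object DefaultFeatureNames {
  // Hypothetical helper, not part of the PR: use the attribute names when
  // they are available, otherwise fall back to "<featuresCol>_<index>".
  def resolve(
      featuresCol: String,
      numFeatures: Int,
      attrNames: Option[Array[String]]): Array[String] =
    attrNames.getOrElse(Array.tabulate(numFeatures)(i => s"${featuresCol}_$i"))
}
```

For a column named `features` with two features and no attributes, this yields `features_0` and `features_1`, which matches what the "glm summary: feature name" test in this thread expects.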



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101160069
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -915,6 +917,22 @@ class GeneralizedLinearRegressionSummary private[regression] (
       /** Number of instances in DataFrame predictions. */
       private[regression] lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to "V1", "V2", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureName: Array[String] = {
    --- End diff --
    
    OK, changed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r103004564
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1173,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNamesLocal = featureNames
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNamesLocal = featureNamesLocal :+ Intercept
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNamesLocal(i), coefficients(i), coefficientStandardErrors(i),
    +        tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    --- End diff --
    
    @felixcheung Would you elaborate on your concern here? The reason I create the SparkSession is to convert the `Seq` to a `DataFrame` using `toDF`. Is there a way to create a DataFrame without explicitly using a Spark session? Thanks.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #78992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78992/testReport)** for PR 16630 at commit [`ce0851a`](https://github.com/apache/spark/commit/ce0851afc2e24d23bf8f3b8a2fa3cfdd0f99c6f2).



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79766/testReport)** for PR 16630 at commit [`174fc49`](https://github.com/apache/spark/commit/174fc49142f2915c46fc53df4cb024d2e97cc6ca).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r100931568
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1170,32 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNames = featureName
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNames = featureNames :+ "Intercept"
    --- End diff --
    
    Would it be possible to move this string ("Intercept") to a constant?
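
One way the suggestion could look, as a sketch in plain Scala rather than the PR's actual code. `SummaryRows`, `InterceptName` and `orderedRows` are hypothetical names; the value "(Intercept)" is the one that appears later in this thread's diff and tests. The sketch also prepends the intercept directly instead of appending it and reordering through an index array, which is a swapped-in simplification.

```scala
object SummaryRows {
  // Hypothetical constant holding the label used for the intercept row.
  val InterceptName: String = "(Intercept)"

  // Pair each feature name with its coefficient; when an intercept is
  // present, put it first so the table reads like R's summary output.
  def orderedRows(
      names: Array[String],
      coefficients: Array[Double],
      intercept: Option[Double]): Seq[(String, Double)] = {
    val rows = names.zip(coefficients).toSeq
    intercept match {
      case Some(b) => (InterceptName -> b) +: rows
      case None => rows
    }
  }
}
```

Keeping the label in one place means the production code and the tests can refer to the same value instead of repeating the string literal.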



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71998/
    Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73073/
    Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Thanks for the updates, the changes look good to me.  One question, out of scope of the specific changes in this review: are there any other summary statistics that we could add in the future?  Maybe R^2 and adjusted R^2?  Also, do you know of any good reference papers that have an overview of the most popular summary statistics used in GLM (not including the ones in this pull request)?



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101362237
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureNames.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"),
    +      Array("(Intercept)", "features_0", "features_1"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    +      Vectors.dense(0.7903, 0.2258, 0.4677))
    +    val expectedStdError = Seq(Vectors.dense(1.724, 0.3787),
    +      Vectors.dense(4.0129, 2.1153, 0.5815))
    +    val expectedTValue = Seq(Vectors.dense(0.1673, 1.4205),
    +      Vectors.dense(0.1969, 0.1067, 0.8043))
    +    val expectedPValue = Seq(Vectors.dense(0.8778, 0.2506),
    +      Vectors.dense(0.8621, 0.9247, 0.5056))
    +
    +    var idx = 0
    +    for (fitIntercept <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression()
    +        .setFamily("gaussian")
    +        .setFitIntercept(fitIntercept)
    +      val model = trainer.fit(dataset)
    +      val summaryTable = model.summary.summaryTable
    +
    +      summaryTable.select("Feature").collect.map(_.getString(0))
    +        .zip(expectedFeature(idx)).foreach{ x => assert(x._1 === x._2,
    +        "Feature name mismatch in summaryTable") }
    +      assert(Vectors.dense(summaryTable.select("Coefficient").rdd.collect.map(_.getDouble(0)))
    +        ~== expectedEstimate(idx) absTol 1E-3, "Coefficient mismatch in summaryTable")
    +      assert(Vectors.dense(summaryTable.select("StdError").rdd.collect.map(_.getDouble(0)))
    +        ~== expectedStdError(idx) absTol 1E-3, "Standard error mismatch in summaryTable")
    +      assert(Vectors.dense(summaryTable.select("TValue").rdd.collect.map(_.getDouble(0)))
    --- End diff --
    
    It looks like for all of these, including the ones below, you can just call `collect` instead of `rdd.collect`?
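
As a side check on the expected values in the quoted test, each t-value should equal the coefficient estimate divided by its standard error. The sketch below verifies that relationship in plain Scala; `TValueCheck`, `tValues` and `closeTo` are hypothetical helpers, and `closeTo` is only an assumed approximation of the absolute-tolerance comparison used in the test, not Spark's own implementation.

```scala
object TValueCheck {
  // t-value = estimate / standard error, computed element-wise.
  def tValues(estimates: Array[Double], stdErrors: Array[Double]): Array[Double] =
    estimates.zip(stdErrors).map { case (est, se) => est / se }

  // Element-wise comparison within an absolute tolerance.
  def closeTo(actual: Array[Double], expected: Array[Double], absTol: Double): Boolean =
    actual.length == expected.length &&
      actual.zip(expected).forall { case (a, e) => math.abs(a - e) < absTol }
}
```

With the no-intercept values from the test, `tValues(Array(0.2884, 0.538), Array(1.724, 0.3787))` agrees with the expected `Array(0.1673, 1.4205)` to within the 1E-3 tolerance, and the same holds for the three rows of the intercept case.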



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    jenkins test this please



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r126873706
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1187,6 +1189,23 @@ class GeneralizedLinearRegressionSummary private[regression] (
       @Since("2.2.0")
       lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to feature column name with numbered suffix "_0", "_1", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureNames: Array[String] = {
    --- End diff --
    
    Could we keep it private?



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79688/
    Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r127844484
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1441,4 +1460,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    --- End diff --
    
    Updated it as `coefficientMatrix`.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r100932561
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1170,32 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNames = featureName
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNames = featureNames :+ "Intercept"
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNames(i), coefficients(i), coefficientStandardErrors(i), tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    +      import spark.implicits._
    +      result.toDF("Feature", "Estimate", "StdError", "TValue", "PValue").repartition(1)
    +    } else {
    +      throw new UnsupportedOperationException(
    +        "No summary table available for this GeneralizedLinearRegressionModel")
    --- End diff --
    
    Minor suggestion: it would be nice to add a test to verify that this exception is thrown, and that it carries the right error message (using the withClue() check).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101357001
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -915,6 +919,23 @@ class GeneralizedLinearRegressionSummary private[regression] (
       /** Number of instances in DataFrame predictions. */
       private[regression] lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to feature column name with numbered suffix "_0", "_1", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureNames: Array[String] = {
    +    val featureAttrs = AttributeGroup.fromStructField(
    +      dataset.schema(model.getFeaturesCol)).attributes
    +    if (featureAttrs == None) {
    +      Array.tabulate[String](origModel.numFeatures)(
    +        (x: Int) => (model.getFeaturesCol + "_" + x))
    --- End diff --
    
    In general I would have preferred to create a platform-level function (or use one if it exists) to format the strings in the same way, so that the code here and in VectorAssembler is not duplicated and cannot diverge, and so that other functions in Spark can use it too. However, this seems a bit out of scope for this code review, so I don't think you need to do this.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r127853762
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -452,6 +452,8 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
     
       private[regression] val epsilon: Double = 1E-16
     
    +  private[regression] val Intercept: String = "(Intercept)"
    --- End diff --
    
    Removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73060/
    Test FAILed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #72901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72901/testReport)** for PR 16630 at commit [`b67d3fd`](https://github.com/apache/spark/commit/b67d3fdc93a0a398dcac0271b501d054082ca793).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r102564640
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -34,6 +35,7 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.{Column, DataFrame, Dataset, Row}
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.types.{DataType, DoubleType, StructType}
    +import org.apache.spark.sql.SparkSession
    --- End diff --
    
    We generally try to keep the imports sorted.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79685/testReport)** for PR 16630 at commit [`a16cbee`](https://github.com/apache/spark/commit/a16cbee4e86cf044a90015bdd6900b9a22116200).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r100933207
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1170,32 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNames = featureName
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNames = featureNames :+ "Intercept"
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNames(i), coefficients(i), coefficientStandardErrors(i), tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    +      import spark.implicits._
    +      result.toDF("Feature", "Estimate", "StdError", "TValue", "PValue").repartition(1)
    --- End diff --
    
    question: is "Estimate" the better term to use here as opposed to "Coefficient"?  Are there other libraries which use this specific term in this case?



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73060/testReport)** for PR 16630 at commit [`fd9f1be`](https://github.com/apache/spark/commit/fd9f1becd40fab890f4f7542f66dddc79aecd8a0).



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @actuaryzhang sorry, can you comment on this question I had above:
    One question, out of scope of the specific changes in this review: are there any other summary statistics that we could add in the future? Maybe R^2 and adjusted R^2? Also, do you know of any good reference papers that have an overview of the most popular summary statistics used in GLM (not including the ones in this pull request)?



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @yanboliang Could you take a look? 



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #71998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71998/testReport)** for PR 16630 at commit [`78bb77f`](https://github.com/apache/spark/commit/78bb77f6d2616888c07bf625341a7718ff79723c).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r100927396
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1170,32 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNames = featureName
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNames = featureNames :+ "Intercept"
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNames(i), coefficients(i), coefficientStandardErrors(i), tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    +      import spark.implicits._
    --- End diff --
    
    minor comment: might it be better to simplify this as "import dataset.sparkSession.implicits._", or is there a reason to prefer the SparkSession.builder().getOrCreate()?



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Jenkins, test this please



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101220572
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -915,6 +917,22 @@ class GeneralizedLinearRegressionSummary private[regression] (
       /** Number of instances in DataFrame predictions. */
       private[regression] lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to "V1", "V2", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureName: Array[String] = {
    +    val featureAttrs = AttributeGroup.fromStructField(
    +      dataset.schema(model.getFeaturesCol)).attributes
    +    if (featureAttrs == None) {
    +      Array.tabulate[String](origModel.numFeatures)((x: Int) => ("V" + (x + 1)))
    --- End diff --
    
    quite possibly - could you check what would be new or removed with that approach?
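    
    A minimal sketch of the fallback described in the doc comment above, in plain Scala with no Spark dependency. `FeatureNameFallback` and `namesOrDefault` are hypothetical names used only for illustration; in the diff the names come from the `AttributeGroup` of the features column.
    
    ```scala
    object FeatureNameFallback {
      // Use the attribute names when they are present;
      // otherwise generate the defaults "V1", "V2", ... (1-based).
      def namesOrDefault(attrNames: Option[Array[String]], numFeatures: Int): Array[String] =
        attrNames match {
          case Some(names) => names
          case None => Array.tabulate(numFeatures)(i => s"V${i + 1}")
        }
    }
    ```
    
    Pattern matching on the `Option` is one way to avoid the `== None` comparison used in the diff.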



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73073/testReport)** for PR 16630 at commit [`9a441f8`](https://github.com/apache/spark/commit/9a441f8951f0303986547699a6f576273d8180ff).



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r128200757
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1458,4 +1475,167 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Coefficient matrix with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.3.0")
    +  lazy val coefficientMatrix: Array[(String, Double, Double, Double, Double)] = {
    +    if (isNormalSolver) {
    +      var featureNamesLocal = featureNames
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNamesLocal = featureNamesLocal :+ "(Intercept)"
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx) yield
    +        (featureNamesLocal(i), coefficients(i), coefficientStandardErrors(i),
    +        tValues(i), pValues(i))
    +      result
    +    } else {
    +      throw new UnsupportedOperationException(
    +        "No summary table available for this GeneralizedLinearRegressionModel")
    +    }
    +  }
    +
    +  private def round(x: Double, digit: Int): String = {
    +    BigDecimal(x).setScale(digit, BigDecimal.RoundingMode.HALF_UP).toString()
    +  }
    +
    +  private[regression] def showString(_numRows: Int, truncate: Int = 20,
    +                                     numDigits: Int = 3): String = {
    --- End diff --
    
    Align.
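    
    A minimal sketch of what the alignment comment likely asks for, assuming the project's Scala style of indenting wrapped parameters by four spaces rather than aligning them under the opening parenthesis. Only the parameter list comes from the diff; the object name and the method body are hypothetical placeholders, and the `private[regression]` modifier is dropped so the snippet compiles on its own.
    
    ```scala
    object ShowStringLayout {
      // Parameter list from the diff, rewrapped with a 4-space continuation indent.
      // The body is a placeholder: the real implementation is not shown in this thread.
      def showString(
          _numRows: Int,
          truncate: Int = 20,
          numDigits: Int = 3): String = {
        s"numRows=${_numRows}, truncate=$truncate, numDigits=$numDigits"
      }
    }
    ```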



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r103003515
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -34,6 +35,7 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.{Column, DataFrame, Dataset, Row}
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.types.{DataType, DoubleType, StructType}
    +import org.apache.spark.sql.SparkSession
    --- End diff --
    
    Thanks. Sorted the imports. 



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #71985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71985/testReport)** for PR 16630 at commit [`6173ba9`](https://github.com/apache/spark/commit/6173ba9834442c53c1740cbd7a47ca6a3312a26a).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    The following code illustrates the idea of this PR. 
    
    ```
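    // Assumes a spark-shell session, where `spark` is the active SparkSession.
    import org.apache.spark.ml.feature.RFormula
    import org.apache.spark.ml.regression.GeneralizedLinearRegression
    import spark.implicits._
    
    val datasetWithWeight = Seq(
      (1.0, 1.0, 0.0, 5.0),
      (0.5, 2.0, 1.0, 2.0),
      (1.0, 3.0, 2.0, 1.0),
      (0.0, 4.0, 3.0, 3.0)
    ).toDF("y", "w", "x1", "x2")
    
    // RFormula assembles x1 and x2 into the "features" column and keeps their names.
    val formula = (new RFormula()
      .setFormula("y ~ x1 + x2")
      .setFeaturesCol("features")
      .setLabelCol("label"))
    val output = formula.fit(datasetWithWeight).transform(datasetWithWeight)
    
    val glr = new GeneralizedLinearRegression()
    val model = glr.fit(output)
    // summaryTable is the DataFrame proposed in this PR.
    model.summary.summaryTable.show()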
    ```
    
    This prints out: 
    ```
    +---------+--------------------+-------------------+-------------------+-------------------+
    |  Feature|            Estimate|           StdError|             TValue|             PValue|
    +---------+--------------------+-------------------+-------------------+-------------------+
    |Intercept|  1.4523809523809539| 0.9245946589975053| 1.5708299180050451| 0.3609009059280113|
    |       x1|-0.33333333333333387|0.28171808490950573|-1.1832159566199243|0.44669962096188565|
    |       x2|-0.11904761904761924|   0.21295885499998|-0.5590169943749482| 0.6754896416955616|
    +---------+--------------------+-------------------+-------------------+-------------------+
    ```
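    
    The intercept is listed first in this output because the diff appends it to the end of the coefficient array and then reorders by index. A minimal plain-Scala sketch of that reordering, with no Spark dependency; `InterceptFirst` and `reorder` are hypothetical names, and the input in the usage line is made up.
    
    ```scala
    object InterceptFirst {
      // `values` holds the per-feature entries followed by the intercept entry (last).
      // Returns the same entries with the intercept moved to the front.
      def reorder[A](values: Array[A]): Seq[A] = {
        require(values.nonEmpty, "expected at least the intercept entry")
        // Index of the last element first, then 0 until length - 1.
        val idx = (values.length - 1) +: Array.range(0, values.length - 1)
        idx.toSeq.map(i => values(i))
      }
    }
    
    // Usage: the feature names with the intercept appended last.
    val ordered = InterceptFirst.reorder(Array("x1", "x2", "Intercept"))
    ```
    
    Applying the same index order to the coefficients, standard errors, t-values and p-values keeps every column of a row aligned.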



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79970/testReport)** for PR 16630 at commit [`7281b77`](https://github.com/apache/spark/commit/7281b77880898f5cb421467ef82e10ad42a17638).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79688/testReport)** for PR 16630 at commit [`57f1e5c`](https://github.com/apache/spark/commit/57f1e5c259d7f237324dd1b3b481b7e82952b53e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79970/testReport)** for PR 16630 at commit [`7281b77`](https://github.com/apache/spark/commit/7281b77880898f5cb421467ef82e10ad42a17638).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r102566039
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1173,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNamesLocal = featureNames
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNamesLocal = featureNamesLocal :+ Intercept
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNamesLocal(i), coefficients(i), coefficientStandardErrors(i),
    +        tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    --- End diff --
    
    I'm concerned about this: the session returned by `getOrCreate()` might not always be the right one.
    If we need a SparkSession instance, it would be preferable to obtain it the way MLReader/BaseReadWrite does:
    https://github.com/apache/spark/blob/04ee8cf633e17b6bf95225a8dd77bf2e06980eb3/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L59




[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r100932182
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("V1", "V2"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureName.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("V1", "V2"),
    +      Array("Intercept", "V1", "V2"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    +      Vectors.dense(0.7903, 0.2258, 0.4677))
    +    val expectedStdError = Seq(Vectors.dense(1.724, 0.3787),
    +      Vectors.dense(4.0129, 2.1153, 0.5815))
    +    val expectedTValue = Seq(Vectors.dense(0.1673, 1.4205),
    +      Vectors.dense(0.1969, 0.1067, 0.8043))
    +    val expectedPValue = Seq(Vectors.dense(0.8778, 0.2506),
    +      Vectors.dense(0.8621, 0.9247, 0.5056))
    +
    +    var idx = 0
    +    for (fitIntercept <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression()
    +        .setFamily("gaussian")
    --- End diff --
    
    Not related to this code review, but it's unfortunate that these aren't constants that can be referenced from the model. It's messy to have to type strings like this everywhere instead of referencing variables.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r126871919
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -452,6 +452,8 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
     
       private[regression] val epsilon: Double = 1E-16
     
    +  private[regression] val Intercept: String = "(Intercept)"
    --- End diff --
    
    If this is only used once, it's better to eliminate it.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @felixcheung Could you take another look at this PR? Thanks. 



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101362084
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureNames.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"),
    +      Array("(Intercept)", "features_0", "features_1"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    --- End diff --
    
    Is this comparing the summary to the results from R? If so, you should generally add the R code that was used to generate the expected results in a comment, so that the expected values are reproducible.
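    As a cross-check that does not depend on R, the no-intercept row of the expected values can be recomputed from the normal equations. The following is a minimal sketch in plain Scala, not the code used for the test. It assumes the fit is unweighted (the weight field of Instance is ignored), and the object and method names are made up for illustration. Under that assumption it reproduces the estimates 0.2884 and 0.538 and the standard errors 1.724 and 0.3787 to the stated precision.

```scala
// Cross-check of the no-intercept expected values using the normal equations.
// Assumption: the fit is unweighted (the weight field of Instance is ignored).
object NormalEquationCheck {
  // Feature rows (x1, x2) and labels copied from the test dataset above.
  val x: Array[(Double, Double)] =
    Array((0.0, 5.0), (1.0, 7.0), (2.0, 11.0), (3.0, 13.0), (2.0, 3.0))
  val y: Array[Double] = Array(2.0, 8.0, 3.0, 9.0, 2.0)

  /** Returns ((b1, b2), (se1, se2)) for the model y ~ x1 + x2 - 1. */
  def fit(): ((Double, Double), (Double, Double)) = {
    // Entries of the symmetric 2x2 matrix X'X and of the vector X'y.
    val a11 = x.map(r => r._1 * r._1).sum
    val a12 = x.map(r => r._1 * r._2).sum
    val a22 = x.map(r => r._2 * r._2).sum
    val c1 = x.zip(y).map { case (r, yi) => r._1 * yi }.sum
    val c2 = x.zip(y).map { case (r, yi) => r._2 * yi }.sum
    val det = a11 * a22 - a12 * a12

    // beta = (X'X)^-1 X'y, using the closed-form inverse of a 2x2 matrix.
    val b1 = (a22 * c1 - a12 * c2) / det
    val b2 = (a11 * c2 - a12 * c1) / det

    // Residual variance with n - p degrees of freedom (5 rows, 2 coefficients).
    val rss = x.zip(y).map { case (r, yi) =>
      val e = yi - (b1 * r._1 + b2 * r._2)
      e * e
    }.sum
    val sigma2 = rss / (x.length - 2)

    // Standard errors are the square roots of diag(sigma2 * (X'X)^-1).
    val se1 = math.sqrt(sigma2 * a22 / det)
    val se2 = math.sqrt(sigma2 * a11 / det)
    ((b1, b2), (se1, se2))
  }

  def main(args: Array[String]): Unit = {
    val ((b1, b2), (se1, se2)) = fit()
    println(f"estimate: $b1%.4f, $b2%.4f; std error: $se1%.4f, $se2%.4f")
  }
}
```

    The t-values then follow as estimate divided by standard error. The R snippet is still worth adding, since it also covers the intercept case and the p-values.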



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Jenkins, test this please



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101220747
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("V1", "V2"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureName.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("V1", "V2"),
    +      Array("Intercept", "V1", "V2"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    +      Vectors.dense(0.7903, 0.2258, 0.4677))
    +    val expectedStdError = Seq(Vectors.dense(1.724, 0.3787),
    +      Vectors.dense(4.0129, 2.1153, 0.5815))
    +    val expectedTValue = Seq(Vectors.dense(0.1673, 1.4205),
    +      Vectors.dense(0.1969, 0.1067, 0.8043))
    +    val expectedPValue = Seq(Vectors.dense(0.8778, 0.2506),
    +      Vectors.dense(0.8621, 0.9247, 0.5056))
    +
    +    var idx = 0
    +    for (fitIntercept <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression()
    +        .setFamily("gaussian")
    --- End diff --
    
    Gaussian.name.toLowerCase (or just Gaussian.name, since it is converted to lowercase later) would generally be the approach.
    
    But this is a test suite, so I think it's OK.
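    To illustrate the idea of referencing constants rather than typing strings, here is a minimal sketch in plain Scala. FamilyNames and its members are hypothetical and are not part of Spark's public API; the list of names is only illustrative.

```scala
// Hypothetical sketch: family names exposed as constants instead of raw strings.
// FamilyNames and its members are illustrative, not part of Spark's public API.
object FamilyNames {
  val Gaussian: String = "gaussian"
  val Binomial: String = "binomial"
  val Poisson: String = "poisson"
  val Gamma: String = "gamma"

  val supported: Set[String] = Set(Gaussian, Binomial, Poisson, Gamma)

  /** Lowercases the input and returns it only if it is a known family name. */
  def normalize(name: String): Option[String] = {
    val lower = name.toLowerCase(java.util.Locale.ROOT)
    if (supported.contains(lower)) Some(lower) else None
  }
}
```

    With something like this, a caller would write setFamily(FamilyNames.Gaussian), and a typo would fail at compile time instead of at fit time.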



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r100931445
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -915,6 +917,22 @@ class GeneralizedLinearRegressionSummary private[regression] (
       /** Number of instances in DataFrame predictions. */
       private[regression] lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to "V1", "V2", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureName: Array[String] = {
    +    val featureAttrs = AttributeGroup.fromStructField(
    +      dataset.schema(model.getFeaturesCol)).attributes
    +    if (featureAttrs == None) {
    --- End diff --
    
    If I run the example below in spark-shell:
    
    import org.apache.spark.ml.feature.HashingTF
    val tf = new HashingTF().setInputCol("x").setOutputCol("hash")
    val df = spark.createDataFrame(Seq(Tuple3(0.0,Array("a", "b"), 4), Tuple3(1.0, Array("b", "c"), 6), Tuple3(1.0, Array("a", "c"), 7), Tuple3(0.0, Array("b","c"), 7))).toDF("y", "x", "z")
    val dfres = tf.transform(df)
    
    then calling show() gives:
    scala> dfres.show
    +---+------+---+--------------------+
    |  y|     x|  z|                hash|
    +---+------+---+--------------------+
    |0.0|[a, b]|  4|(262144,[30913,22...|
    |1.0|[b, c]|  6|(262144,[28698,30...|
    |1.0|[a, c]|  7|(262144,[28698,22...|
    |0.0|[b, c]|  7|(262144,[28698,30...|
    +---+------+---+--------------------+
    
    But when I look at the schema, there are no per-feature attributes:
    import org.apache.spark.ml.attribute.AttributeGroup
    scala> AttributeGroup.fromStructField(dfres.schema("hash")).attributes
    res5: Option[Array[org.apache.spark.ml.attribute.Attribute]] = None
    
    scala> AttributeGroup.fromStructField(dfres.schema("hash"))
    res6: org.apache.spark.ml.attribute.AttributeGroup = {"ml_attr":{"num_attrs":262144}}
    
    In this case, though, the names should be of the form hash_{#} instead of V{#}.
    For example, when using VectorAssembler on the above:
    import org.apache.spark.ml.feature.VectorAssembler
    val va = new VectorAssembler().setInputCols(Array("y","z","hash")).setOutputCol("outputs")
    scala> va.transform(dfres).show()
    +---+------+---+--------------------+--------------------+
    |  y|     x|  z|                hash|             outputs|
    +---+------+---+--------------------+--------------------+
    |0.0|[a, b]|  4|(262144,[30913,22...|(262146,[1,30915,...|
    |1.0|[b, c]|  6|(262144,[28698,30...|(262146,[0,1,2870...|
    |1.0|[a, c]|  7|(262144,[28698,22...|(262146,[0,1,2870...|
    |0.0|[b, c]|  7|(262144,[28698,30...|(262146,[1,28700,...|
    +---+------+---+--------------------+--------------------+
    
    scala> print(AttributeGroup.fromStructField(va.transform(dfres).schema("outputs")).attributes.get)
    [Lorg.apache.spark.ml.attribute.Attribute;@4416197b
    scala> AttributeGroup.fromStructField(va.transform(dfres).schema("outputs")).attributes.get
    res22: Array[org.apache.spark.ml.attribute.Attribute] = Array({"type":"numeric","idx":0,"name":"y"}, {"type":"numeric","idx":1,"name":"z"}, {"type":"numeric","idx":2,"name":"hash_0"}, {"type":"numeric","idx":3,"name":"hash_1"}, {"type":"numeric","idx":4,"name":"hash_2"}, {"type":"numeric","idx":5,"name":"hash_3"}, {"type":"numeric","idx":6,"name":"hash_4"}, {"type":"numeric","idx":7,"name":"hash_5"}, {"type":"numeric","idx":8,"name":"hash_6"}, {"type":"numeric","idx":9,"name":"hash_7"}, {"type":"numeric","idx":10,"name":"hash_8"}, {"type":"numeric","idx":11,"name":"hash_9"}, {"type":"numeric","idx":12,"name":"hash_10"}, {"type":"numeric","idx":13,"name":"hash_11"}, {"type":"numeric","idx":14,"name":"hash_12"}, {"type":"numeric","idx":15,"name":"hash_13"}, {"type":"numeric","idx":16,"nam...
    
    You can see that the attributes are given the column name followed by the index.
    This seems like a bug in VectorAssembler, because it makes the schema dense when it should be sparse. Regardless, this seems to be the more official way to name the attributes, rather than a "V" followed by the index, unless you have seen "V" + index used elsewhere?
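    The naming rule suggested here can be sketched in plain Scala as follows. This is a hypothetical helper, not Spark's implementation: FeatureNaming, featureNames, and the shape of the attrNames parameter are assumptions made for illustration. It uses the attribute name where one exists and falls back to the column name followed by the index otherwise.

```scala
// Hypothetical helper, not Spark's implementation: choose feature names from
// optional attribute names, falling back to "<column>_<index>".
object FeatureNaming {
  def featureNames(
      column: String,
      numFeatures: Int,
      attrNames: Option[Array[Option[String]]]): Array[String] = {
    val defaults = Array.tabulate(numFeatures)(i => s"${column}_$i")
    attrNames match {
      // Use the attribute name where present, the default otherwise.
      case Some(names) if names.length == numFeatures =>
        names.zip(defaults).map { case (n, d) => n.getOrElse(d) }
      // No per-feature attributes, as in the HashingTF output above.
      case _ => defaults
    }
  }
}
```

    For the HashingTF column above, calling featureNames("hash", n, None) would give hash_0, hash_1, and so on, which matches what VectorAssembler writes into its output metadata.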
    
    
    
    




[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79688/testReport)** for PR 16630 at commit [`57f1e5c`](https://github.com/apache/spark/commit/57f1e5c259d7f237324dd1b3b481b7e82952b53e).



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79686/



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79686/testReport)** for PR 16630 at commit [`640d564`](https://github.com/apache/spark/commit/640d56442e6f5d1a14b4a0cb895d6da713b003fd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @actuaryzhang Sorry, I'm at Spark Summit East; I will take a look soon. For the feature name, i.e. "lazy val featureName: Array[String]", I recall there is a sparse version (e.g. as output by HashingTF) and a dense version of the metadata for the StructField. I need to look into that code a bit more to understand whether it works...



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101822942
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureNames.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"),
    +      Array("(Intercept)", "features_0", "features_1"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    --- End diff --
    
    Thanks. Added the R code.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r102564507
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GeneralizedLinearRegressionWrapper.scala ---
    @@ -99,37 +95,23 @@ private[r] object GeneralizedLinearRegressionWrapper
         val summary = glm.summary
     
         val rFeatures: Array[String] = if (glm.getFitIntercept) {
    -      Array("(Intercept)") ++ features
    +      Array("(Intercept)") ++ summary.featureNames
         } else {
    -      features
    +      summary.featureNames
         }
     
         val rCoefficients: Array[Double] = if (summary.isNormalSolver) {
    -      val rCoefficientStandardErrors = if (glm.getFitIntercept) {
    -        Array(summary.coefficientStandardErrors.last) ++
    -          summary.coefficientStandardErrors.dropRight(1)
    -      } else {
    -        summary.coefficientStandardErrors
    -      }
    +      val rCoefficientStandardErrors =
    +        summary.summaryTable.select("StdError").collect.map(_.getDouble(0))
     
    -      val rTValues = if (glm.getFitIntercept) {
    -        Array(summary.tValues.last) ++ summary.tValues.dropRight(1)
    -      } else {
    -        summary.tValues
    -      }
    +      val rTValues =
    +        summary.summaryTable.select("TValue").collect.map(_.getDouble(0))
     
    -      val rPValues = if (glm.getFitIntercept) {
    -        Array(summary.pValues.last) ++ summary.pValues.dropRight(1)
    -      } else {
    -        summary.pValues
    -      }
    +      val rPValues =
    +        summary.summaryTable.select("PValue").collect.map(_.getDouble(0))
     
    -      if (glm.getFitIntercept) {
    -        Array(glm.intercept) ++ glm.coefficients.toArray ++
    -          rCoefficientStandardErrors ++ rTValues ++ rPValues
    -      } else {
    -        glm.coefficients.toArray ++ rCoefficientStandardErrors ++ rTValues ++ rPValues
    -      }
    +      summary.summaryTable.select("Coefficient").collect.map(_.getDouble(0)) ++
    +        rCoefficientStandardErrors ++ rTValues ++ rPValues
    --- End diff --
    
    Could you run a quick check to see if the values in the SparkR summary are the same before and after this change?
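    The ordering logic itself can be compared in isolation. The sketch below, in plain Scala with made-up values, puts the old wrapper approach (take the last element and prepend it) next to the index-based reordering used in the summary code. The object and method names are hypothetical. It only shows that the two orderings agree on an array whose last entry is the intercept; it is not a substitute for comparing the actual SparkR summary output.

```scala
// Sketch comparing the two ways of moving the intercept entry to the front.
// The values are made up; only the ordering logic mirrors the diff above.
object InterceptOrderCheck {
  /** Old wrapper logic: take the last element and prepend it. */
  def lastToFront(values: Array[Double]): Array[Double] =
    Array(values.last) ++ values.dropRight(1)

  /** Index-based logic: build the order (last, 0, 1, ...) and then pick. */
  def byIndex(values: Array[Double]): Array[Double] = {
    val idx = (values.length - 1) +: Array.range(0, values.length - 1)
    idx.map(values(_))
  }

  def main(args: Array[String]): Unit = {
    // Three coefficient-like values followed by an intercept-like value.
    val v = Array(1.0, 2.0, 3.0, 0.5)
    println(lastToFront(v).mkString(", "))
    println(byIndex(v).mkString(", "))
  }
}
```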



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    The code looks very good. I added a few minor comments and will take another look tomorrow. Thanks!



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r128200260
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1458,4 +1475,167 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Coefficient matrix with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.3.0")
    +  lazy val coefficientMatrix: Array[(String, Double, Double, Double, Double)] = {
    +    if (isNormalSolver) {
    +      var featureNamesLocal = featureNames
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNamesLocal = featureNamesLocal :+ "(Intercept)"
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx) yield
    +        (featureNamesLocal(i), coefficients(i), coefficientStandardErrors(i),
    +        tValues(i), pValues(i))
    +      result
    +    } else {
    +      throw new UnsupportedOperationException(
    +        "No summary table available for this GeneralizedLinearRegressionModel")
    +    }
    +  }
    +
    +  private def round(x: Double, digit: Int): String = {
    +    BigDecimal(x).setScale(digit, BigDecimal.RoundingMode.HALF_UP).toString()
    +  }
    +
    +  private[regression] def showString(_numRows: Int, truncate: Int = 20,
    +                                     numDigits: Int = 3): String = {
    +    val numRows = _numRows.max(1)
    +    val data = coefficientMatrix.take(numRows)
    +    val hasMoreData = coefficientMatrix.size > numRows
    +
    +    val colNames = Array("Feature", "Estimate", "StdError", "TValue", "PValue")
    +    val numCols = colNames.size
    +
    +    val rows = colNames +: data.map( row => {
    +      val mrow = for (cell <- row.productIterator) yield {
    +        val str = cell match {
    +          case s: String => s
    +          case n: Double => round(n, numDigits).toString
    +        }
    +        if (truncate > 0 && str.length > truncate) {
    +          // do not show ellipses for strings shorter than 4 characters.
    +          if (truncate < 4) str.substring(0, truncate)
    +          else str.substring(0, truncate - 3) + "..."
    +        } else {
    +          str
    +        }
    +      }
    +      mrow.toArray
    +    })
    +
    +    val sb = new StringBuilder
    +    val colWidths = Array.fill(numCols)(3)
    +
    +    // Compute the width of each column
    +    for (row <- rows) {
    +      for ((cell, i) <- row.zipWithIndex) {
    +        colWidths(i) = math.max(colWidths(i), cell.length)
    +      }
    +    }
    +
    +    // Create SeparateLine
    +    val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
    +
    +    // column names
    +    rows.head.zipWithIndex.map { case (cell, i) =>
    +      if (truncate > 0) {
    +        StringUtils.leftPad(cell, colWidths(i))
    +      } else {
    +        StringUtils.rightPad(cell, colWidths(i))
    +      }
    +    }.addString(sb, "|", "|", "|\n")
    +    sb.append(sep)
    +
    +    // data
    +    rows.tail.map {
    +      _.zipWithIndex.map { case (cell, i) =>
    +        if (truncate > 0) {
    +          StringUtils.leftPad(cell.toString, colWidths(i))
    +        } else {
    +          StringUtils.rightPad(cell.toString, colWidths(i))
    +        }
    +      }.addString(sb, "|", "|", "|\n")
    +    }
    +
    +    // For Data that has more than "numRows" records
    +    if (hasMoreData) {
    +      sb.append("...\n")
    +      sb.append(sep)
    +      val rowsString = if (numRows == 1) "row" else "rows"
    +      sb.append(s"only showing top $numRows $rowsString\n")
    +    } else {
    +      sb.append(sep)
    +    }
    +
    +    sb.append("\n")
    +    sb.append(s"(Dispersion parameter for ${family.name} family taken to be " +
    +      round(dispersion, numDigits) + ")")
    +
    +    sb.append("\n")
    +    val nd = "Null deviance: " + round(nullDeviance, numDigits) +
    +      s" on $degreesOfFreedom degrees of freedom"
    +    val rd = "Residual deviance: " + round(deviance, numDigits) +
    +      s" on $residualDegreeOfFreedom degrees of freedom"
    +    val l = math.max(nd.length, rd.length)
    +    sb.append(StringUtils.leftPad(nd, l))
    +    sb.append("\n")
    +    sb.append(StringUtils.leftPad(rd, l))
    +
    +    if (family.name != "tweedie") {
    +      sb.append("\n")
    +      sb.append(s"AIC: " + round(aic, numDigits))
    +    }
    +
    +    sb.toString()
    +  }
    +
    +  /**
    +   * Displays the summary of a GeneralizedLinearModel fit.
    +   *
    +   * @since 2.3.0
    +   */
    +  def show(): Unit = {
    +    val numRows = coefficientMatrix.size
    +    show(numRows, true, 3)
    +  }
    +
    +  /**
    +   * Displays the top numRows rows of the summary of a GeneralizedLinearModel fit.
    +   *
    +   * @param numRows Number of rows to show
    +   *
    +   * @since 2.3.0
    +   */
    +  @Since("2.3.0")
    +  def show(numRows: Int): Unit = {
    +    show(numRows, true, 3)
    +  }
    +
    +  /**
    +   * Displays the summary of a GeneralizedLinearModel fit. Strings more than 20 characters
    +   * will be truncated, and all cells will be aligned right.
    +   *
    +   * @param numRows Number of rows to show
    +   * @param truncate Whether truncate long strings. If true, strings more than 20 characters will
    +   *              be truncated and all cells will be aligned right
    +   * @param numDigits Number of decimal places used to round numerical values.
    +   *
    +   * @since 2.3.0
    +   */
    +  // scalastyle:off println
    +  def show(numRows: Int, truncate: Boolean, numDigits: Int): Unit = if (truncate) {
    --- End diff --
    
    I don't think all of these functions are useful for the GLM summary. I'd recommend keeping only one ```show``` function with default settings, such as ```numRows = coefficientMatrix.size```, ```truncate = 20``` and ```numDigits = 3```. It differs little from ```Dataset.show```, and there is no need to expose many options for users to set; users just want to see output like R's. The code will also be cleaner.
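
A minimal sketch of the single-entry-point idea, written without Spark so it runs on its own. The names `GlmSummaryShow`, `Row`, `clip` and `showString` are hypothetical and only illustrate how default arguments could replace the three `show` overloads; this is not the code that was merged.

```scala
object GlmSummaryShow {
  // One row per term: (feature, estimate, stdError, tValue, pValue).
  type Row = (String, Double, Double, Double, Double)

  // Round half-up to a fixed number of decimal places, as in the diff above.
  private def round(x: Double, digits: Int): String =
    BigDecimal(x).setScale(digits, BigDecimal.RoundingMode.HALF_UP).toString

  // Shorten long cells; only add an ellipsis when there is room for it.
  private def clip(s: String, truncate: Int): String =
    if (truncate <= 0 || s.length <= truncate) s
    else if (truncate < 4) s.substring(0, truncate)
    else s.substring(0, truncate - 3) + "..."

  // Single entry point: the defaults cover the common case.
  def showString(rows: Seq[Row], truncate: Int = 20, numDigits: Int = 3): String = {
    val header = Seq("Feature", "Estimate", "StdError", "TValue", "PValue")
    val body = rows.map { case (name, est, se, t, p) =>
      Seq(clip(name, truncate)) ++ Seq(est, se, t, p).map(round(_, numDigits))
    }
    val table = header +: body
    // Each column is as wide as its longest cell; cells are right-aligned.
    val widths = header.indices.map(i => table.map(_(i).length).max)
    table.map { cells =>
      cells.zip(widths).map { case (c, w) => (" " * (w - c.length)) + c }.mkString(" ")
    }.mkString("\n")
  }
}
```

Callers who want the defaults write `GlmSummaryShow.showString(rows)`; anyone who needs more digits can still pass `numDigits = 5` by name without a separate overload.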



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101160217
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1170,32 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNames = featureName
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNames = featureNames :+ "Intercept"
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNames(i), coefficients(i), coefficientStandardErrors(i), tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    +      import spark.implicits._
    +      result.toDF("Feature", "Estimate", "StdError", "TValue", "PValue").repartition(1)
    --- End diff --
    
    R uses 'Estimate'. I have changed it to 'Coefficient' now.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73073/testReport)** for PR 16630 at commit [`9a441f8`](https://github.com/apache/spark/commit/9a441f8951f0303986547699a6f576273d8180ff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r126872077
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1441,4 +1460,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    --- End diff --
    
    Bump up the ```@Since``` version.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r126873611
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1441,4 +1460,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNamesLocal = featureNames
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNamesLocal = featureNamesLocal :+ Intercept
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNamesLocal(i), coefficients(i), coefficientStandardErrors(i),
    +        tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    +      import spark.implicits._
    +      result.toDF("Feature", "Coefficient", "StdError", "TValue", "PValue").repartition(1)
    --- End diff --
    
    Could you let me know the reason for wrapping the result in a ```DataFrame```? I think a local 2D array is enough. A ```DataFrame``` adds some extra cost, and it will actually be collected into a local array when you call ```toString```.
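
A rough sketch of the local-array alternative, with no `SparkSession` involved. `CoefficientTable.build` and its parameters are hypothetical names for illustration. It assumes, as the indexing in the diff above implies, that the statistic arrays hold one entry per feature followed by the intercept's entry when an intercept is fitted.

```scala
object CoefficientTable {
  // One row per term: (feature, coefficient, stdError, tValue, pValue).
  type Row = (String, Double, Double, Double, Double)

  // Assumption: stdErrors, tValues and pValues list the features first and
  // the intercept last whenever fitIntercept is true.
  def build(
      featureNames: Array[String],
      coefficients: Array[Double],
      intercept: Double,
      fitIntercept: Boolean,
      stdErrors: Array[Double],
      tValues: Array[Double],
      pValues: Array[Double]): Array[Row] = {
    val names = if (fitIntercept) featureNames :+ "(Intercept)" else featureNames
    val coefs = if (fitIntercept) coefficients :+ intercept else coefficients
    // Visit the intercept (stored last) first, then the features in order.
    val order =
      if (fitIntercept) (coefs.length - 1) +: Array.range(0, coefs.length - 1)
      else Array.range(0, coefs.length)
    order.map(i => (names(i), coefs(i), stdErrors(i), tValues(i), pValues(i)))
  }
}
```

The result is a plain `Array` of tuples, so printing or testing it needs no collect step.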



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101159146
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1170,32 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNames = featureName
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNames = featureNames :+ "Intercept"
    --- End diff --
    
    Done



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Made a new commit to address the comments. 



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Jenkins, test this please



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71985/
    Test FAILed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72901/
    Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Note the output format of GLR ```summary.toString``` is:
    ```
    Coefficients:
       Feature Estimate Std Error    T Value P Value
    features_0  2.21304   0.00279  792.03163 0.00000
    features_1  0.83096   0.00080 1042.07543 0.00000
    
    (Dispersion parameter for gaussian family taken to be 0.06483)
    Null deviance: 2344915.50893 on 9998 degrees of freedom
    Residual deviance: 648.20325 on 9998 degrees of freedom
    AIC: 1023.40993
    ```



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    This test just refuses to start. Could someone help? Many thanks!
    @srowen @jkbradley @MLnick @yanboliang 




[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Jenkins test this please



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Can one of the admins verify this patch?



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    I think this is helpful to have - added a few comments.
    Any more feedback @jkbradley @yanboliang @srowen @MLnick?



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101822971
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureNames.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("features_0", "features_1"),
    +      Array("(Intercept)", "features_0", "features_1"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    +      Vectors.dense(0.7903, 0.2258, 0.4677))
    +    val expectedStdError = Seq(Vectors.dense(1.724, 0.3787),
    +      Vectors.dense(4.0129, 2.1153, 0.5815))
    +    val expectedTValue = Seq(Vectors.dense(0.1673, 1.4205),
    +      Vectors.dense(0.1969, 0.1067, 0.8043))
    +    val expectedPValue = Seq(Vectors.dense(0.8778, 0.2506),
    +      Vectors.dense(0.8621, 0.9247, 0.5056))
    +
    +    var idx = 0
    +    for (fitIntercept <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression()
    +        .setFamily("gaussian")
    +        .setFitIntercept(fitIntercept)
    +      val model = trainer.fit(dataset)
    +      val summaryTable = model.summary.summaryTable
    +
    +      summaryTable.select("Feature").collect.map(_.getString(0))
    +        .zip(expectedFeature(idx)).foreach{ x => assert(x._1 === x._2,
    +        "Feature name mismatch in summaryTable") }
    +      assert(Vectors.dense(summaryTable.select("Coefficient").rdd.collect.map(_.getDouble(0)))
    +        ~== expectedEstimate(idx) absTol 1E-3, "Coefficient mismatch in summaryTable")
    +      assert(Vectors.dense(summaryTable.select("StdError").rdd.collect.map(_.getDouble(0)))
    +        ~== expectedStdError(idx) absTol 1E-3, "Standard error mismatch in summaryTable")
    +      assert(Vectors.dense(summaryTable.select("TValue").rdd.collect.map(_.getDouble(0)))
    --- End diff --
    
    Fixed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    jenkins, test this please



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79766/testReport)** for PR 16630 at commit [`174fc49`](https://github.com/apache/spark/commit/174fc49142f2915c46fc53df4cb024d2e97cc6ca).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101159255
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("V1", "V2"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureName.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("V1", "V2"),
    +      Array("Intercept", "V1", "V2"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    +      Vectors.dense(0.7903, 0.2258, 0.4677))
    +    val expectedStdError = Seq(Vectors.dense(1.724, 0.3787),
    +      Vectors.dense(4.0129, 2.1153, 0.5815))
    +    val expectedTValue = Seq(Vectors.dense(0.1673, 1.4205),
    +      Vectors.dense(0.1969, 0.1067, 0.8043))
    +    val expectedPValue = Seq(Vectors.dense(0.8778, 0.2506),
    +      Vectors.dense(0.8621, 0.9247, 0.5056))
    +
    +    var idx = 0
    +    for (fitIntercept <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression()
    +        .setFamily("gaussian")
    --- End diff --
    
    Indeed, there is object `Gaussian` and one can use `Gaussian.name` for the string name. 
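
As an illustration of why a named constant can be preferable to a string literal, here is a small stand-in. The `Families` object, its `Family` trait and the `fromName` helper are hypothetical and do not reproduce Spark's own definitions; only the idea of reading the string from `Gaussian.name` comes from the comment above.

```scala
object Families {
  // Hypothetical stand-ins for the family objects discussed above.
  sealed trait Family { def name: String }
  object Gaussian extends Family { val name = "gaussian" }
  object Poisson extends Family { val name = "poisson" }

  // Resolve a user-supplied string against the known families,
  // ignoring case so that "Gaussian" and "gaussian" both match.
  def fromName(s: String): Option[Family] =
    Seq(Gaussian, Poisson).find(_.name == s.toLowerCase(java.util.Locale.ROOT))
}
```

With something like this in scope, a test could pass `Families.Gaussian.name` where it currently passes `"gaussian"`, so a typo becomes a compile error instead of a runtime failure.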



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @imatiach-msft I'm not sure R^2 measures are used much in the GLM context. The deviance, log-likelihood and AIC/BIC are most often used for ANOVA and model comparison. The GLM [book](https://www.amazon.com/Generalized-Chapman-Monographs-Statistics-Probability/dp/0412317605) is a good reference. 
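
For readers less familiar with these criteria, a short sketch using the standard definitions AIC = 2k - 2 logL and BIC = k ln(n) - 2 logL, where k is the number of estimated parameters and n the number of observations. The object and parameter names are hypothetical, and this is not how Spark computes its `aic` value.

```scala
object InfoCriteria {
  // Akaike information criterion: 2k - 2 * logLik. Lower is better.
  def aic(logLik: Double, numParams: Int): Double =
    2.0 * numParams - 2.0 * logLik

  // Bayesian information criterion: k * ln(n) - 2 * logLik. Lower is better.
  def bic(logLik: Double, numParams: Int, numObs: Long): Double =
    numParams * math.log(numObs.toDouble) - 2.0 * logLik
}
```

Two models fitted to the same data can then be compared by whichever criterion is smaller, which is the kind of comparison the comment above has in mind.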




[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101356475
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1173,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNamesLocal = featureNames
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNamesLocal = featureNamesLocal :+ Intercept
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNamesLocal(i), coefficients(i), coefficientStandardErrors(i),
    +        tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    +      import spark.implicits._
    +      result.toDF("Feature", "Coefficient", "StdError", "TValue", "PValue").repartition(1)
    --- End diff --
    
    Sorry, I didn't realize that R uses `Estimate` instead of `Coefficient`. If you feel strongly about using `Estimate` here, you can change this back. Up to you.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73060/testReport)** for PR 16630 at commit [`fd9f1be`](https://github.com/apache/spark/commit/fd9f1becd40fab890f4f7542f66dddc79aecd8a0).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @imatiach-msft @felixcheung 
    I cleaned up the tests as suggested, and also updated the R GLM wrapper to use the result from this PR. Please let me know if there are any other suggestions. Thanks very much for the review and comments.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101356243
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -1104,6 +1103,83 @@ class GeneralizedLinearRegressionSuite
           .fit(datasetGaussianIdentity.as[LabeledPoint])
       }
     
    +
    +  test("glm summary: feature name") {
    +    // dataset1 with no attribute
    +    val dataset1 = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    // dataset2 with attribute
    +    val datasetTmp = Seq(
    +      (2.0, 1.0, 0.0, 5.0),
    +      (8.0, 2.0, 1.0, 7.0),
    +      (3.0, 3.0, 2.0, 11.0),
    +      (9.0, 4.0, 3.0, 13.0),
    +      (2.0, 5.0, 2.0, 3.0)
    +    ).toDF("y", "w", "x1", "x2")
    +    val formula = new RFormula().setFormula("y ~ x1 + x2")
    +    val dataset2 = formula.fit(datasetTmp).transform(datasetTmp)
    +
    +    val expectedFeature = Seq(Array("V1", "V2"), Array("x1", "x2"))
    +
    +    var idx = 0
    +    for (dataset <- Seq(dataset1, dataset2)) {
    +      val model = new GeneralizedLinearRegression().fit(dataset)
    +      model.summary.featureName.zip(expectedFeature(idx))
    +        .foreach{ x => assert(x._1 === x._2) }
    +      idx += 1
    +    }
    +  }
    +
    +  test("glm summary: summaryTable") {
    +    val dataset = Seq(
    +      Instance(2.0, 1.0, Vectors.dense(0.0, 5.0)),
    +      Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
    +      Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
    +      Instance(9.0, 4.0, Vectors.dense(3.0, 13.0)),
    +      Instance(2.0, 5.0, Vectors.dense(2.0, 3.0))
    +    ).toDF()
    +
    +    val expectedFeature = Seq(Array("V1", "V2"),
    +      Array("Intercept", "V1", "V2"))
    +    val expectedEstimate = Seq(Vectors.dense(0.2884, 0.538),
    +      Vectors.dense(0.7903, 0.2258, 0.4677))
    +    val expectedStdError = Seq(Vectors.dense(1.724, 0.3787),
    +      Vectors.dense(4.0129, 2.1153, 0.5815))
    +    val expectedTValue = Seq(Vectors.dense(0.1673, 1.4205),
    +      Vectors.dense(0.1969, 0.1067, 0.8043))
    +    val expectedPValue = Seq(Vectors.dense(0.8778, 0.2506),
    +      Vectors.dense(0.8621, 0.9247, 0.5056))
    +
    +    var idx = 0
    +    for (fitIntercept <- Seq(false, true)) {
    +      val trainer = new GeneralizedLinearRegression()
    +        .setFamily("gaussian")
    --- End diff --
    
    I would usually prefer to use variables rather than string literals wherever possible: they are much easier to update across editors, and mistakes are caught at compile time rather than at runtime. But it is a minor point, and the literal looks consistent with most of the Spark codebase.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73066/testReport)** for PR 16630 at commit [`4b25146`](https://github.com/apache/spark/commit/4b25146b83e504f0e174c8323e25fbb2acdf0cdd).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #78992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78992/testReport)** for PR 16630 at commit [`ce0851a`](https://github.com/apache/spark/commit/ce0851afc2e24d23bf8f3b8a2fa3cfdd0f99c6f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Jenkins add to whitelist



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79685/testReport)** for PR 16630 at commit [`a16cbee`](https://github.com/apache/spark/commit/a16cbee4e86cf044a90015bdd6900b9a22116200).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Could somebody help review this PR? I think it will make gathering the estimation results in Scala much easier. It will also help in constructing the tests. For example, the GLM tests with weights can be simplified a lot if all results are in arrays and the standard errors etc. are aligned with the coefficients (the current GLM tests with weights force no intercept to avoid this nuisance).
    
    @sethah @imatiach-msft @felixcheung  
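
The alignment issue mentioned above can be sketched in plain Scala. This is an illustration rather than the PR's code: it assumes, as the indexing in the quoted diff suggests, that the standard errors list the features first and the intercept last, and it moves the intercept to the front the way R prints it. The object name, the `(Intercept)` label, and the tuple layout are choices made for this note.

```scala
object CoefficientRows {
  // Builds (name, estimate, stdError, tValue) rows with the intercept first.
  // Assumes `stdErrors` is ordered features first, intercept last.
  def rows(
      featureNames: Array[String],
      coefficients: Array[Double],
      intercept: Option[Double],
      stdErrors: Array[Double]): Seq[(String, Double, Double, Double)] = {
    val names = featureNames ++ intercept.map(_ => "(Intercept)").toSeq
    val estimates = coefficients ++ intercept.toSeq
    require(names.length == estimates.length, "one name per estimate")
    require(estimates.length == stdErrors.length, "one standard error per estimate")

    // Reorder so that the intercept (stored last) comes first.
    val featureIdx = coefficients.indices
    val order =
      if (intercept.isDefined) (estimates.length - 1) +: featureIdx else featureIdx

    order.map { i =>
      (names(i), estimates(i), stdErrors(i), estimates(i) / stdErrors(i))
    }
  }
}
```

Keeping every statistic in one row per term means a test can compare whole rows against expected values, with or without an intercept.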



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78992/
    Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #79686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79686/testReport)** for PR 16630 at commit [`640d564`](https://github.com/apache/spark/commit/640d56442e6f5d1a14b4a0cb895d6da713b003fd).



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79970/
    Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101158362
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -915,6 +917,22 @@ class GeneralizedLinearRegressionSummary private[regression] (
       /** Number of instances in DataFrame predictions. */
       private[regression] lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to "V1", "V2", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureName: Array[String] = {
    +    val featureAttrs = AttributeGroup.fromStructField(
    +      dataset.schema(model.getFeaturesCol)).attributes
    +    if (featureAttrs == None) {
    +      Array.tabulate[String](origModel.numFeatures)((x: Int) => ("V" + (x + 1)))
    --- End diff --
    
    @felixcheung The feature names were not available prior to this PR, right? The one other place I see that builds a similar summary is the `GeneralizedLinearRegressionWrapper` for R. Do you think we should consolidate the two, e.g., update the `Wrapper` to use the summary table directly?
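
The fallback described in the quoted doc comment (use attribute names when present, otherwise "V1", "V2", and so on) can be sketched without Spark. The `attrNames` parameter is a hypothetical stand-in for whatever names are read from the features column metadata; it is not a Spark API.

```scala
object FeatureNames {
  // `attrNames` stands in for names read from column metadata;
  // None means no attribute names were attached to the features column.
  def resolve(attrNames: Option[Array[String]], numFeatures: Int): Array[String] =
    attrNames.getOrElse(Array.tabulate(numFeatures)(i => "V" + (i + 1)))
}
```

Modeling the missing case as an `Option` keeps the default in one place, so a consumer such as an R wrapper would not need its own naming logic.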



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #72901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72901/testReport)** for PR 16630 at commit [`b67d3fd`](https://github.com/apache/spark/commit/b67d3fdc93a0a398dcac0271b501d054082ca793).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r100933747
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -915,6 +917,22 @@ class GeneralizedLinearRegressionSummary private[regression] (
       /** Number of instances in DataFrame predictions. */
       private[regression] lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to "V1", "V2", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureName: Array[String] = {
    --- End diff --
    
    Minor comment: since this is an array, the name should be plural, i.e. "featureNames" rather than "featureName".



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73439/
    Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @felixcheung @imatiach-msft Thanks very much for the review. I made most of the suggested changes. Please see my inline replies.




[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @yanboliang Thanks for the suggestions. I have made a new commit that addresses your comments. 
    In the new version, I used an array of tuples to represent the coefficient matrix. I used tuples because the columns mix strings and doubles (the feature names must be stored too, since they depend on whether there is an intercept). I then wrote a `showString` function, similar to the one in the `Dataset` class, that compiles all the summary information into a string, and defined `show` methods to print the estimated model. The output is very similar to R's, except that it does not show the residuals or the significance levels. Please let me know your thoughts on this update.
    
    Below is an example of the call and the output:
    ```
    model.summary.show()
    +-----------+--------+--------+------+------+
    |    Feature|Estimate|StdError|TValue|PValue|
    +-----------+--------+--------+------+------+
    |(Intercept)|   0.790|   4.013| 0.197| 0.862|
    | features_0|   0.226|   2.115| 0.107| 0.925|
    | features_1|   0.468|   0.582| 0.804| 0.506|
    +-----------+--------+--------+------+------+
    
    (Dispersion parameter for gaussian family taken to be 14.516)
        Null deviance: 46.800 on 2 degrees of freedom
    Residual deviance: 29.032 on 2 degrees of freedom
    AIC: 30.984
    ```
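
A minimal sketch of how an array of tuples could be rendered into a bordered table like the one above. This is not the PR's `showString`: the object name, the fixed three-decimal formatting, and the right-alignment rule are assumptions made for illustration, chosen only because they reproduce the layout of the sample output.

```scala
object SummaryTable {
  type Row = (String, Double, Double, Double, Double)

  private val header = Seq("Feature", "Estimate", "StdError", "TValue", "PValue")

  // Three-decimal formatting, independent of the JVM's default locale.
  private def fmt(x: Double): String =
    "%.3f".formatLocal(java.util.Locale.ROOT, x)

  def render(rows: Seq[Row]): String = {
    val cells: Seq[Seq[String]] = rows.map { case (name, est, se, t, p) =>
      Seq(name, fmt(est), fmt(se), fmt(t), fmt(p))
    }
    // Each column is as wide as its longest cell, header included.
    val widths = header.indices.map { i =>
      (header +: cells).map(row => row(i).length).max
    }
    val sep = widths.map(w => "-" * w).mkString("+", "+", "+")
    // Right-align every cell within its column.
    def line(cs: Seq[String]): String =
      cs.zip(widths).map { case (c, w) => " " * (w - c.length) + c }
        .mkString("|", "|", "|")

    ((Seq(sep, line(header), sep) ++ cells.map(line)) :+ sep).mkString("\n")
  }
}
```

Returning a string rather than printing directly is what makes the same routine usable from both a `show` method and a `toString` override.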



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r101158825
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1152,4 +1170,32 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    +    if (isNormalSolver) {
    +      var featureNames = featureName
    +      var coefficients = model.coefficients.toArray
    +      var idx = Array.range(0, coefficients.length)
    +      if (model.getFitIntercept) {
    +        featureNames = featureNames :+ "Intercept"
    +        coefficients = coefficients :+ model.intercept
    +        // Reorder so that intercept comes first
    +        idx = (coefficients.length - 1) +: idx
    +      }
    +      val result = for (i <- idx.toSeq) yield
    +        (featureNames(i), coefficients(i), coefficientStandardErrors(i), tValues(i), pValues(i))
    +
    +      val spark = SparkSession.builder().getOrCreate()
    +      import spark.implicits._
    --- End diff --
    
    I was using the Spark session and its implicits so that I could call `toDF` to create a DataFrame with column names from a `Seq`. Could you explain how `import dataset.sparkSession.implicits._` works? I could not import it in the Spark shell.
    ```
    <console>:56: error: not found: value dataset
           import dataset.sparkSession.implicits._
    ```
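
One likely reading of the error above is that a Scala import of this form needs a value named `dataset` in scope: the shell has none, whereas the summary class does (the quoted diff calls `dataset.schema`). The sketch below shows the scoping rule with hypothetical stand-ins; `Session`, `Data`, `Summary`, and `labeled` are invented for this note and are not Spark classes or methods.

```scala
// Hypothetical stand-ins for a session and a dataset, just enough to show scoping.
class Session {
  object implicits {
    // An extension method that becomes available once `implicits._` is imported.
    implicit class LabeledSeq(values: Seq[Double]) {
      def labeled(names: String*): Seq[(String, Double)] =
        names.toList.zip(values)
    }
  }
}

final class Data(val session: Session)

class Summary(dataset: Data) {
  def table: Seq[(String, Double)] = {
    // Works here because `dataset` is a value in scope and `session` is a val,
    // so `dataset.session.implicits` is a stable path that can be imported from.
    import dataset.session.implicits._
    Seq(0.79, 0.23).labeled("(Intercept)", "V1")
  }
}
```

Outside `Summary`, the same import line fails with "not found: value dataset" unless a value of that name is defined first. Importing from the dataset's own session also avoids creating or looking up a session separately.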



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r128202464
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1458,4 +1475,167 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Coefficient matrix with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.3.0")
    +  lazy val coefficientMatrix: Array[(String, Double, Double, Double, Double)] = {
    --- End diff --
    
    This is not a matrix, so it's not appropriate to name it ```coefficientMatrix```. Since it's only used for generating the summary string output, what about keeping it private or inlining it?



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r126872238
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1441,4 +1460,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    +  lazy val summaryTable: DataFrame = {
    --- End diff --
    
    Could we have a better name? What about outputting the model summary the way R does? Then we could directly override the ```toString``` function.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79685/
    Test FAILed.



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16630



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Jenkins test this please



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r127844472
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1441,4 +1460,33 @@ class GeneralizedLinearRegressionTrainingSummary private[regression] (
             "No p-value available for this GeneralizedLinearRegressionModel")
         }
       }
    +
    +  /**
    +   * Summary table with feature name, coefficient, standard error,
    +   * tValue and pValue.
    +   */
    +  @Since("2.2.0")
    --- End diff --
    
    Done



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r127844463
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -1187,6 +1189,23 @@ class GeneralizedLinearRegressionSummary private[regression] (
       @Since("2.2.0")
       lazy val numInstances: Long = predictions.count()
     
    +
    +  /**
    +   * Name of features. If the name cannot be retrieved from attributes,
    +   * set default names to feature column name with numbered suffix "_0", "_1", and so on.
    +   */
    +  @Since("2.2.0")
    +  lazy val featureNames: Array[String] = {
    --- End diff --
    
    Made it `private[ml]` since it is used in the R wrapper.
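The default-naming rule described in the doc comment above (feature column name plus a numbered suffix `_0`, `_1`, and so on) can be sketched as follows. This is a minimal sketch, not the PR's implementation: `FeatureNameSketch` and its parameters are hypothetical, and `attrNames` merely stands in for whatever names could be recovered from the column's attributes.

```scala
object FeatureNameSketch {
  // attrNames: names recovered from column attributes, if any (assumption).
  // featuresCol: name of the features column, used as the prefix for defaults.
  // numFeatures: number of coefficients that need a name.
  def featureNames(
      attrNames: Option[Array[String]],
      featuresCol: String,
      numFeatures: Int): Array[String] =
    attrNames.getOrElse(Array.tabulate(numFeatures)(i => s"${featuresCol}_$i"))
}
```

For example, with no attribute names and a features column called `features`, three features would be named `features_0`, `features_1`, and `features_2`.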



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73439/testReport)** for PR 16630 at commit [`8e1c086`](https://github.com/apache/spark/commit/8e1c086cdc953fbe5e0bf6986b1775d07f7cbc6a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #71985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71985/testReport)** for PR 16630 at commit [`6173ba9`](https://github.com/apache/spark/commit/6173ba9834442c53c1740cbd7a47ca6a3312a26a).



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73066/
    Test FAILed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    LGTM, merged into master. Thanks, all.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #71998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71998/testReport)** for PR 16630 at commit [`78bb77f`](https://github.com/apache/spark/commit/78bb77f6d2616888c07bf625341a7718ff79723c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79766/
    Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    **[Test build #73066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73066/testReport)** for PR 16630 at commit [`4b25146`](https://github.com/apache/spark/commit/4b25146b83e504f0e174c8323e25fbb2acdf0cdd).



[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16630#discussion_r103003591
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GeneralizedLinearRegressionWrapper.scala ---
    @@ -99,37 +95,23 @@ private[r] object GeneralizedLinearRegressionWrapper
         val summary = glm.summary
     
         val rFeatures: Array[String] = if (glm.getFitIntercept) {
    -      Array("(Intercept)") ++ features
    +      Array("(Intercept)") ++ summary.featureNames
         } else {
    -      features
    +      summary.featureNames
         }
     
         val rCoefficients: Array[Double] = if (summary.isNormalSolver) {
    -      val rCoefficientStandardErrors = if (glm.getFitIntercept) {
    -        Array(summary.coefficientStandardErrors.last) ++
    -          summary.coefficientStandardErrors.dropRight(1)
    -      } else {
    -        summary.coefficientStandardErrors
    -      }
    +      val rCoefficientStandardErrors =
    +        summary.summaryTable.select("StdError").collect.map(_.getDouble(0))
     
    -      val rTValues = if (glm.getFitIntercept) {
    -        Array(summary.tValues.last) ++ summary.tValues.dropRight(1)
    -      } else {
    -        summary.tValues
    -      }
    +      val rTValues =
    +        summary.summaryTable.select("TValue").collect.map(_.getDouble(0))
     
    -      val rPValues = if (glm.getFitIntercept) {
    -        Array(summary.pValues.last) ++ summary.pValues.dropRight(1)
    -      } else {
    -        summary.pValues
    -      }
    +      val rPValues =
    +        summary.summaryTable.select("PValue").collect.map(_.getDouble(0))
     
    -      if (glm.getFitIntercept) {
    -        Array(glm.intercept) ++ glm.coefficients.toArray ++
    -          rCoefficientStandardErrors ++ rTValues ++ rPValues
    -      } else {
    -        glm.coefficients.toArray ++ rCoefficientStandardErrors ++ rTValues ++ rPValues
    -      }
    +      summary.summaryTable.select("Coefficient").collect.map(_.getDouble(0)) ++
    +        rCoefficientStandardErrors ++ rTValues ++ rPValues
    --- End diff --
    
    Yes, I checked the results, and they are the same. 
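The branches removed in the diff above all performed the same reordering: the summary arrays keep the intercept's statistic last, while the R wrapper wants it first. A minimal sketch of that step in plain Scala, assuming only what the removed lines show (`InterceptFirstSketch` and `interceptFirst` are hypothetical names):

```scala
object InterceptFirstSketch {
  // Move the trailing intercept entry to the front when an intercept was fit;
  // otherwise return the values unchanged.
  def interceptFirst(values: Array[Double], fitIntercept: Boolean): Array[Double] =
    if (fitIntercept && values.nonEmpty) values.last +: values.dropRight(1)
    else values
}
```

With the change under review, the wrapper no longer needs this per-array logic because it reads each column from `summaryTable` instead, and the comment above reports that the two approaches gave the same results.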



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by imatiach-msft <gi...@git.apache.org>.

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    @actuaryzhang thanks, LGTM!



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Jenkins add to whitelist



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16630
  
    Merged build finished. Test FAILed.

