Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2015/11/02 15:09:12 UTC

[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/9413

    [SPARK-9836] [ML] Provide R-like summary statistics for OLS via normal equation solver

    https://issues.apache.org/jira/browse/SPARK-9836

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-9836

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9413
    
----
commit 655fb436950e44e1783a2bc3767e40a0295ce83f
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-11-02T14:07:56Z

    Provide R-like summary statistics for OLS via normal equation solver

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685999
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    +
    +  lazy val dfe = if (model.getFitIntercept) {
    +    numInstances - model.weights.size -1
    +  } else {
    +    numInstances - model.weights.size
    +  }
    +
    +  lazy val devianceResiduals: Array[Double] = {
    +    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
    +    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
    +      .multiply(weighted).as("weightedResiduals"))
    +      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
    +      .take(1)(0)
    +    Array(dr.getDouble(0), dr.getDouble(1))
    +  }
    +
    +  lazy val seCoef: Array[Double] = {
    +    if (diag.length == 1 && diag(0) == 0) {
    +      throw new UnsupportedOperationException(
    +        "No Std. Error coefficients available for this LinearRegressionModel")
    +    } else {
    +      val rss = if (model.getWeightCol.isEmpty) {
    +        meanSquaredError * numInstances
    +      } else {
    +        val t = udf { (pred: Double, label: Double, weight: Double) =>
    +          math.pow(label - pred, 2.0) * weight }
    +        predictions.select(t(col(model.getPredictionCol), col(model.getLabelCol),
    +          col(model.getWeightCol)).as("wse")).agg(sum(col("wse"))).take(1)(0).getDouble(0)
    +      }
    +      val sigma2 = rss / dfe
    +      diag.map(_ * sigma2).map(math.sqrt(_))
    +    }
    +  }
    +
    +  lazy val tVals: Array[Double] = {
    +    if (diag.length == 1 && diag(0) == 0) {
    +      throw new UnsupportedOperationException(
    +        "No t values available for this LinearRegressionModel")
    +    } else {
    +      model.weights.toArray.zip(seCoef).map { x => x._1 / x._2 }
    +    }
    +  }
    +
    +  lazy val pVals: Array[Double] = {
    --- End diff --
    
    `pValues`



Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153281753
  
    Merged build finished. Test PASSed.



Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153263990
  
     Merged build triggered.



Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685991
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    +
    +  lazy val dfe = if (model.getFitIntercept) {
    +    numInstances - model.weights.size -1
    +  } else {
    +    numInstances - model.weights.size
    +  }
    +
    +  lazy val devianceResiduals: Array[Double] = {
    +    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
    +    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
    +      .multiply(weighted).as("weightedResiduals"))
    +      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
    +      .take(1)(0)
    +    Array(dr.getDouble(0), dr.getDouble(1))
    +  }
    +
    +  lazy val seCoef: Array[Double] = {
    +    if (diag.length == 1 && diag(0) == 0) {
    +      throw new UnsupportedOperationException(
    +        "No Std. Error coefficients available for this LinearRegressionModel")
    +    } else {
    +      val rss = if (model.getWeightCol.isEmpty) {
    +        meanSquaredError * numInstances
    +      } else {
    +        val t = udf { (pred: Double, label: Double, weight: Double) =>
    +          math.pow(label - pred, 2.0) * weight }
    +        predictions.select(t(col(model.getPredictionCol), col(model.getLabelCol),
    +          col(model.getWeightCol)).as("wse")).agg(sum(col("wse"))).take(1)(0).getDouble(0)
    +      }
    +      val sigma2 = rss / dfe
    +      diag.map(_ * sigma2).map(math.sqrt(_))
    +    }
    +  }
    +
    +  lazy val tVals: Array[Double] = {
    --- End diff --
    
    `tValues`



Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685972
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    +
    +  lazy val dfe = if (model.getFitIntercept) {
    --- End diff --
    
    Keep this one private, or a more descriptive name? We need explicit types for public/private members.
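
A minimal sketch of the degrees-of-freedom rule under review, written with an explicit return type as requested. It is plain Scala with no Spark dependency; the object name `DegreesOfFreedomSketch` and the method `residualDegreesOfFreedom` are hypothetical names chosen for illustration and are not part of the PR.

```scala
object DegreesOfFreedomSketch {
  /**
   * Residual degrees of freedom: number of instances minus the number of
   * estimated parameters. Hypothetical helper, not Spark's API.
   */
  def residualDegreesOfFreedom(
      numInstances: Long,
      numCoefficients: Int,
      fitIntercept: Boolean): Long = {
    // A fitted intercept counts as one additional estimated parameter.
    val numParameters = numCoefficients + (if (fitIntercept) 1 else 0)
    numInstances - numParameters
  }
}
```

With 100 instances and 3 coefficients, this gives 96 when the intercept is fitted and 97 when it is not.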



Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r44046156
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  /** Number of instances in DataFrame predictions */
    +  lazy val numInstances: Long = predictions.count()
    +
    +  /** Degrees of freedom */
    +  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
    +    numInstances - model.coefficients.size - 1
    +  } else {
    +    numInstances - model.coefficients.size
    +  }
    +
    +  /**
    +   * The weighted residuals, the usual residuals rescaled by
    +   * the square root of the instance weights.
    +   */
    +  lazy val devianceResiduals: Array[Double] = {
    --- End diff --
    
    I'm late to comment, but am wondering:
    * Why do we not return all deviance residuals as a DataFrame?  If we only return the min and max, then that should be documented.  But I'd prefer we return a DataFrame with all deviance residuals.
    * Should we follow R's example and just call this "residuals"?  That would let us add other types of residuals later (specified via an argument, with a default argument of "deviance").



Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153027489
  
     Merged build triggered.



Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153265915
  
    **[Test build #44891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44891/consoleFull)** for PR 9413 at commit [`42ac991`](https://github.com/apache/spark/commit/42ac991775af48ab80869d0d2d9874cadf665b3e).



Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r44308271
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  /** Number of instances in DataFrame predictions */
    +  lazy val numInstances: Long = predictions.count()
    +
    +  /** Degrees of freedom */
    +  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
    +    numInstances - model.coefficients.size - 1
    +  } else {
    +    numInstances - model.coefficients.size
    +  }
    +
    +  /**
    +   * The weighted residuals, the usual residuals rescaled by
    +   * the square root of the instance weights.
    +   */
    +  lazy val devianceResiduals: Array[Double] = {
    --- End diff --
    
    Sounds good!



Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153289839
  
    @mengxr I created [SPARK-11473](https://issues.apache.org/jira/browse/SPARK-11473) to track support for summary statistics for the intercept. I can work on it.



Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r44046167
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/CholeskyDecomposition.scala ---
    @@ -40,4 +40,20 @@ private[spark] object CholeskyDecomposition {
         assert(code == 0, s"lapack.dpotrs returned $code.")
         bx
       }
    +
    +  /**
    +   * Computes the inverse of a real symmetric positive definite matrix A
    +   * using the Cholesky factorization A = U**T*U.
    +   * The input arguments are modified in-place to store the inverse matrix.
    +   * @param UAi the upper triangular factor U from the Cholesky factorization A = U**T*U
    +   * @param k the dimension of A
    +   * @return the upper triangle of the (symmetric) inverse of A
    +   */
    +  def inverse(UAi: Array[Double], k: Int): Array[Double] = {
    +    val info = new intW(0)
    +    lapack.dpptri("U", k, UAi, info)
    +    val code = info.`val`
    +    assert(code == 0, s"lapack.dpptri returned $code.")
    --- End diff --
    
    This throws an AssertionError on failure.  It'd be better to throw a RuntimeException (or one based on the return code, though that may be too much trouble).
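
A minimal sketch of the alternative suggested here: check the return code explicitly and throw a `RuntimeException` subclass instead of relying on `assert`. The object `LapackCheckSketch` and the method `checkReturnCode` are hypothetical names, and the mapping of codes to exception types is an assumption based on the usual LAPACK `info` convention (negative: illegal argument; positive: the computation could not be completed). No LAPACK binding is called, so the snippet is self-contained.

```scala
object LapackCheckSketch {
  /**
   * Throws a RuntimeException subclass when a LAPACK-style return code
   * signals failure; returns normally when the code is 0.
   * Hypothetical helper, not part of the PR.
   */
  def checkReturnCode(routine: String, code: Int): Unit = {
    if (code < 0) {
      // Negative code: the argument at position -code was rejected.
      throw new IllegalArgumentException(
        s"lapack.$routine: argument ${-code} had an illegal value.")
    } else if (code > 0) {
      // Positive code: the routine ran but could not finish the computation.
      throw new IllegalStateException(
        s"lapack.$routine returned $code; the computation could not be completed.")
    }
  }
}
```

One reason to prefer this over `assert`: `AssertionError` extends `Error`, so a handler written for `Exception` will not catch it, and Scala assertions can be elided at compile time.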



Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153031391
  
    In the current implementation we provide ```Std. Error``` for the ```coefficients``` but not for the ```intercept```, because we use an optimized method to calculate the ```intercept```. To calculate ```Std. Error``` for the ```intercept```, we would need to concatenate the ```aBar``` array with ```aaBar.values```, like
    ```scala
    val newAtA = Array.concat(aaBar.values, summary.aBar.toArray, Array(1.0))
    val newAtB = Array.concat(abBar.values, Array(bBar))
    
    val xWithIntercept = CholeskyDecomposition.solve(newAtA, newAtB)
    val newAtAi = CholeskyDecomposition.inverse(newAtA, summary.k)
    ```
    I'm afraid that this would degrade performance, so I propose outputting ```Std. Error``` only for the ```coefficients```. Maybe we should discuss this here, or figure out a better way to output ```Std. Error``` for the ```intercept```. @mengxr



Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685888
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
    @@ -26,10 +26,12 @@ import org.apache.spark.rdd.RDD
      * Model fitted by [[WeightedLeastSquares]].
      * @param coefficients model coefficients
      * @param intercept model intercept
    + * @param diag diagonal of matrix (A^T * W * A)^-1
    --- End diff --
    
    `diag` is not a descriptive name. `diagInvAtWA`?



Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r44046160
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  /** Number of instances in DataFrame predictions */
    +  lazy val numInstances: Long = predictions.count()
    +
    +  /** Degrees of freedom */
    +  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
    +    numInstances - model.coefficients.size - 1
    +  } else {
    +    numInstances - model.coefficients.size
    +  }
    +
    +  /**
    +   * The weighted residuals, the usual residuals rescaled by
    +   * the square root of the instance weights.
    +   */
    +  lazy val devianceResiduals: Array[Double] = {
    +    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
    +    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
    +      .multiply(weighted).as("weightedResiduals"))
    +      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
    +      .first()
    +    Array(dr.getDouble(0), dr.getDouble(1))
    +  }
    +
    +  /**
    +   * Standard error of estimated coefficients.
    +   * Note that standard error of estimated intercept is not supported currently.
    +   */
    +  lazy val coefficientStandardErrors: Array[Double] = {
    --- End diff --
    
    Should we return a Vector (to match the type of coefficients)?  Same for tValues and pValues.



Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r44254161
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  /** Number of instances in DataFrame predictions */
    +  lazy val numInstances: Long = predictions.count()
    +
    +  /** Degrees of freedom */
    +  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
    +    numInstances - model.coefficients.size - 1
    +  } else {
    +    numInstances - model.coefficients.size
    +  }
    +
    +  /**
    +   * The weighted residuals, the usual residuals rescaled by
    +   * the square root of the instance weights.
    +   */
    +  lazy val devianceResiduals: Array[Double] = {
    --- End diff --
    
    @jkbradley There is already a "residuals" member, so I called this one ```devianceResiduals```. I agree with your suggestion about adding other types of residuals later, so I think we can try to combine the two functions into one that takes different arguments. We also need to do some code cleanup for ```LinearRegressionSummary``` due to redundant arguments; I can finish it in a follow-up PR. @mengxr



Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153043633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44812/



Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685970
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    --- End diff --
    
    missing doc (please also update other public/private methods)



Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153264106
  
    Merged build started.



Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685986
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    +
    +  lazy val dfe = if (model.getFitIntercept) {
    +    numInstances - model.weights.size -1
    +  } else {
    +    numInstances - model.weights.size
    +  }
    +
    +  lazy val devianceResiduals: Array[Double] = {
    +    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
    +    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
    +      .multiply(weighted).as("weightedResiduals"))
    +      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
    +      .take(1)(0)
    +    Array(dr.getDouble(0), dr.getDouble(1))
    +  }
    +
    +  lazy val seCoef: Array[Double] = {
    --- End diff --
    
    `coefficientStandardErrors`? It is hard to guess what `seCoef` means. In the doc, we should say "intercept" is not supported.
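
To make the meaning of the renamed member concrete, here is a minimal sketch of the arithmetic in the diff above: the standard error of each coefficient is the square root of the corresponding diagonal entry of (A^T * W * A)^-1 scaled by the residual variance, and each t value is a coefficient divided by its standard error. It uses plain arrays rather than a Spark model. The object `CoefficientStatsSketch` and its method signatures are hypothetical; `diagInvAtWA` borrows the name suggested elsewhere in this review.

```scala
object CoefficientStatsSketch {
  /**
   * Standard errors of the coefficients (intercept excluded).
   * sigma2 = rss / degreesOfFreedom is the estimated residual variance.
   */
  def standardErrors(
      diagInvAtWA: Array[Double],
      rss: Double,
      degreesOfFreedom: Long): Array[Double] = {
    require(degreesOfFreedom > 0, "degrees of freedom must be positive")
    val sigma2 = rss / degreesOfFreedom
    diagInvAtWA.map(d => math.sqrt(d * sigma2))
  }

  /** t value of each coefficient: estimate divided by its standard error. */
  def tValues(
      coefficients: Array[Double],
      stdErrors: Array[Double]): Array[Double] = {
    require(coefficients.length == stdErrors.length, "length mismatch")
    coefficients.zip(stdErrors).map { case (c, se) => c / se }
  }
}
```

For example, with diagonal entries (0.25, 1.0), a residual sum of squares of 8.0 and 2 degrees of freedom, the residual variance is 4.0 and the standard errors are (1.0, 2.0); coefficients (3.0, -4.0) then have t values (3.0, -2.0).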



Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43634982
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    +
    +  lazy val dfe = if (model.getFitIntercept) {
    +    numInstances - model.weights.size -1
    +  } else {
    +    numInstances - model.weights.size
    +  }
    +
    +  lazy val devianceResiduals: Array[Double] = {
    +    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
    +    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
    +      .multiply(weighted).as("weightedResiduals"))
    +      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
    +      .take(1)(0)
    +    Array(dr.getDouble(0), dr.getDouble(1))
    --- End diff --
    
    DataFrame currently does not provide an interface to calculate percentiles (only a Hive UDAF), so here we only provide the max and min values of the deviance residuals. [SPARK-9299](https://issues.apache.org/jira/browse/SPARK-9299) is working on providing ```percentile``` and ```percentile_approx``` aggregate functions; after it is resolved, we can provide the deviance residual quantiles (0.25, 0.5, 0.75).
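
The min/max computation in the diff above can be sketched on local arrays, without a DataFrame: each residual (label minus prediction) is rescaled by the square root of its instance weight, and only the extremes are reported. This is an illustration under stated assumptions, not the PR's code; the object `WeightedResidualsSketch` and its methods are hypothetical names, and quantiles are left out because the thread defers them to SPARK-9299.

```scala
object WeightedResidualsSketch {
  /** Residuals (label - prediction) rescaled by sqrt(weight). */
  def weightedResiduals(
      labels: Array[Double],
      predictions: Array[Double],
      weights: Array[Double]): Array[Double] = {
    require(
      labels.length == predictions.length && labels.length == weights.length,
      "labels, predictions and weights must have the same length")
    labels.indices
      .map(i => (labels(i) - predictions(i)) * math.sqrt(weights(i)))
      .toArray
  }

  /** Smallest and largest value, in that order. */
  def minMax(values: Array[Double]): (Double, Double) = {
    require(values.nonEmpty, "values must not be empty")
    (values.min, values.max)
  }
}
```

With labels (3, 5, 10), predictions (1, 6, 4) and weights (4, 1, 0.25), the weighted residuals are (4, -1, 3), so the reported pair is (-1, 4). Passing a weight of 1.0 for every instance reproduces the unweighted case.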



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153027517
  
    Merged build started.



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43702995
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
    @@ -715,4 +724,63 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
             .sliding(2)
             .forall(x => x(0) >= x(1)))
       }
    +
    +  test("linear regression training summary with weighted samples by normal solver") {
    --- End diff --
    
    Could you also add a test without intercept?



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153281755
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44891/



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153043631
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685982
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    +
    +  lazy val dfe = if (model.getFitIntercept) {
    +    numInstances - model.weights.size -1
    +  } else {
    +    numInstances - model.weights.size
    +  }
    +
    +  lazy val devianceResiduals: Array[Double] = {
    +    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
    +    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
    +      .multiply(weighted).as("weightedResiduals"))
    +      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
    +      .take(1)(0)
    --- End diff --
    
    `.take(1)(0)` -> `.first()`



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153281693
  
    **[Test build #44891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44891/consoleFull)** for PR 9413 at commit [`42ac991`](https://github.com/apache/spark/commit/42ac991775af48ab80869d0d2d9874cadf665b3e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153406552
  
    LGTM. Merged into master. Thanks!



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43685896
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
    @@ -109,6 +113,9 @@ private[ml] class WeightedLeastSquares(
     
         val x = new DenseVector(CholeskyDecomposition.solve(aaBar.values, abBar.values))
     
    +    val aaInv = CholeskyDecomposition.inverse(aaBar.values, k)
    +    val diag = new DenseVector((1 to k).map{ i => aaInv(i + (i - 1) * i / 2 - 1) / wSum }.toArray)
    --- End diff --
    
    Need an inline comment to explain the index mapping. It is sufficient to just mention that `aaInv` is a packed upper triangular matrix.
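
The index mapping in question can be sketched as follows, assuming `aaInv` holds the upper triangle packed column by column, which is the layout the expression `i + (i - 1) * i / 2 - 1` is consistent with. The object `PackedUpper` and its method names are hypothetical, and the division by `wSum` from the diff is left out so that only the indexing is shown.

```scala
// Illustrative only: indexing into a symmetric matrix stored as a packed
// upper triangle in column-major order, i.e.
//   a(1,1), a(1,2), a(2,2), a(1,3), a(2,3), a(3,3), ...
object PackedUpper {
  /** 0-based array offset of entry (i, j), with 1-based indices and i <= j. */
  def packedIndex(i: Int, j: Int): Int = {
    require(1 <= i && i <= j, "expects 1 <= i <= j")
    // Columns 1 to j-1 hold 1 + 2 + ... + (j-1) = (j-1)*j/2 entries in total.
    i + (j - 1) * j / 2 - 1
  }

  /** Diagonal of a k x k matrix given in packed upper form. */
  def diagonal(packed: Array[Double], k: Int): Array[Double] = {
    require(packed.length == k * (k + 1) / 2, "packed length must be k(k+1)/2")
    (1 to k).map(i => packed(packedIndex(i, i))).toArray
  }
}
```

For a 3 x 3 matrix the diagonal entries sit at offsets 0, 2 and 5, which is what the formula in the diff yields for `i = 1, 2, 3`.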



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43770122
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  /** Number of instances in DataFrame predictions */
    +  lazy val numInstances: Long = predictions.count()
    +
    +  /** Degrees of freedom */
    +  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
    +    numInstances - model.coefficients.size - 1
    +  } else {
    +    numInstances - model.coefficients.size
    +  }
    +
    +  /**
    +   * The weighted residuals, the usual residuals rescaled by
    +   * the square root of the instance weights.
    +   */
    +  lazy val devianceResiduals: Array[Double] = {
    +    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
    +    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
    +      .multiply(weighted).as("weightedResiduals"))
    +      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
    +      .first()
    +    Array(dr.getDouble(0), dr.getDouble(1))
    +  }
    +
    +  /**
    +   * Standard error of estimated coefficients.
    +   * Note that standard error of estimated intercept is not supported currently.
    +   */
    +  lazy val coefficientStandardErrors: Array[Double] = {
    +    if (diagInvAtWA.length == 1 && diagInvAtWA(0) == 0) {
    +      throw new UnsupportedOperationException(
    +        "No Std. Error of coefficients available for this LinearRegressionModel")
    +    } else {
    +      val rss = if (model.getWeightCol.isEmpty) {
    +        meanSquaredError * numInstances
    +      } else {
    +        val t = udf { (pred: Double, label: Double, weight: Double) =>
    +          math.pow(label - pred, 2.0) * weight }
    +        predictions.select(t(col(model.getPredictionCol), col(model.getLabelCol),
    +          col(model.getWeightCol)).as("wse")).agg(sum(col("wse"))).first().getDouble(0)
    +      }
    +      val sigma2 = rss / degreesOfFreedom
    +      diagInvAtWA.map(_ * sigma2).map(math.sqrt(_))
    +    }
    +  }
    +
    +  /** T-statistic of estimated coefficients.
    --- End diff --
    
    minor: This is ScalaDoc style. We can fix it in the next update.
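
As a reading aid for the `coefficientStandardErrors` block in the diff above, the arithmetic can be sketched on plain arrays. The object `CoefficientStdErr` and its method names are hypothetical; the formulas (degrees of freedom, weighted residual sum of squares, `sigma2 = rss / degreesOfFreedom`, and the square root of `diagInvAtWA * sigma2`) follow the diff, while the guard on the degrees of freedom is an added assumption.

```scala
// Illustrative only: the standard-error arithmetic from the diff,
// with local collections standing in for the DataFrame.
object CoefficientStdErr {
  /** n - p, minus one more when an intercept is fitted. */
  def degreesOfFreedom(numInstances: Long, numCoefficients: Int, fitIntercept: Boolean): Long =
    numInstances - numCoefficients - (if (fitIntercept) 1 else 0)

  /** Weighted residual sum of squares: sum of weight * (label - prediction)^2. */
  def weightedRss(
      labels: Seq[Double],
      predictions: Seq[Double],
      weights: Seq[Double]): Double = {
    require(labels.length == predictions.length && labels.length == weights.length,
      "all three sequences must have the same length")
    labels.zip(predictions).zip(weights).map { case ((label, pred), w) =>
      math.pow(label - pred, 2.0) * w
    }.sum
  }

  /** sqrt(diag((A^T W A)^-1) * sigma^2), where sigma^2 = rss / dof. */
  def standardErrors(diagInvAtWA: Array[Double], rss: Double, dof: Long): Array[Double] = {
    // Guard added for this sketch; the diff does not check this case.
    require(dof > 0, "degrees of freedom must be positive")
    val sigma2 = rss / dof
    diagInvAtWA.map(d => math.sqrt(d * sigma2))
  }
}
```

As in the patch, this covers only the coefficients; the intercept's standard error is not computed.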



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153029589
  
    **[Test build #44812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44812/consoleFull)** for PR 9413 at commit [`655fb43`](https://github.com/apache/spark/commit/655fb436950e44e1783a2bc3767e40a0295ce83f).



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153031528
  
    The current implementation provides ```Std. Error``` for ```coefficients``` but not for ```intercept```, because we use an optimized method to calculate ```intercept```. To calculate ```Std. Error``` for ```intercept```, we would need to concatenate the ```aBar``` array with ```aaBar.values```, like this:
    ```scala
    val newAtA = Array.concat(aaBar.values, summary.aBar.toArray, Array(1.0))
    val newAtB = Array.concat(abBar.values, Array(bBar))
    
    val xWithIntercept = CholeskyDecomposition.solve(newAtA, newAtB)
    val newAtAi = CholeskyDecomposition.inverse(newAtA, summary.k)
    ```
    I'm afraid this would cause a performance degradation, so I propose outputting ```Std. Error``` only for ```coefficients```. Maybe we should discuss this here, or figure out a better way to output ```Std. Error``` for ```intercept```. @mengxr



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9413



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9413#discussion_r43686227
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
         predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
       }
     
    +  lazy val numInstances: Long = predictions.count()
    +
    +  lazy val dfe = if (model.getFitIntercept) {
    +    numInstances - model.weights.size -1
    +  } else {
    +    numInstances - model.weights.size
    +  }
    +
    +  lazy val devianceResiduals: Array[Double] = {
    --- End diff --
    
    It is useful to document that this is weighted.



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153165800
  
    @yanboliang The implementation looks good to me. I left some comments about comment/documentation. Could you address them today to catch 1.6? It is okay to address the issue with intercept in a follow-up PR. You can create a JIRA for it. Thanks!



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9413#issuecomment-153043459
  
    **[Test build #44812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44812/consoleFull)** for PR 9413 at commit [`655fb43`](https://github.com/apache/spark/commit/655fb436950e44e1783a2bc3767e40a0295ce83f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

