You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2016/04/23 00:29:23 UTC

[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/12624

    [SPARK-14852][ML] refactored GLM summary into training, non-training summaries

    ## What changes were proposed in this pull request?
    
    This splits GeneralizedLinearRegressionSummary into 2 summary types:
    * GeneralizedLinearRegressionSummary, which does not store info from fitting (diagInvAtWA)
    * GeneralizedLinearRegressionTrainingSummary, which is a subclass of GeneralizedLinearRegressionSummary and stores info from fitting
    
    This also add a method evaluate() which can produce a GeneralizedLinearRegressionSummary on a new dataset.
    
    The summary no longer provides the model itself as a public val.
    
    Also:
    * Fixes bug where GeneralizedLinearRegressionTrainingSummary was created with model, not summaryModel.
    * Adds hasSummary method.
    * Renames findSummaryModelAndPredictionCol -> getSummaryModel and simplifies that method.
    * In summary, extract values from model immediately in case user later changes those (e.g., predictionCol).
    * Pardon the style fixes; that is IntelliJ being obnoxious.
    
    ## How was this patch tested?
    
    Existing unit tests + updated test for evaluate and hasSummary

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark model-summary-api

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12624
    
----
commit b8e20d73971b6547f87757dba17c4b4190f86db1
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2016-04-22T20:19:07Z

    refactored GLM summary into training, non-training summaries

commit 98fa4ea0e5abcc230d57db51eac56ad21aeee1e8
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2016-04-22T22:24:38Z

    fixed IntelliJ style mistakes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-213619790
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56745/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12624#discussion_r61251215
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -761,16 +783,24 @@ class GeneralizedLinearRegressionModel private[ml] (
        * otherwise generates a new column and sets it as the prediction column on a new copy
        * of the current model.
        */
    -  private[regression] def findSummaryModelAndPredictionCol()
    -    : (GeneralizedLinearRegressionModel, String) = {
    +  private[regression] def getSummaryModel: GeneralizedLinearRegressionModel = {
         $(predictionCol) match {
           case "" =>
             val predictionColName = "prediction_" + java.util.UUID.randomUUID.toString
    -        (copy(ParamMap.empty).setPredictionCol(predictionColName), predictionColName)
    -      case p => (this, p)
    +        copy(ParamMap.empty).setPredictionCol(predictionColName)
    +      case p => this
    --- End diff --
    
    I'd like to change it to ```case p => copy(ParamMap.empty)```, and this will make ```GeneralizedLinearRegressionTrainingSummary``` always along with an immutable model.
    Users want to learn about the training summary is a specific model rather than an updated model. If model was updated later not by retrain, we should not provide updated training summary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215229908
  
    **[Test build #57167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57167/consoleFull)** for PR 12624 at commit [`37e31db`](https://github.com/apache/spark/commit/37e31db8f05a0f3d5721763229f3ba38b50db880).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215436444
  
    LGTM except the last minor issue. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-213612710
  
    **[Test build #56745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56745/consoleFull)** for PR 12624 at commit [`dfd2de5`](https://github.com/apache/spark/commit/dfd2de5a9b1920301d2005a05c051c1e43d4b597).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12624#discussion_r61321012
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -761,16 +783,24 @@ class GeneralizedLinearRegressionModel private[ml] (
        * otherwise generates a new column and sets it as the prediction column on a new copy
        * of the current model.
        */
    -  private[regression] def findSummaryModelAndPredictionCol()
    -    : (GeneralizedLinearRegressionModel, String) = {
    +  private[regression] def getSummaryModel: GeneralizedLinearRegressionModel = {
         $(predictionCol) match {
           case "" =>
             val predictionColName = "prediction_" + java.util.UUID.randomUUID.toString
    -        (copy(ParamMap.empty).setPredictionCol(predictionColName), predictionColName)
    -      case p => (this, p)
    +        copy(ParamMap.empty).setPredictionCol(predictionColName)
    +      case p => this
    --- End diff --
    
    That sounds good.  That way, we can just use the model internally in the summary and will not need to copy the model parameters below.  I'll do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215074686
  
    @jkbradley This looks good overall, only one concern at inline comment. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215725997
  
    Oh, sorry for misunderstand. You are right, thanks for remind.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215230031
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57167/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-213619788
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215217742
  
    Thanks for the suggestion; this simplifies things.  I just pushed an update, but please hold off on review while I make a pass over the diff.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12624#discussion_r61252164
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -834,43 +864,51 @@ object GeneralizedLinearRegressionModel extends MLReadable[GeneralizedLinearRegr
     
     /**
      * :: Experimental ::
    - * Summarizing Generalized Linear regression Fits.
    + * Summary of [[GeneralizedLinearRegression]] model and predictions.
      *
      * @param predictions predictions output by the model's `transform` method
    - * @param predictionCol field in "predictions" which gives the prediction value of each instance
      * @param model the model that should be summarized
    - * @param diagInvAtWA diagonal of matrix (A^T * W * A)^-1 in the last iteration
    - * @param numIterations number of iterations
    - * @param solver the solver algorithm used for model training
      */
     @Since("2.0.0")
     @Experimental
     class GeneralizedLinearRegressionSummary private[regression] (
         @Since("2.0.0") @transient val predictions: DataFrame,
    -    @Since("2.0.0") val predictionCol: String,
    -    @Since("2.0.0") val model: GeneralizedLinearRegressionModel,
    -    private val diagInvAtWA: Array[Double],
    -    @Since("2.0.0") val numIterations: Int,
    -    @Since("2.0.0") val solver: String) extends Serializable {
    +    model: GeneralizedLinearRegressionModel) extends Serializable {
     
       import GeneralizedLinearRegression._
     
    -  private lazy val family = Family.fromName(model.getFamily)
    -  private lazy val link = if (model.isDefined(model.getParam("link"))) {
    +  // Extract values from model immediately to avoid problems with user reseting model Params.
    +  // Coefficients is not a deep copy, but that is acceptable.
    +
    +  /**
    +   * Field in "predictions" which gives the prediction value of each instance
    +   */
    +  @Since("2.0.0")
    +  val predictionCol: String = model.getPredictionCol
    +
    +  private val weightCol: String = model.getWeightCol
    +  private val labelCol: String = model.getLabelCol
    +
    +  private[regression] val familyName: String = model.getFamily
    +  private[regression] val family: Family = Family.fromName(familyName)
    +  private[regression] val link: Link = if (model.isDefined(model.link)) {
         Link.fromName(model.getLink)
       } else {
         family.defaultLink
       }
    +  private[regression] val fitIntercept: Boolean = model.getFitIntercept
    +  private[regression] val intercept: Double = model.intercept
    +  private[regression] val coefficients: Vector = model.coefficients
    --- End diff --
    
    If the model of ```GeneralizedLinearRegressionSummary``` is immutable, it's not necessary to extract these values in advance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215517803
  
    Thanks for reviewing!  I'll merge this with master
    (But if I'm wrong about that last comment, I can do a follow-up fix.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12624


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12624#discussion_r61433313
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -834,36 +836,55 @@ object GeneralizedLinearRegressionModel extends MLReadable[GeneralizedLinearRegr
     
     /**
      * :: Experimental ::
    - * Summarizing Generalized Linear regression Fits.
    + * Summary of [[GeneralizedLinearRegression]] model and predictions.
      *
    - * @param predictions predictions output by the model's `transform` method
    - * @param predictionCol field in "predictions" which gives the prediction value of each instance
    - * @param model the model that should be summarized
    - * @param diagInvAtWA diagonal of matrix (A^T * W * A)^-1 in the last iteration
    - * @param numIterations number of iterations
    - * @param solver the solver algorithm used for model training
    + * @param dataset Dataset to be summarized.
    + * @param origModel Model to be summarized.  This is copied to create an internal
    + *                  model which cannot be modified from outside.
      */
     @Since("2.0.0")
     @Experimental
     class GeneralizedLinearRegressionSummary private[regression] (
    -    @Since("2.0.0") @transient val predictions: DataFrame,
    -    @Since("2.0.0") val predictionCol: String,
    -    @Since("2.0.0") val model: GeneralizedLinearRegressionModel,
    -    private val diagInvAtWA: Array[Double],
    -    @Since("2.0.0") val numIterations: Int,
    -    @Since("2.0.0") val solver: String) extends Serializable {
    +    dataset: Dataset[_],
    +    origModel: GeneralizedLinearRegressionModel) extends Serializable {
     
       import GeneralizedLinearRegression._
     
    -  private lazy val family = Family.fromName(model.getFamily)
    -  private lazy val link = if (model.isDefined(model.getParam("link"))) {
    +  /**
    +   * Field in "predictions" which gives the prediction value of each instance.
    +   * This is set to a new column name if the original model's `predictionCol` is not set.
    +   */
    +  @Since("2.0.0")
    +  val predictionCol: String = {
    +    if (origModel.isDefined(origModel.predictionCol) && origModel.getPredictionCol != "") {
    +      origModel.getPredictionCol
    +    } else {
    +      "prediction_" + java.util.UUID.randomUUID.toString
    +    }
    +  }
    +
    +  /**
    +   * Private copy of model to ensure Params are not modified outside this class.
    +   * Coefficients is not a deep copy, but that is acceptable.
    --- End diff --
    
    L874 will deep copy the model including coefficients and intercept, so I think the document is out of date.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-213612149
  
    CC: @yanboliang Would you have time to take a look at this?  I'd like to get this into 2.0 (before this becomes a public API).  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-213619630
  
    **[Test build #56745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56745/consoleFull)** for PR 12624 at commit [`dfd2de5`](https://github.com/apache/spark/commit/dfd2de5a9b1920301d2005a05c051c1e43d4b597).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215218996
  
    **[Test build #57167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57167/consoleFull)** for PR 12624 at commit [`37e31db`](https://github.com/apache/spark/commit/37e31db8f05a0f3d5721763229f3ba38b50db880).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12624#discussion_r61477060
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -834,36 +836,55 @@ object GeneralizedLinearRegressionModel extends MLReadable[GeneralizedLinearRegr
     
     /**
      * :: Experimental ::
    - * Summarizing Generalized Linear regression Fits.
    + * Summary of [[GeneralizedLinearRegression]] model and predictions.
      *
    - * @param predictions predictions output by the model's `transform` method
    - * @param predictionCol field in "predictions" which gives the prediction value of each instance
    - * @param model the model that should be summarized
    - * @param diagInvAtWA diagonal of matrix (A^T * W * A)^-1 in the last iteration
    - * @param numIterations number of iterations
    - * @param solver the solver algorithm used for model training
    + * @param dataset Dataset to be summarized.
    + * @param origModel Model to be summarized.  This is copied to create an internal
    + *                  model which cannot be modified from outside.
      */
     @Since("2.0.0")
     @Experimental
     class GeneralizedLinearRegressionSummary private[regression] (
    -    @Since("2.0.0") @transient val predictions: DataFrame,
    -    @Since("2.0.0") val predictionCol: String,
    -    @Since("2.0.0") val model: GeneralizedLinearRegressionModel,
    -    private val diagInvAtWA: Array[Double],
    -    @Since("2.0.0") val numIterations: Int,
    -    @Since("2.0.0") val solver: String) extends Serializable {
    +    dataset: Dataset[_],
    +    origModel: GeneralizedLinearRegressionModel) extends Serializable {
     
       import GeneralizedLinearRegression._
     
    -  private lazy val family = Family.fromName(model.getFamily)
    -  private lazy val link = if (model.isDefined(model.getParam("link"))) {
    +  /**
    +   * Field in "predictions" which gives the prediction value of each instance.
    +   * This is set to a new column name if the original model's `predictionCol` is not set.
    +   */
    +  @Since("2.0.0")
    +  val predictionCol: String = {
    +    if (origModel.isDefined(origModel.predictionCol) && origModel.getPredictionCol != "") {
    +      origModel.getPredictionCol
    +    } else {
    +      "prediction_" + java.util.UUID.randomUUID.toString
    +    }
    +  }
    +
    +  /**
    +   * Private copy of model to ensure Params are not modified outside this class.
    +   * Coefficients is not a deep copy, but that is acceptable.
    --- End diff --
    
    I'm pretty sure it's a shallow copy of the coefficients: [https://github.com/apache/spark/blob/89addd40abdacd65cc03ac8aa5f9cf3dd4a4c19b/mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala#L776]


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215230026
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14852][ML] refactored GLM summary into ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/12624#issuecomment-215218766
  
    OK, now ready for another review pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org