You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2016/05/16 08:17:02 UTC

[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/13129

    [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and code audit for regression

    ## What changes were proposed in this pull request?
    * GeneralizedLinearRegression API docs enhancement.
    * The default value of GeneralizedLinearRegression ```linkPredictionCol``` is not set rather than empty. 
    * This will consistent with other similar params such as ```weightCol``` 
    * Make some methods more private.
    * Fix a minor bug of LinearRegression.
    * Fix some other issues.
    
    ## How was this patch tested?
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-15339

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13129
    
----
commit 254313c55a0a4228a7bbeecec6faf13b9bafed6b
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-05-16T08:11:43Z

    ML 2.0 QA: Scala APIs and code audit for regression

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219379955
  
    **[Test build #58629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58629/consoleFull)** for PR 13129 at commit [`254313c`](https://github.com/apache/spark/commit/254313c55a0a4228a7bbeecec6faf13b9bafed6b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63692508
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -107,7 +107,7 @@ private[regression] trait GeneralizedLinearRegressionBase extends PredictorParam
             s"with ${$(family)} family does not support ${$(link)} link function.")
         }
         val newSchema = super.validateAndTransformSchema(schema, fitting, featuresDataType)
    -    if ($(linkPredictionCol).nonEmpty) {
    +    if (isDefined(linkPredictionCol) && $(linkPredictionCol).nonEmpty) {
    --- End diff --
    
    Good point, updated. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63332287
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -239,10 +239,7 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
         }
         val familyAndLink = new FamilyAndLink(familyObj, linkObj)
     
    -    val numFeatures = dataset.select(col($(featuresCol))).limit(1).rdd
    -      .map { case Row(features: Vector) =>
    -        features.size
    -      }.first()
    +    val numFeatures = dataset.select(col($(featuresCol))).first().getAs[Vector](0).size
    --- End diff --
    
    can we not do `dataset.select(col($(featuresCol))).as[Vector].first().size`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219928497
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219646339
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-220018777
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58777/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63330793
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -252,7 +250,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
               model,
               Array(0D),
               Array(0D))
    -        return copyValues(model.setSummary(trainingSummary))
    +        return model.setSummary(trainingSummary)
    --- End diff --
    
    This is a minor bug of ```LinearRegression```, we should first copy values from parent estimator and then call ```findSummaryModelAndPredictionCol```, otherwise we will always get empty ```predictionCol```(and other params) for the ```LinearRegressionModel```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219928499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58744/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219924209
  
    **[Test build #58744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58744/consoleFull)** for PR 13129 at commit [`d38b1eb`](https://github.com/apache/spark/commit/d38b1eb137c13e51fec6f898ed4918e2b4d5a264).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13129


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63474760
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -239,10 +239,7 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
         }
         val familyAndLink = new FamilyAndLink(familyObj, linkObj)
     
    -    val numFeatures = dataset.select(col($(featuresCol))).limit(1).rdd
    -      .map { case Row(features: Vector) =>
    -        features.size
    -      }.first()
    +    val numFeatures = dataset.select(col($(featuresCol))).first().getAs[Vector](0).size
    --- End diff --
    
    It looks like Spark does not provide encoder for Vector. If I change to use ```as[Vector]```, the compiler will complain:
    ```
    Error:(244, 61) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
        val numFeatures = dataset.select(col($(featuresCol))).as[Vector].first().size
                                                                ^
    ``` 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63333283
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -252,7 +250,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
               model,
               Array(0D),
               Array(0D))
    -        return copyValues(model.setSummary(trainingSummary))
    +        return model.setSummary(trainingSummary)
    --- End diff --
    
    Yes, this is due to we don't have excellent test coverage ...
    I will add test case after collect comments. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63332565
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -252,7 +250,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
               model,
               Array(0D),
               Array(0D))
    -        return copyValues(model.setSummary(trainingSummary))
    +        return model.setSummary(trainingSummary)
    --- End diff --
    
    So test cases didn't pick this up? We should look into why and amend the tests accordingly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63332619
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -159,9 +159,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
     
       override protected def train(dataset: Dataset[_]): LinearRegressionModel = {
         // Extract the number of features before deciding optimization solver.
    -    val numFeatures = dataset.select(col($(featuresCol))).limit(1).rdd.map {
    -      case Row(features: Vector) => features.size
    -    }.first()
    +    val numFeatures = dataset.select(col($(featuresCol))).first().getAs[Vector](0).size
    --- End diff --
    
    same here, can we do `as[Vector]` instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63588807
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/IsotonicRegression.scala ---
    @@ -69,8 +69,8 @@ private[regression] trait IsotonicRegressionBase extends Params with HasFeatures
       setDefault(isotonic -> true, featureIndex -> 0)
     
       /** Checks whether the input has weight column. */
    -  protected[ml] def hasWeightCol: Boolean = {
    -    isDefined(weightCol) && $(weightCol) != ""
    +  protected def hasWeightCol: Boolean = {
    --- End diff --
    
    ```private[ml]```?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63643069
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -252,7 +250,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
               model,
               Array(0D),
               Array(0D))
    -        return copyValues(model.setSummary(trainingSummary))
    +        return model.setSummary(trainingSummary)
    --- End diff --
    
    @jkbradley It does not necessary to ```setParent``` at here, because we have done it at ```Predictor.fit```
    ```
    override def fit(dataset: Dataset[_]): M = {
        // This handles a few items such as schema validation.
        // Developers only need to implement train().
        transformSchema(dataset.schema, logging = true)
        copyValues(train(dataset).setParent(this))
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219926099
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58742/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-220018623
  
    **[Test build #58777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58777/consoleFull)** for PR 13129 at commit [`1fbd1dc`](https://github.com/apache/spark/commit/1fbd1dc6e652f2a22a59477fa9f76878ef71ac13).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-220009142
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219926031
  
    **[Test build #58742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58742/consoleFull)** for PR 13129 at commit [`374e610`](https://github.com/apache/spark/commit/374e6107a799ac2dbe1cc297a4dc92c4ae853507).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-220018776
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-220009820
  
    **[Test build #58777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58777/consoleFull)** for PR 13129 at commit [`1fbd1dc`](https://github.com/apache/spark/commit/1fbd1dc6e652f2a22a59477fa9f76878ef71ac13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219380059
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219380061
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58629/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219922177
  
    **[Test build #58742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58742/consoleFull)** for PR 13129 at commit [`374e610`](https://github.com/apache/spark/commit/374e6107a799ac2dbe1cc297a4dc92c4ae853507).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63588786
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala ---
    @@ -88,8 +88,8 @@ private[regression] trait AFTSurvivalRegressionParams extends Params
       def getQuantilesCol: String = $(quantilesCol)
     
       /** Checks whether the input has quantiles column name. */
    -  protected[regression] def hasQuantilesCol: Boolean = {
    -    isDefined(quantilesCol) && $(quantilesCol) != ""
    +  protected def hasQuantilesCol: Boolean = {
    --- End diff --
    
    This is probably meant to be ```private[regression]```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-220528642
  
    Merged into master and branch-2.0. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219372977
  
    **[Test build #58629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58629/consoleFull)** for PR 13129 at commit [`254313c`](https://github.com/apache/spark/commit/254313c55a0a4228a7bbeecec6faf13b9bafed6b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219638216
  
    **[Test build #58673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58673/consoleFull)** for PR 13129 at commit [`645f6c4`](https://github.com/apache/spark/commit/645f6c45344b1365fdbcc37131b683be94a20e59).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63333621
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -239,10 +239,7 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
         }
         val familyAndLink = new FamilyAndLink(familyObj, linkObj)
     
    -    val numFeatures = dataset.select(col($(featuresCol))).limit(1).rdd
    -      .map { case Row(features: Vector) =>
    -        features.size
    -      }.first()
    +    val numFeatures = dataset.select(col($(featuresCol))).first().getAs[Vector](0).size
    --- End diff --
    
    You means ```dataset.select(col($(featuresCol))).as[Vector].first().size```? I think it's OK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63475090
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -252,7 +250,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
               model,
               Array(0D),
               Array(0D))
    -        return copyValues(model.setSummary(trainingSummary))
    +        return model.setSummary(trainingSummary)
    --- End diff --
    
    @MLnick I added test case for this scenario and updated other test cases to ensure coping prediction column(and other params) correct in all situations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63687411
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -107,7 +107,7 @@ private[regression] trait GeneralizedLinearRegressionBase extends PredictorParam
             s"with ${$(family)} family does not support ${$(link)} link function.")
         }
         val newSchema = super.validateAndTransformSchema(schema, fitting, featuresDataType)
    -    if ($(linkPredictionCol).nonEmpty) {
    +    if (isDefined(linkPredictionCol) && $(linkPredictionCol).nonEmpty) {
    --- End diff --
    
    This is used twice now, perhaps makes sense to make a `def hasLinkPredictionCol` as for e.g. `hasQuantilesCol`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219646342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58673/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219928428
  
    **[Test build #58744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58744/consoleFull)** for PR 13129 at commit [`d38b1eb`](https://github.com/apache/spark/commit/d38b1eb137c13e51fec6f898ed4918e2b4d5a264).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63677021
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -239,10 +239,7 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
         }
         val familyAndLink = new FamilyAndLink(familyObj, linkObj)
     
    -    val numFeatures = dataset.select(col($(featuresCol))).limit(1).rdd
    -      .map { case Row(features: Vector) =>
    -        features.size
    -      }.first()
    +    val numFeatures = dataset.select(col($(featuresCol))).first().getAs[Vector](0).size
    --- End diff --
    
    Ah right, we would need to add an implicit encoder `implicit def encoder: Encoder[Vector] = ExpressionEncoder()`, e.g. see here https://github.com/apache/spark/pull/12718#issuecomment-215332276.
    
    However, let's leave that change for #12718


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13129#discussion_r63588818
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -252,7 +250,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
               model,
               Array(0D),
               Array(0D))
    -        return copyValues(model.setSummary(trainingSummary))
    +        return model.setSummary(trainingSummary)
    --- End diff --
    
    We also need to setParent


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219926097
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13129#issuecomment-219646199
  
    **[Test build #58673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58673/consoleFull)** for PR 13129 at commit [`645f6c4`](https://github.com/apache/spark/commit/645f6c45344b1365fdbcc37131b683be94a20e59).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org