You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/11/08 15:29:59 UTC

[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22986

    [SPARK-25959][ML] GBTClassifier picks wrong impurity stats on loading

    ## What changes were proposed in this pull request?
    
    Our `GBTClassifier` supports only `variance` impurity. But unfortunately, its `impurity` param by default contains the value `gini`: it is not even modifiable by the user and it differs from the actual impurity used, which is `variance`. This issue does not limit to a wrong value returned for it if the user queries by `getImpurity`, but it also affect the load of a saved model, as its `impurityStats` are created as `gini` (since this is the value stored for the model impurity) which leads to wrong `featureImportances` in model loaded from saved ones.
    
    The PR changes the `impurity` param used to one which allows only the value `variance`.
    
    ## How was this patch tested?
    
    modified UT


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25959

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22986.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22986
    
----
commit 9babef5732b8482053cc71753aca5c7bd3fb91a6
Author: Marco Gaido <ma...@...>
Date:   2018-11-08T15:15:15Z

    [SPARK-25959][ML] GBTClassifier picks wrong impurity stats on loading

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r232447655
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
     
       /**
        * Loss function which GBT tries to minimize. (case-insensitive)
    --- End diff --
    
    Yes, that's right. Currently we just saying that we are using `gini`, while we are using `variance`...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r231997151
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    Yes, by the names seems so, but if you check their implementation, they both just add a `impurity` param with the only difference of the allowed values. In the case of `TreeRegressorParams` the only allowed value is `variance` (which is what `GBTClassifier` uses), in `TreeClassifierParams` the allowed values are `gini` (default) and `entropy`. So `variance` is not a valid value for the `impurity` param of `TreeClassifierParams`, so there is no way to set it properly for the `GBTClassifier`....


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r232446160
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    To continue to extend `TreeClassifierParams` but be able to use correct `impurity`, seems we need to remove `final` from `impurity` in trait `TreeClassifierParams`...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r234407043
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
     private[ml] trait DecisionTreeClassifierParams
       extends DecisionTreeParams with TreeClassifierParams
     
    -/**
    - * Parameters for Decision Tree-based regression algorithms.
    - */
    -private[ml] trait TreeRegressorParams extends Params {
    --- End diff --
    
    yes, it is AFAIK. I think it is a false positive.
    
    I wouldn't remove it in this change because since this is a bug, this PR can be backported to 2.4.1 too. I'd do it in a separate change. We would need anyway to add the MiMa rules for the missing methods, so it doesn't change much on the exclusions either...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98607/testReport)** for PR 22986 at commit [`9babef5`](https://github.com/apache/spark/commit/9babef5732b8482053cc71753aca5c7bd3fb91a6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98780/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r234405667
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
     private[ml] trait DecisionTreeClassifierParams
       extends DecisionTreeParams with TreeClassifierParams
     
    -/**
    - * Parameters for Decision Tree-based regression algorithms.
    - */
    -private[ml] trait TreeRegressorParams extends Params {
    --- End diff --
    
    Just looking at this one more time ... hm, isn't `this.type` always referring to the current class's type? maybe it's some detail of the generated code and this is essentially a false positive. Or, honestly setImpurity can just be removed in this change too. It's deprecated for all of these classes. I don't mind doing that later too; would just avoid these MiMa exclusions too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4989/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233101747
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    That's preferable to extending the 'wrong' trait. Hm, can TreeClassifierParams not just allow all impurity values? and have GBTClassifierParams set it to "variance" in the constructor? that sounds more correct. I know you guys have looked into it more but I still wasn't clear why this wasn't an option.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98607/testReport)** for PR 22986 at commit [`9babef5`](https://github.com/apache/spark/commit/9babef5732b8482053cc71753aca5c7bd3fb91a6).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r234412163
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
     private[ml] trait DecisionTreeClassifierParams
       extends DecisionTreeParams with TreeClassifierParams
     
    -/**
    - * Parameters for Decision Tree-based regression algorithms.
    - */
    -private[ml] trait TreeRegressorParams extends Params {
    --- End diff --
    
    That's true, but I don't know if we can back-port it because of the binary incompatibility, internal as it may be. I don't know. If it's not an issue then yes it can back port


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r231947668
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -36,6 +36,8 @@ object MimaExcludes {
     
       // Exclude rules for 3.0.x
       lazy val v30excludes = v24excludes ++ Seq(
    +    // [SPARK-25959] GBTClassifier picks wrong impurity stats on loading
    +    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.classification.GBTClassificationModel.setImpurity"),
    --- End diff --
    
    a note for reviewers: setting this parameter on the model doesn't make any sense. We may also override the `setImpurity` method in order to throw an exception as done in the `GBTClassifier`...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233115369
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    @srowen the problem is the `setDefault` which is performed in the `TreeClassifierParams` if I do what you are suggesting, then you get an exception because `gini` is not a valid value for it. Unless we put as allowed values `gini` and `variance` and we set it back to `variance` in the GBTClassifier but that is way too hacky IMHO. I'll go ahead with adding the `impurity` directly in `GBTClassifierParams` then, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98610/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98780/testReport)** for PR 22986 at commit [`176b0d8`](https://github.com/apache/spark/commit/176b0d84b151a8ff7545fba89505c43b04699598).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5019/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98819/testReport)** for PR 22986 at commit [`4aefc9f`](https://github.com/apache/spark/commit/4aefc9f9d561f1c5f2c68442681cd0f8cf4dea62).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4995/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4853/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98819/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98609/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r232445980
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
     
       /**
        * Loss function which GBT tries to minimize. (case-insensitive)
    --- End diff --
    
    Another issue is that `GBTClassifier` also logs wrong value of `impurity` param.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98780/testReport)** for PR 22986 at commit [`176b0d8`](https://github.com/apache/spark/commit/176b0d84b151a8ff7545fba89505c43b04699598).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class MulticlassMetrics @Since(\"1.1.0\") (predAndLabelsWithOptWeight: RDD[_ <: Product]) `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4856/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233232112
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    I looked more into this and see that setImpurity is deprecated. I presume the point is to use set(impurity, ...) instead. Yeah, that's no longer possible to override in subclasses, scratch that; overriding setImpurity would have been just fine IMHO but that's not going to last anyway.
    
    I do agree this is why there are different traits for classifiers and regressors, but I don't think that means a classifier should extend TreeRegressorParams because its parameters happen to match.
    
    One option is to let the definition of `impurity` itself be overridden. That seems OK. Or we could make a new 'VarianceClassifier' or something that defines this variance-only impurity parameter and let TreeRegressionParams and GBTClassifierParams extend it. It is a little wacky, but quite reasonable from an OOP perspective. I think.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98819/testReport)** for PR 22986 at commit [`4aefc9f`](https://github.com/apache/spark/commit/4aefc9f9d561f1c5f2c68442681cd0f8cf4dea62).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98607/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233514040
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
     private[ml] trait DecisionTreeClassifierParams
       extends DecisionTreeParams with TreeClassifierParams
     
    -/**
    - * Parameters for Decision Tree-based regression algorithms.
    - */
    -private[ml] trait TreeRegressorParams extends Params {
    --- End diff --
    
    actually this trait is still there (just some lines below). The MiMa warnings are there because the returned type of `setImpurity` returns `this.type`, which is now `HasVarianceImpurity` and the same for the setter...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98609/testReport)** for PR 22986 at commit [`7d24f33`](https://github.com/apache/spark/commit/7d24f33d4d802096b06594e8bf643ed5b4546613).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r234412241
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
     private[ml] trait DecisionTreeClassifierParams
       extends DecisionTreeParams with TreeClassifierParams
     
    -/**
    - * Parameters for Decision Tree-based regression algorithms.
    - */
    -private[ml] trait TreeRegressorParams extends Params {
    --- End diff --
    
    I see. I am not sure how to verify that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98610/testReport)** for PR 22986 at commit [`c2be400`](https://github.com/apache/spark/commit/c2be400bc7f7a9708a2b34e107ef036212b541ad).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r231976732
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    Why does it need regressor params to fix this? it sounds like it should extend TreeClassifierParams, just given the name


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98610/testReport)** for PR 22986 at commit [`c2be400`](https://github.com/apache/spark/commit/c2be400bc7f7a9708a2b34e107ef036212b541ad).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98785/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r232447366
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    I will try a small refactor creating a `HasImpurity` trait with the allowed values being overridden by its descendant. This way it should work. Let me know what you think about it. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233056300
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    @srowen I think the only alternative here us to avoid extending either of them and define the `impurity` param directly into this trait. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98785/testReport)** for PR 22986 at commit [`6b91211`](https://github.com/apache/spark/commit/6b9121197ca2490c9a6a6d3f23f1ea4e2a19d99f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233126920
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    Because in GBTClassifier the only valid value is `variance`. It is true that the `setImpurity` method throws an exception if someone tries to set it, but we are not safe against bad values as it happens now. And, most importantly `TreeClassifierParams` is extended by other classes, so we would allow the `variance` value to be set for RandomForest and DecisionTree, which is bad because it is not a valid value for them.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233356794
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    @srowen yes, I'll try and create a `VarianceClassifier` trait. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4855/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233138080
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    Yes, other classes have to restrict what's settable. I don't see the issue with bad values here? Is it serialized forms? At least this sounds like the normal OOP way to solve this. Other ways around this sound like hacks. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged to master


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    **[Test build #98785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98785/testReport)** for PR 22986 at commit [`6b91211`](https://github.com/apache/spark/commit/6b9121197ca2490c9a6a6d3f23f1ea4e2a19d99f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233122017
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    Wait, why can't all three values be valid? and set the default to gini in the parent, and set the actual value to variance in the base class? I am still not sure why this isn't possible and that on the contrary sounds like the most natural solution.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233506625
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
     private[ml] trait DecisionTreeClassifierParams
       extends DecisionTreeParams with TreeClassifierParams
     
    -/**
    - * Parameters for Decision Tree-based regression algorithms.
    - */
    -private[ml] trait TreeRegressorParams extends Params {
    --- End diff --
    
    You can actually keep this trait and extend HasVarianceImpurity. It would be a no-op like a few other traits here. Although 'redundant', would it avoid any MiMa warnings or very slightly improve compatibility, even if this is private[spark]?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r232003571
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    Surely that's not the ideal way to fix it though, semantically. Can't GBTClassifier override the impurity internally (or can we let it do so)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22986


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by surajnayak <gi...@git.apache.org>.
Github user surajnayak commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Tried this patch. Seems this resolves the issue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233153885
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    I mean restricting in the `setImpurity` method as you are suggesting is probably feasible, but I am not sure it is the right thing to do. I haven't seen that pattern anywhere else in the codebase. We set the allowed parameters in the Param itself usually. One of the reason is to avoid problems when you are setting the default for instance or when `set` is used (this happens for instance when loading a saved model, so we can load an invalid model): so I think the solution is not really safe.
    
    I am not sure the solution you are proposing is cleaner. Also, before this PR there were 2 interfaces: `TreeClassifierParams` and `TreeRegressorParams` only for this reason, ie. for the impurity param to allow/have different values. We could switch to a single `HasImpurity` with your approach.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22986
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org