You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/05/30 23:59:20 UTC

[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/21465

    [SPARK-24333][ML][PYTHON]Add fit with validation set to spark.ml GBT: Python API

    
    ## What changes were proposed in this pull request?
    
    Add validationIndicatorCol and validationTol to GBT Python.
    
    ## How was this patch tested?
    
    Add test in doctest to test the new API.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-24333

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21465.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21465
    
----
commit 79fc83bae376c430a23fd7c9f502690a1c4d321e
Author: Huaxin Gao <hu...@...>
Date:   2018-05-30T23:49:31Z

    [SPARK-24333][ML][PYTHON]Add fit with validation set to spark.ml GBT: Python API

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2987/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r238809091
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -650,19 +650,20 @@ def getFeatureSubsetStrategy(self):
             return self.getOrDefault(self.featureSubsetStrategy)
     
     
    -class TreeRegressorParams(Params):
    +class HasVarianceImpurity(Params):
    --- End diff --
    
    This shouldn't be changed, impurity is different for regression and classification, so the param needs to be defined in `TreeRegressorParams` and `TreeClassifierParams`, as it was already


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r236880776
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -705,12 +705,38 @@ def getNumTrees(self):
             return self.getOrDefault(self.numTrees)
     
     
    -class GBTParams(TreeEnsembleParams):
    +class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):
    --- End diff --
    
    I like having a common `GBTParams` class, it was strange to have 2 of the same name. But you should also define `GBTClassifierParams` and `GBTRegressorParams`, then put the `supportedLossTypes` in there so you don't need to override them later. You can also put the `lossType` param and `getLossType()` method there.  This makes it clean and follows how it's done in Scala.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5784/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239211515
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    --- End diff --
    
    ah, I see. let me take another look..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99413/testReport)** for PR 21465 at commit [`c0586bd`](https://github.com/apache/spark/commit/c0586bdc98277e43a5d902b6d3fdfe97579d3821).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3867/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239243683
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    +    """
    +    Private class to track supported GBTClassifier params.
    +
    +    .. versionadded:: 3.0.0
    +    """
    +
    +    supportedLossTypes = ["logistic"]
    +
    +    lossType = Param(Params._dummy(), "lossType",
    +                     "Loss function which GBT tries to minimize (case-insensitive). " +
    +                     "Supported options: " + ", ".join(supportedLossTypes),
    +                     typeConverter=TypeConverters.toString)
    +
    +    @since("3.0.0")
    +    def setLossType(self, value):
    --- End diff --
    
    please address the above comment


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #91317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)** for PR 21465 at commit [`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasValidationIndicatorCol(Params):`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99136/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99136/testReport)** for PR 21465 at commit [`88ff888`](https://github.com/apache/spark/commit/88ff8885301d16d5be49b1a1adb31634c43fb64f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21465


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5485/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #91798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91798/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99013/testReport)** for PR 21465 at commit [`6e177a3`](https://github.com/apache/spark/commit/6e177a34b6a9710167dd3f4df4be0a585365ad7c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91798/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #98751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99136/testReport)** for PR 21465 at commit [`88ff888`](https://github.com/apache/spark/commit/88ff8885301d16d5be49b1a1adb31634c43fb64f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, GBTParams,`
      * `class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):`
      * `class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, GBTParams,`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    @BryanCutler Thank you very much for your help!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95893/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99838/testReport)** for PR 21465 at commit [`6fc95a7`](https://github.com/apache/spark/commit/6fc95a77e0c541b25cc7acba434b263a8e378926).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r238801256
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    +    """
    +    Private class to track supported GBTClassifier params.
    +
    +    .. versionadded:: 3.0.0
    +    """
    +
    +    supportedLossTypes = ["logistic"]
    +
    +    lossType = Param(Params._dummy(), "lossType",
    +                     "Loss function which GBT tries to minimize (case-insensitive). " +
    +                     "Supported options: " + ", ".join(supportedLossTypes),
    +                     typeConverter=TypeConverters.toString)
    +
    +    @since("3.0.0")
    --- End diff --
    
    don't change the version, since we are just refactoring the base classes


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99408/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #95893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95893/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5480/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5862/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r238801573
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    +    """
    +    Private class to track supported GBTClassifier params.
    +
    +    .. versionadded:: 3.0.0
    +    """
    +
    +    supportedLossTypes = ["logistic"]
    +
    +    lossType = Param(Params._dummy(), "lossType",
    +                     "Loss function which GBT tries to minimize (case-insensitive). " +
    +                     "Supported options: " + ", ".join(supportedLossTypes),
    +                     typeConverter=TypeConverters.toString)
    +
    +    @since("3.0.0")
    +    def setLossType(self, value):
    --- End diff --
    
    `setLossType` should be in the estimators, `getLossType` should be here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98751/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #91586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91586/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239245388
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -705,12 +710,59 @@ def getNumTrees(self):
             return self.getOrDefault(self.numTrees)
     
     
    -class GBTParams(TreeEnsembleParams):
    +class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):
         """
         Private class to track supported GBT params.
         """
    +
    +    stepSize = Param(Params._dummy(), "stepSize",
    +                     "Step size (a.k.a. learning rate) in interval (0, 1] for shrinking " +
    +                     "the contribution of each estimator.",
    +                     typeConverter=TypeConverters.toFloat)
    +
    +    validationTol = Param(Params._dummy(), "validationTol",
    +                          "Threshold for stopping early when fit with validation is used. " +
    +                          "If the error rate on the validation input changes by less than the " +
    +                          "validationTol, then learning will stop early (before `maxIter`). " +
    +                          "This parameter is ignored when fit without validation is used.",
    +                          typeConverter=TypeConverters.toFloat)
    +
    +    @since("3.0.0")
    +    def setValidationTol(self, value):
    +        """
    +        Sets the value of :py:attr:`validationTol`.
    +        """
    +        return self._set(validationTol=value)
    +
    +    @since("3.0.0")
    +    def getValidationTol(self):
    +        """
    +        Gets the value of validationTol or its default value.
    +        """
    +        return self.getOrDefault(self.validationTol)
    +
    +
    +class GBTRegressorParams(GBTParams, TreeRegressorParams):
    +    """
    +    Private class to track supported GBTRegressor params.
    +
    +    .. versionadded:: 3.0.0
    +    """
    +
         supportedLossTypes = ["squared", "absolute"]
     
    +    lossType = Param(Params._dummy(), "lossType",
    +                     "Loss function which GBT tries to minimize (case-insensitive). " +
    +                     "Supported options: " + ", ".join(supportedLossTypes),
    +                     typeConverter=TypeConverters.toString)
    +
    +    @since("1.4.0")
    +    def setLossType(self, value):
    --- End diff --
    
    `setLossType` should be in the estimator and `getLossType` should be here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r194143768
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1251,26 +1256,33 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
                          "the contribution of each estimator.",
                          typeConverter=TypeConverters.toFloat)
     
    +    validationTol = Param(Params._dummy(), "validationTol",
    +                          "Threshold for stopping early when fit with validation is used. " +
    +                          "If the error rate on the validation input changes by less than the " +
    +                          "validationTol, then learning will stop early (before `maxIter`). " +
    +                          "This parameter is ignored when fit without validation is used.",
    +                          typeConverter=TypeConverters.toFloat)
    +
         @keyword_only
         def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
                      maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                      maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, lossType="logistic",
                      maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0,
    -                 featureSubsetStrategy="all"):
    +                 featureSubsetStrategy="all", validationTol=0.01):
    --- End diff --
    
    Shouldn't `validationIndicatorCol` be in `init` too? Set to None default?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99838/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91586/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99744/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99413/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239242316
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    --- End diff --
    
    Yeah, you're correct, this is fine


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #91586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91586/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99838/testReport)** for PR 21465 at commit [`6fc95a7`](https://github.com/apache/spark/commit/6fc95a77e0c541b25cc7acba434b263a8e378926).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasDistanceMeasure(Params):`
      * `class HasValidationIndicatorCol(Params):`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239240113
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -705,12 +710,59 @@ def getNumTrees(self):
             return self.getOrDefault(self.numTrees)
     
     
    -class GBTParams(TreeEnsembleParams):
    +class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):
         """
         Private class to track supported GBT params.
         """
    +
    +    stepSize = Param(Params._dummy(), "stepSize",
    +                     "Step size (a.k.a. learning rate) in interval (0, 1] for shrinking " +
    +                     "the contribution of each estimator.",
    +                     typeConverter=TypeConverters.toFloat)
    +
    +    validationTol = Param(Params._dummy(), "validationTol",
    +                          "Threshold for stopping early when fit with validation is used. " +
    +                          "If the error rate on the validation input changes by less than the " +
    +                          "validationTol, then learning will stop early (before `maxIter`). " +
    +                          "This parameter is ignored when fit without validation is used.",
    +                          typeConverter=TypeConverters.toFloat)
    +
    +    @since("3.0.0")
    +    def setValidationTol(self, value):
    --- End diff --
    
    It seems scala does not have this API right? If not then let's remove it here for now


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r236881042
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -1030,9 +1056,9 @@ def featureImportances(self):
     
     
     @inherit_doc
    -class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    -                   GBTParams, HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable,
    -                   JavaMLReadable, TreeRegressorParams):
    +class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, GBTParams,
    +                   HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable, JavaMLReadable,
    --- End diff --
    
    I think you can remove `HasStepSize` since it is in `GBTParams`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r194198390
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1251,26 +1256,33 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
                          "the contribution of each estimator.",
                          typeConverter=TypeConverters.toFloat)
     
    +    validationTol = Param(Params._dummy(), "validationTol",
    +                          "Threshold for stopping early when fit with validation is used. " +
    +                          "If the error rate on the validation input changes by less than the " +
    +                          "validationTol, then learning will stop early (before `maxIter`). " +
    +                          "This parameter is ignored when fit without validation is used.",
    +                          typeConverter=TypeConverters.toFloat)
    +
         @keyword_only
         def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
                      maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                      maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, lossType="logistic",
                      maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0,
    -                 featureSubsetStrategy="all"):
    +                 featureSubsetStrategy="all", validationTol=0.01):
    --- End diff --
    
    @MLnick Yes, I should add it in init. Will change it now. Thanks a lot for your review!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r238808440
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    --- End diff --
    
    this should extend `TreeClassifierParams`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r235128413
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1176,8 +1176,8 @@ def trees(self):
     
     @inherit_doc
     class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    -                    GBTParams, HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable,
    -                    JavaMLReadable):
    +                    GBTParams, HasCheckpointInterval, HasStepSize, HasSeed,
    +                    HasValidationIndicatorCol, JavaMLWritable, JavaMLReadable):
    --- End diff --
    
    I think this should be added to `GBTParams`, which is done on the Scala side too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239904265
  
    --- Diff: python/pyspark/ml/param/shared.py ---
    @@ -814,3 +814,25 @@ def getDistanceMeasure(self):
             """
             return self.getOrDefault(self.distanceMeasure)
     
    +
    +class HasValidationIndicatorCol(Params):
    --- End diff --
    
    You are right. DecisionTreeParams should be at the bottom. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5243/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99013/testReport)** for PR 21465 at commit [`6e177a3`](https://github.com/apache/spark/commit/6e177a34b6a9710167dd3f4df4be0a585365ad7c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #95893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95893/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99744/testReport)** for PR 21465 at commit [`30a743d`](https://github.com/apache/spark/commit/30a743d79ecd2b18b5d1fa997d63dee914b714a0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99408/testReport)** for PR 21465 at commit [`c0fcbb3`](https://github.com/apache/spark/commit/c0fcbb397ae3c954092e461bbabf2b8f8cf85386).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #91798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91798/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r235488017
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1176,8 +1176,8 @@ def trees(self):
     
     @inherit_doc
     class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    -                    GBTParams, HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable,
    -                    JavaMLReadable):
    +                    GBTParams, HasCheckpointInterval, HasStepSize, HasSeed,
    +                    HasValidationIndicatorCol, JavaMLWritable, JavaMLReadable):
    --- End diff --
    
    @BryanCutler Thank you very much for reviewing my PR. I moved HasValidationIndicatorCol, HasMaxIter and HasStepSize to GBTParams. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #98751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3712/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99013/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239173098
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    --- End diff --
    
    @BryanCutler Thanks for your review. 
    Seems recently https://github.com/apache/spark/pull/22986 added ```trait HasVarianceImpurity``` and made 
    ```private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r238809338
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1242,40 +1255,36 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
         [0.25..., 0.23..., 0.21..., 0.19..., 0.18...]
         >>> model.numClasses
         2
    +    >>> gbt = gbt.setValidationIndicatorCol("validationIndicator")
    +    >>> gbt.getValidationIndicatorCol()
    +    'validationIndicator'
    +    >>> gbt.getValidationTol()
    +    0.01
     
         .. versionadded:: 1.4.0
         """
     
    -    lossType = Param(Params._dummy(), "lossType",
    -                     "Loss function which GBT tries to minimize (case-insensitive). " +
    -                     "Supported options: " + ", ".join(GBTParams.supportedLossTypes),
    -                     typeConverter=TypeConverters.toString)
    -
    -    stepSize = Param(Params._dummy(), "stepSize",
    -                     "Step size (a.k.a. learning rate) in interval (0, 1] for shrinking " +
    -                     "the contribution of each estimator.",
    -                     typeConverter=TypeConverters.toFloat)
    -
         @keyword_only
         def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
                      maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                      maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, lossType="logistic",
    -                 maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0,
    -                 featureSubsetStrategy="all"):
    +                 maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0, impurity="variance",
    --- End diff --
    
    this is not the correct default impurity


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #91317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)** for PR 21465 at commit [`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239243661
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -1174,9 +1165,31 @@ def trees(self):
             return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
     
     
    +class GBTClassifierParams(GBTParams, HasVarianceImpurity):
    +    """
    +    Private class to track supported GBTClassifier params.
    +
    +    .. versionadded:: 3.0.0
    +    """
    +
    +    supportedLossTypes = ["logistic"]
    +
    +    lossType = Param(Params._dummy(), "lossType",
    +                     "Loss function which GBT tries to minimize (case-insensitive). " +
    +                     "Supported options: " + ", ".join(supportedLossTypes),
    +                     typeConverter=TypeConverters.toString)
    +
    +    @since("3.0.0")
    --- End diff --
    
    please address the above comment


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99744/testReport)** for PR 21465 at commit [`30a743d`](https://github.com/apache/spark/commit/30a743d79ecd2b18b5d1fa997d63dee914b714a0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91317/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21465#discussion_r239569077
  
    --- Diff: python/pyspark/ml/param/shared.py ---
    @@ -814,3 +814,25 @@ def getDistanceMeasure(self):
             """
             return self.getOrDefault(self.distanceMeasure)
     
    +
    +class HasValidationIndicatorCol(Params):
    --- End diff --
    
    Would you mind running the codegen again, like this command for example
    `pushd python/pyspark/ml/param/ && python _shared_params_code_gen.py > shared.py && popd` and push the result if there is a diff?  I think the DecisionTreeParams should be at the bottom of the file..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    merged to master, thanks @huaxingao !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    @BryanCutler Thank you very much for your review! I will submit changes soon. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21465
  
    **[Test build #99408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99408/testReport)** for PR 21465 at commit [`c0fcbb3`](https://github.com/apache/spark/commit/c0fcbb397ae3c954092e461bbabf2b8f8cf85386).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class GBTClassifierParams(GBTParams, HasVarianceImpurity):`
      * `class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`
      * `class HasVarianceImpurity(Params):`
      * `class TreeRegressorParams(HasVarianceImpurity):`
      * `class GBTRegressorParams(GBTParams, TreeRegressorParams):`
      * `class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org