You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/05/30 23:59:20 UTC
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
GitHub user huaxingao opened a pull request:
https://github.com/apache/spark/pull/21465
[SPARK-24333][ML][PYTHON]Add fit with validation set to spark.ml GBT: Python API
## What changes were proposed in this pull request?
Add validationIndicatorCol and validationTol to GBT Python.
## How was this patch tested?
Add test in doctest to test the new API.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/huaxingao/spark spark-24333
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21465.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21465
----
commit 79fc83bae376c430a23fd7c9f502690a1c4d321e
Author: Huaxin Gao <hu...@...>
Date: 2018-05-30T23:49:31Z
[SPARK-24333][ML][PYTHON]Add fit with validation set to spark.ml GBT: Python API
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2987/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238809091
--- Diff: python/pyspark/ml/regression.py ---
@@ -650,19 +650,20 @@ def getFeatureSubsetStrategy(self):
return self.getOrDefault(self.featureSubsetStrategy)
-class TreeRegressorParams(Params):
+class HasVarianceImpurity(Params):
--- End diff --
This shouldn't be changed, impurity is different for regression and classification, so the param needs to be defined in `TreeRegressorParams` and `TreeClassifierParams`, as it was already
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r236880776
--- Diff: python/pyspark/ml/regression.py ---
@@ -705,12 +705,38 @@ def getNumTrees(self):
return self.getOrDefault(self.numTrees)
-class GBTParams(TreeEnsembleParams):
+class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):
--- End diff --
I like having a common `GBTParams` class, it was strange to have 2 of the same name. But you should also define `GBTClassifierParams` and `GBTRegressorParams`, then put the `supportedLossTypes` in there so you don't need to override them later. You can also put the `lossType` param and `getLossType()` method there. This makes it clean and follows how it's done in Scala.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5784/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239211515
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
--- End diff --
ah, I see. let me take another look..
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99413/testReport)** for PR 21465 at commit [`c0586bd`](https://github.com/apache/spark/commit/c0586bdc98277e43a5d902b6d3fdfe97579d3821).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3867/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239243683
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
+ """
+ Private class to track supported GBTClassifier params.
+
+ .. versionadded:: 3.0.0
+ """
+
+ supportedLossTypes = ["logistic"]
+
+ lossType = Param(Params._dummy(), "lossType",
+ "Loss function which GBT tries to minimize (case-insensitive). " +
+ "Supported options: " + ", ".join(supportedLossTypes),
+ typeConverter=TypeConverters.toString)
+
+ @since("3.0.0")
+ def setLossType(self, value):
--- End diff --
please address the above comment
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #91317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)** for PR 21465 at commit [`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class HasValidationIndicatorCol(Params):`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99136/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99136/testReport)** for PR 21465 at commit [`88ff888`](https://github.com/apache/spark/commit/88ff8885301d16d5be49b1a1adb31634c43fb64f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21465
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5485/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #91798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91798/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99013/testReport)** for PR 21465 at commit [`6e177a3`](https://github.com/apache/spark/commit/6e177a34b6a9710167dd3f4df4be0a585365ad7c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91798/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #98751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99136/testReport)** for PR 21465 at commit [`88ff888`](https://github.com/apache/spark/commit/88ff8885301d16d5be49b1a1adb31634c43fb64f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, GBTParams,`
* `class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):`
* `class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, GBTParams,`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:
https://github.com/apache/spark/pull/21465
@BryanCutler Thank you very much for your help!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95893/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99838/testReport)** for PR 21465 at commit [`6fc95a7`](https://github.com/apache/spark/commit/6fc95a77e0c541b25cc7acba434b263a8e378926).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238801256
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
+ """
+ Private class to track supported GBTClassifier params.
+
+ .. versionadded:: 3.0.0
+ """
+
+ supportedLossTypes = ["logistic"]
+
+ lossType = Param(Params._dummy(), "lossType",
+ "Loss function which GBT tries to minimize (case-insensitive). " +
+ "Supported options: " + ", ".join(supportedLossTypes),
+ typeConverter=TypeConverters.toString)
+
+ @since("3.0.0")
--- End diff --
don't change the version, since we are just refactoring the base classes
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99408/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #95893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95893/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5480/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5862/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238801573
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
+ """
+ Private class to track supported GBTClassifier params.
+
+ .. versionadded:: 3.0.0
+ """
+
+ supportedLossTypes = ["logistic"]
+
+ lossType = Param(Params._dummy(), "lossType",
+ "Loss function which GBT tries to minimize (case-insensitive). " +
+ "Supported options: " + ", ".join(supportedLossTypes),
+ typeConverter=TypeConverters.toString)
+
+ @since("3.0.0")
+ def setLossType(self, value):
--- End diff --
`setLossType` should be in the estimators, `getLossType` should be here
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98751/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #91586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91586/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239245388
--- Diff: python/pyspark/ml/regression.py ---
@@ -705,12 +710,59 @@ def getNumTrees(self):
return self.getOrDefault(self.numTrees)
-class GBTParams(TreeEnsembleParams):
+class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):
"""
Private class to track supported GBT params.
"""
+
+ stepSize = Param(Params._dummy(), "stepSize",
+ "Step size (a.k.a. learning rate) in interval (0, 1] for shrinking " +
+ "the contribution of each estimator.",
+ typeConverter=TypeConverters.toFloat)
+
+ validationTol = Param(Params._dummy(), "validationTol",
+ "Threshold for stopping early when fit with validation is used. " +
+ "If the error rate on the validation input changes by less than the " +
+ "validationTol, then learning will stop early (before `maxIter`). " +
+ "This parameter is ignored when fit without validation is used.",
+ typeConverter=TypeConverters.toFloat)
+
+ @since("3.0.0")
+ def setValidationTol(self, value):
+ """
+ Sets the value of :py:attr:`validationTol`.
+ """
+ return self._set(validationTol=value)
+
+ @since("3.0.0")
+ def getValidationTol(self):
+ """
+ Gets the value of validationTol or its default value.
+ """
+ return self.getOrDefault(self.validationTol)
+
+
+class GBTRegressorParams(GBTParams, TreeRegressorParams):
+ """
+ Private class to track supported GBTRegressor params.
+
+ .. versionadded:: 3.0.0
+ """
+
supportedLossTypes = ["squared", "absolute"]
+ lossType = Param(Params._dummy(), "lossType",
+ "Loss function which GBT tries to minimize (case-insensitive). " +
+ "Supported options: " + ", ".join(supportedLossTypes),
+ typeConverter=TypeConverters.toString)
+
+ @since("1.4.0")
+ def setLossType(self, value):
--- End diff --
`setLossType` should be in the estimator and `getLossType` should be here
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r194143768
--- Diff: python/pyspark/ml/classification.py ---
@@ -1251,26 +1256,33 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
"the contribution of each estimator.",
typeConverter=TypeConverters.toFloat)
+ validationTol = Param(Params._dummy(), "validationTol",
+ "Threshold for stopping early when fit with validation is used. " +
+ "If the error rate on the validation input changes by less than the " +
+ "validationTol, then learning will stop early (before `maxIter`). " +
+ "This parameter is ignored when fit without validation is used.",
+ typeConverter=TypeConverters.toFloat)
+
@keyword_only
def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, lossType="logistic",
maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0,
- featureSubsetStrategy="all"):
+ featureSubsetStrategy="all", validationTol=0.01):
--- End diff --
Shouldn't `validationIndicatorCol` be in `init` too? Set to None default?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99838/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91586/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99744/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99413/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239242316
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
--- End diff --
Yeah, you're correct, this is fine
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #91586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91586/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99838/testReport)** for PR 21465 at commit [`6fc95a7`](https://github.com/apache/spark/commit/6fc95a77e0c541b25cc7acba434b263a8e378926).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class HasDistanceMeasure(Params):`
* `class HasValidationIndicatorCol(Params):`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239240113
--- Diff: python/pyspark/ml/regression.py ---
@@ -705,12 +710,59 @@ def getNumTrees(self):
return self.getOrDefault(self.numTrees)
-class GBTParams(TreeEnsembleParams):
+class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, HasValidationIndicatorCol):
"""
Private class to track supported GBT params.
"""
+
+ stepSize = Param(Params._dummy(), "stepSize",
+ "Step size (a.k.a. learning rate) in interval (0, 1] for shrinking " +
+ "the contribution of each estimator.",
+ typeConverter=TypeConverters.toFloat)
+
+ validationTol = Param(Params._dummy(), "validationTol",
+ "Threshold for stopping early when fit with validation is used. " +
+ "If the error rate on the validation input changes by less than the " +
+ "validationTol, then learning will stop early (before `maxIter`). " +
+ "This parameter is ignored when fit without validation is used.",
+ typeConverter=TypeConverters.toFloat)
+
+ @since("3.0.0")
+ def setValidationTol(self, value):
--- End diff --
It seems scala does not have this API right? If not then let's remove it here for now
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r236881042
--- Diff: python/pyspark/ml/regression.py ---
@@ -1030,9 +1056,9 @@ def featureImportances(self):
@inherit_doc
-class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
- GBTParams, HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable,
- JavaMLReadable, TreeRegressorParams):
+class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, GBTParams,
+ HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable, JavaMLReadable,
--- End diff --
I think you can remove `HasStepSize` since it is in `GBTParams`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r194198390
--- Diff: python/pyspark/ml/classification.py ---
@@ -1251,26 +1256,33 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
"the contribution of each estimator.",
typeConverter=TypeConverters.toFloat)
+ validationTol = Param(Params._dummy(), "validationTol",
+ "Threshold for stopping early when fit with validation is used. " +
+ "If the error rate on the validation input changes by less than the " +
+ "validationTol, then learning will stop early (before `maxIter`). " +
+ "This parameter is ignored when fit without validation is used.",
+ typeConverter=TypeConverters.toFloat)
+
@keyword_only
def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, lossType="logistic",
maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0,
- featureSubsetStrategy="all"):
+ featureSubsetStrategy="all", validationTol=0.01):
--- End diff --
@MLnick Yes, I should add it in init. Will change it now. Thanks a lot for your review!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238808440
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
--- End diff --
this should extend `TreeClassifierParams`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r235128413
--- Diff: python/pyspark/ml/classification.py ---
@@ -1176,8 +1176,8 @@ def trees(self):
@inherit_doc
class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
- GBTParams, HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable,
- JavaMLReadable):
+ GBTParams, HasCheckpointInterval, HasStepSize, HasSeed,
+ HasValidationIndicatorCol, JavaMLWritable, JavaMLReadable):
--- End diff --
I think this should be added to `GBTParams`, which is done on the Scala side too.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239904265
--- Diff: python/pyspark/ml/param/shared.py ---
@@ -814,3 +814,25 @@ def getDistanceMeasure(self):
"""
return self.getOrDefault(self.distanceMeasure)
+
+class HasValidationIndicatorCol(Params):
--- End diff --
You are right. DecisionTreeParams should be at the bottom.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5243/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99013/testReport)** for PR 21465 at commit [`6e177a3`](https://github.com/apache/spark/commit/6e177a34b6a9710167dd3f4df4be0a585365ad7c).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #95893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95893/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99744/testReport)** for PR 21465 at commit [`30a743d`](https://github.com/apache/spark/commit/30a743d79ecd2b18b5d1fa997d63dee914b714a0).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99408/testReport)** for PR 21465 at commit [`c0fcbb3`](https://github.com/apache/spark/commit/c0fcbb397ae3c954092e461bbabf2b8f8cf85386).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #91798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91798/testReport)** for PR 21465 at commit [`4290b58`](https://github.com/apache/spark/commit/4290b58b53f7f7b54715dec311b8ac772de9afd7).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r235488017
--- Diff: python/pyspark/ml/classification.py ---
@@ -1176,8 +1176,8 @@ def trees(self):
@inherit_doc
class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
- GBTParams, HasCheckpointInterval, HasStepSize, HasSeed, JavaMLWritable,
- JavaMLReadable):
+ GBTParams, HasCheckpointInterval, HasStepSize, HasSeed,
+ HasValidationIndicatorCol, JavaMLWritable, JavaMLReadable):
--- End diff --
@BryanCutler Thank you very much for reviewing my PR. I moved HasValidationIndicatorCol, HasMaxIter and HasStepSize to GBTParams.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #98751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3712/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99013/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239173098
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
--- End diff --
@BryanCutler Thanks for your review.
Seems recently https://github.com/apache/spark/pull/22986 added ```trait HasVarianceImpurity``` and made
```private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238809338
--- Diff: python/pyspark/ml/classification.py ---
@@ -1242,40 +1255,36 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
[0.25..., 0.23..., 0.21..., 0.19..., 0.18...]
>>> model.numClasses
2
+ >>> gbt = gbt.setValidationIndicatorCol("validationIndicator")
+ >>> gbt.getValidationIndicatorCol()
+ 'validationIndicator'
+ >>> gbt.getValidationTol()
+ 0.01
.. versionadded:: 1.4.0
"""
- lossType = Param(Params._dummy(), "lossType",
- "Loss function which GBT tries to minimize (case-insensitive). " +
- "Supported options: " + ", ".join(GBTParams.supportedLossTypes),
- typeConverter=TypeConverters.toString)
-
- stepSize = Param(Params._dummy(), "stepSize",
- "Step size (a.k.a. learning rate) in interval (0, 1] for shrinking " +
- "the contribution of each estimator.",
- typeConverter=TypeConverters.toFloat)
-
@keyword_only
def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, lossType="logistic",
- maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0,
- featureSubsetStrategy="all"):
+ maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0, impurity="variance",
--- End diff --
this is not the correct default impurity
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #91317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)** for PR 21465 at commit [`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239243661
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
+ """
+ Private class to track supported GBTClassifier params.
+
+ .. versionadded:: 3.0.0
+ """
+
+ supportedLossTypes = ["logistic"]
+
+ lossType = Param(Params._dummy(), "lossType",
+ "Loss function which GBT tries to minimize (case-insensitive). " +
+ "Supported options: " + ", ".join(supportedLossTypes),
+ typeConverter=TypeConverters.toString)
+
+ @since("3.0.0")
--- End diff --
please address the above comment
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99744/testReport)** for PR 21465 at commit [`30a743d`](https://github.com/apache/spark/commit/30a743d79ecd2b18b5d1fa997d63dee914b714a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91317/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239569077
--- Diff: python/pyspark/ml/param/shared.py ---
@@ -814,3 +814,25 @@ def getDistanceMeasure(self):
"""
return self.getOrDefault(self.distanceMeasure)
+
+class HasValidationIndicatorCol(Params):
--- End diff --
Would you mind running the codegen again, like this command for example
`pushd python/pyspark/ml/param/ && python _shared_params_code_gen.py > shared.py && popd` and push the result if there is a diff? I think the DecisionTreeParams should be at the bottom of the file..
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21465
merged to master, thanks @huaxingao !
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:
https://github.com/apache/spark/pull/21465
@BryanCutler Thank you very much for your review! I will submit changes soon.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21465
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21465
**[Test build #99408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99408/testReport)** for PR 21465 at commit [`c0fcbb3`](https://github.com/apache/spark/commit/c0fcbb397ae3c954092e461bbabf2b8f8cf85386).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class GBTClassifierParams(GBTParams, HasVarianceImpurity):`
* `class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`
* `class HasVarianceImpurity(Params):`
* `class TreeRegressorParams(HasVarianceImpurity):`
* `class GBTRegressorParams(GBTParams, TreeRegressorParams):`
* `class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org