You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by BenFradet <gi...@git.apache.org> on 2015/12/17 10:10:16 UTC
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
GitHub user BenFradet opened a pull request:
https://github.com/apache/spark/pull/10355
[SPARK-7425] [ML] [WIP] spark.ml Predictor should support other numeric types for label
Currently, the Predictor abstraction expects the input labelCol type to be DoubleType, but we should support other numeric types. This will involve updating the PredictorParams.validateAndTransformSchema method.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BenFradet/spark SPARK-7425
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10355.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10355
----
commit 2b276a29cc6183e6b42f828631c23e74bcd4144f
Author: BenFradet <be...@gmail.com>
Date: 2015-12-13T15:41:57Z
check label data type for numeric type instead of double
commit 7ef4ad4fce33f48e744fd0c9dbf30549eee76097
Author: BenFradet <be...@gmail.com>
Date: 2015-12-13T15:47:42Z
added some cases to extractLabeledPoints, looking for a better way to handle this
commit e791ff6390284d6b0baae869de52f37bc0e38862
Author: BenFradet <be...@gmail.com>
Date: 2015-12-13T16:40:30Z
Added a method to set the metadata on a dataframe
commit 83ffecba566f6a085fffa7dcf3194fa9f64edfc3
Author: BenFradet <be...@gmail.com>
Date: 2015-12-13T16:41:43Z
unit tests for the decision tree classifier
commit 97dde27306984e3d78ca05d1cc99e47b18e48a8f
Author: BenFradet <be...@gmail.com>
Date: 2015-12-17T08:33:56Z
used the sqlcontext provided with MLlibTestSparkContext
commit c68ace1d7afe815ed252bacd0cebc576ef6e06b0
Author: BenFradet <be...@gmail.com>
Date: 2015-12-17T09:06:46Z
simpler version of extractLabeledPoints
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203641371
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203522113
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198057662
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-200564242
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r56200539
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
--- End diff --
I was thinking more like create a util method (or use an existing one if there is one) to generate the required test datasets. It's not a big deal though, just a "nice to have"
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197581060
**[Test build #53366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53366/consoleFull)** for PR 10355 at commit [`2690512`](https://github.com/apache/spark/commit/269051247d92ae2a58fb2749a3ace062ad28f9cb).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173322359
**[Test build #49796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49796/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203606312
**[Test build #54547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54547/consoleFull)** for PR 10355 at commit [`788eb58`](https://github.com/apache/spark/commit/788eb585a69c0997428f5da1d70630fae475ff28).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-202579805
I'll take a look
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-199709409
Yup, I'll rebase tonight and start working on the related jiras.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173343330
**[Test build #49796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49796/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201018203
**[Test build #54085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54085/consoleFull)** for PR 10355 at commit [`18b3308`](https://github.com/apache/spark/commit/18b3308d4042c4eb64dedf77f59b659fe2a041e3).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165396129
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47918/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165909357
**[Test build #48023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48023/consoleFull)** for PR 10355 at commit [`1003291`](https://github.com/apache/spark/commit/10032913af7d8614ecc2c8a132f30f3e13297c3a).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165552508
**[Test build #47940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47940/consoleFull)** for PR 10355 at commit [`4ada320`](https://github.com/apache/spark/commit/4ada32096ec5839d2ef822cba5d261732e0aaced).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165913202
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173343509
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173315234
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203522117
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54535/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165390845
**[Test build #47917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47917/consoleFull)** for PR 10355 at commit [`c68ace1`](https://github.com/apache/spark/commit/c68ace1d7afe815ed252bacd0cebc576ef6e06b0).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165733530
**[Test build #47996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47996/consoleFull)** for PR 10355 at commit [`5fd5069`](https://github.com/apache/spark/commit/5fd5069f784e067e3245e0db6b7d2de48aae84fe).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195455900
**[Test build #52928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52928/consoleFull)** for PR 10355 at commit [`2575fb7`](https://github.com/apache/spark/commit/2575fb724c0b6f00ccd0ea5f6ce069ee1076bad8).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195485937
**[Test build #52932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52932/consoleFull)** for PR 10355 at commit [`45582de`](https://github.com/apache/spark/commit/45582deb51a23893e61bc0155fb0c20fb5ac4e6f).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201037265
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173307035
**[Test build #49793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49793/consoleFull)** for PR 10355 at commit [`3c7b6c4`](https://github.com/apache/spark/commit/3c7b6c4b3cb80106a3e609f2788789682cf88150).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/10355
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204478499
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54707/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197623814
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53366/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165848903
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165748682
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47997/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57643353
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala ---
@@ -61,6 +61,20 @@ private[spark] object SchemaUtils {
}
/**
+ * Check whether the given schema contains a column of the numeric data type.
--- End diff --
indent 1 space less (can update IntelliJ code style settings: Editor -> Code Style -> Scala -> ScalaDoc tab -> uncheck Use scaladoc indent for leading asterisk
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57643358
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -17,14 +17,116 @@
package org.apache.spark.ml.util
-import org.apache.spark.ml.Model
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor}
import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.regression.Regressor
+import org.apache.spark.ml.tree.impl.TreeTests
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.sql.{DataFrame, SQLContext}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
-object MLTestingUtils {
+object MLTestingUtils extends SparkFunSuite {
def checkCopy(model: Model[_]): Unit = {
val copied = model.copy(ParamMap.empty)
.asInstanceOf[Model[_]]
assert(copied.parent.uid == model.parent.uid)
assert(copied.parent == model.parent)
}
+
+ def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]](
+ predictor: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
+ val dfs = if (predictor.isInstanceOf[Regressor[_, _, _]]) {
+ genRegressionDFWithNumericLabelCol(sqlContext)
+ } else {
+ genClassifDFWithNumericLabelCol(sqlContext)
+ }
+ val expected = predictor.fit(dfs(DoubleType))
+ val actuals = dfs.keys.filter(_ != DoubleType).map(t => predictor.fit(dfs(t)))
+ actuals.foreach(actual => check(expected, actual))
+ }
+
+ def checkPredictorRejectNotNumericTypes(
+ predictor: Predictor[_, _, _], sqlContext: SQLContext): Unit = {
+ val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext)
+ val thrown = intercept[IllegalArgumentException] {
+ predictor.fit(dfWithStringLabels)
+ }
+ assert(thrown.getMessage contains
+ "Column label must be of type NumericType but was actually of type StringType")
+ }
+
+ def checkEstimatorAcceptAllNumericTypes[M <: Model[M], T <: Estimator[M]](
+ estimator: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
+ val dfs = genRegressionDFWithNumericLabelCol(sqlContext)
+ val expected = estimator.fit(dfs(DoubleType))
+ val actuals = dfs.keys.filter(_ != DoubleType).map(t => estimator.fit(dfs(t)))
+ actuals.foreach(actual => check(expected, actual))
+ }
+
+ def checkEstimatorRejectNotNumericTypes(
+ predictor: Estimator[_], sqlContext: SQLContext): Unit = {
+ val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext)
+ val thrown = intercept[IllegalArgumentException] {
+ predictor.fit(dfWithStringLabels)
+ }
+ assert(thrown.getMessage contains
+ "Column label must be of type NumericType but was actually of type StringType")
+ }
+
+ def genClassifDFWithNumericLabelCol(
+ sqlContext: SQLContext,
+ labelColName: String = "label",
+ featuresColName: String = "features"): Map[NumericType, DataFrame] = {
+ val df = sqlContext.createDataFrame(Seq(
+ (0, Vectors.dense(0, 2, 3)),
+ (1, Vectors.dense(0, 3, 1)),
+ (0, Vectors.dense(0, 2, 2)),
+ (1, Vectors.dense(0, 3, 9)),
+ (0, Vectors.dense(0, 2, 6))
+ )).toDF(labelColName, featuresColName)
+
+ val types =
+ Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
+ types.map(t => t -> df.select(col(labelColName).cast(t), col(featuresColName)))
+ .map { case (t, d) => t -> TreeTests.setMetadata(d, 2, labelColName) }
+ .toMap
+ }
+
+ def genRegressionDFWithNumericLabelCol(
+ sqlContext: SQLContext,
+ labelColName: String = "label",
+ featuresColName: String = "features",
+ censorColName: String = "censor"): Map[NumericType, DataFrame] = {
+ val df = sqlContext.createDataFrame(Seq(
+ (0, Vectors.dense(0)),
+ (1, Vectors.dense(1)),
+ (2, Vectors.dense(2)),
+ (3, Vectors.dense(3)),
+ (4, Vectors.dense(4))
+ )).toDF(labelColName, featuresColName)
+
+ val types =
+ Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
+ types
+ .map(t => t -> df.select(col(labelColName).cast(t), col(featuresColName)))
+ .map { case (t, d) =>
+ t -> TreeTests.setMetadata(d, 2, labelColName).withColumn(censorColName, lit(0.0))
--- End diff --
setMetadata: classes should be 0, not 2
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-192681503
**[Test build #52519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52519/consoleFull)** for PR 10355 at commit [`dd18ddc`](https://github.com/apache/spark/commit/dd18ddc5ece97225c3b76716b49593ac7fa0be36).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165552039
**[Test build #47940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47940/consoleFull)** for PR 10355 at commit [`4ada320`](https://github.com/apache/spark/commit/4ada32096ec5839d2ef822cba5d261732e0aaced).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165429666
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203522095
**[Test build #54535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54535/consoleFull)** for PR 10355 at commit [`21e4afb`](https://github.com/apache/spark/commit/21e4afb9d734f81e47c73b16e1f29f76c46df6f5).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201037149
**[Test build #54088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54088/consoleFull)** for PR 10355 at commit [`42d5e0f`](https://github.com/apache/spark/commit/42d5e0f28ac09c15cddb2af11c01df4b04503d4e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203617385
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54547/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195500352
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52932/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204623657
LGTM
I'll go ahead and merge this with master before a conflict happens
Thanks for the PR!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173375255
Pinging @jkbradley and @mengxr
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r56203460
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
--- End diff --
OK, will do.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198113198
**[Test build #53454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53454/consoleFull)** for PR 10355 at commit [`6b8b0cf`](https://github.com/apache/spark/commit/6b8b0cfa44e0c3050c17a4bcb7be9faeaae53748).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57245082
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala ---
@@ -183,12 +183,12 @@ class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: S
* Extract [[featuresCol]], [[labelCol]] and [[censorCol]] from input dataset,
* and put it in an RDD with strong types.
*/
- protected[ml] def extractAFTPoints(dataset: DataFrame): RDD[AFTPoint] = {
- dataset.select($(featuresCol), $(labelCol), $(censorCol)).rdd.map {
- case Row(features: Vector, label: Double, censor: Double) =>
- AFTPoint(features, label, censor)
- }
- }
+ protected[ml] def extractAFTPoints(dataset: DataFrame): RDD[AFTPoint] =
--- End diff --
I prefer to keep braces around multiline methods like this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197623583
**[Test build #53366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53366/consoleFull)** for PR 10355 at commit [`2690512`](https://github.com/apache/spark/commit/269051247d92ae2a58fb2749a3ace062ad28f9cb).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-194861966
Great! Thanks.
I'll fix those conflicts later today.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203102318
**[Test build #54458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54458/consoleFull)** for PR 10355 at commit [`cf005af`](https://github.com/apache/spark/commit/cf005af19eb52037bc32905c5d30595681f68152).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57057621
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -27,4 +31,56 @@ object MLTestingUtils {
assert(copied.parent.uid == model.parent.uid)
assert(copied.parent == model.parent)
}
+
+ def genClassifDFWithNumericLabelCol(
+ sqlContext: SQLContext,
+ labelColName: String,
+ featuresColName: String
+ ): Map[NumericType, DataFrame] = {
--- End diff --
I'll modify the whole file then.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165848904
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48009/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-193294590
@mengxr @jkbradley is there any interest in this pr or should i close it?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47933370
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
val model = dt.fit(df)
}
+ test("DecisionTree should support all NumericType labels") {
+ val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
--- End diff --
I think so too, thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165389989
This is still a wip but remarks are welcome.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198057663
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53458/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165909442
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48023/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57045359
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
+ val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
+
+ val rf = new RandomForestClassifier().setFeaturesCol("features")
+
+ val expected = rf.setLabelCol("label")
+ .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
+ dfs.keys.filter(_ != DoubleType).foreach { t =>
+ TreeTests.checkEqual(expected,
+ rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
+ }
+ }
+
+ test("shouldn't support non NumericType labels") {
--- End diff --
I still think the code duplication is a big concern. It might not be that hard to add some utility function that can clean this up. For example I was able to get this working:
In _MLTestingUtils.scala_
```scala
def checkRejectsNonNumericType(est: Predictor[_, _, _], sqlContext: SQLContext) = {
val dfWithStringLabels =
MLTestingUtils.generateDFWithStringLabelCol(sqlContext, est.getLabelCol, est.getFeaturesCol)
val thrown = intercept[IllegalArgumentException] {
est.fit(dfWithStringLabels)
}
assert(thrown.getMessage contains
"Column label must be of type NumericType but was actually of type StringType")
}
```
Then in the actual test suites:
```scala
test("shouldn't support non NumericType labels") {
MLTestingUtils.checkRejectsNonNumericType(new RandomForestClassifier, sqlContext)
}
```
There will be some complication for some of the estimators like OneVsRest and IsotonicRegression (why doesn't it extend Predictor?) but I think it will be worth it to get this right since future estimators will have to implement this as well. It would be really nice to have something like we do for params where there is just a one-liner `ParamsSuite.checkParams(new RandomForestClassifier)`. For testing that all numeric types are supported, we could have a utility method that produces all actual and expected results, then check for equality inside the individual test suites, like:
```scala
val models = MLTestingUtils.fitAllNumericTypes(new NaiveBayes, sqlContext)
models.foreach { case (expected, actual) =>
assert(expected.pi === actual.pi)
assert(expected.theta === actual.theta)
}
```
I'm open to feedback on this, realizing it could be hard to generalize this for every case.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173375292
Pinging @jkbradley and @mengxr
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195557577
@MLnick you were right, thanks for spotting that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197596345
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198035588
**[Test build #53454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53454/consoleFull)** for PR 10355 at commit [`6b8b0cf`](https://github.com/apache/spark/commit/6b8b0cfa44e0c3050c17a4bcb7be9faeaae53748).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165426246
**[Test build #47919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47919/consoleFull)** for PR 10355 at commit [`b014357`](https://github.com/apache/spark/commit/b0143575b50567225a63dabfef560a5d35554077).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-202603261
This looks to be in very good shape. My main comment is about unit test time: the tests take a significant amount of time. Wherever it is easy to do, could you set the algorithm parameters to make for faster tests? E.g., use 1 iteration for LogisticRegression, use maxDepth 0 for trees, etc.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204478494
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197755584
@MLnick yup thanks, I'll get to those once I'm done with this one.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195473833
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173311106
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49793/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-199708780
Looks fine to me - latest test result indicates there may be merge conflicts?
I'd like @jkbradley to take a quick pass too.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201750481
@jkbradley @sethah @MLnick I think it's in a pretty good state now, what do you guys think?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165396122
**[Test build #47918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47918/consoleFull)** for PR 10355 at commit [`1027e83`](https://github.com/apache/spark/commit/1027e83d9add7a91adda7bba892b3fe7926405f1).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165909440
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201582826
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54229/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165401216
**[Test build #47919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47919/consoleFull)** for PR 10355 at commit [`b014357`](https://github.com/apache/spark/commit/b0143575b50567225a63dabfef560a5d35554077).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165552519
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47940/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165748680
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165426698
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r48050215
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -160,6 +160,59 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext {
require(m.getThreshold === 0.1, "copy should handle extra model params")
}
}
+
+ test("should support all NumericType labels") {
+ val df = sqlContext.createDataFrame(Seq(
+ (0, Vectors.dense(0, 2, 3)),
+ (1, Vectors.dense(0, 3, 1)),
+ (0, Vectors.dense(0, 2, 2)),
+ (1, Vectors.dense(0, 3, 9)),
+ (2, Vectors.dense(0, 2, 6))
+ )).toDF("label", "features")
+
+ val types =
+ Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
+
+ var dfWithTypes = df
+ types.foreach(t => dfWithTypes = dfWithTypes.withColumn(t.toString, df("label").cast(t)))
+
+ val ovr = new OneVsRest()
+ .setClassifier(new LogisticRegression)
+ .setFeaturesCol("features")
+
+ val expected = ovr.setLabelCol(DoubleType.toString).fit(dfWithTypes)
+ val expectedModels = expected.models.map(m => m.asInstanceOf[LogisticRegressionModel])
+ types.filter(_ != DoubleType).foreach { t =>
+ val actual = ovr.setLabelCol(t.toString).fit(dfWithTypes)
+ val actualModels = actual.models.map(m => m.asInstanceOf[LogisticRegressionModel])
+ assert(expectedModels.length === actualModels.length)
+ expectedModels.zip(actualModels).foreach { case (e, a) =>
+ assert(e.intercept === a.intercept)
+ assert(e.coefficients.toArray === a.coefficients.toArray)
+ }
+ }
+ }
+
+ test("shouldn't support non NumericType labels") {
+ val dfWithStringLabels = sqlContext.createDataFrame(Seq(
+ ("0", Vectors.dense(0, 2, 3)),
+ ("1", Vectors.dense(0, 3, 1)),
+ ("0", Vectors.dense(0, 2, 2)),
+ ("1", Vectors.dense(0, 3, 9)),
+ ("2", Vectors.dense(0, 2, 6))
+ )).toDF("label", "features")
+
+ val ovr = new OneVsRest()
+ .setClassifier(new LogisticRegression)
+ .setLabelCol("label")
+ .setFeaturesCol("features")
+
+ // thrown by AttributeFactory#fromStructField and not by Predictor#validateAndTransformSchema
+ // because OneVsRest reimplements the fit method
+ intercept[IllegalArgumentException] {
--- End diff --
This is caused by [AttributeFactory#fromStructField](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/attribute/attributes.scala#L130).
I don't know what's the best way to handle this, input welcome.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195500349
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173374838
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49805/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165896927
**[Test build #48024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48024/consoleFull)** for PR 10355 at commit [`46ca0a7`](https://github.com/apache/spark/commit/46ca0a7afa29bc39b0ee408f72f8c72fa1f91866).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r56199972
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
--- End diff --
The code duplication is mostly related to the test datasets.
What I can do is reduce their size to one or two observations, what do you think?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197623813
Build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173314806
**[Test build #49794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49794/consoleFull)** for PR 10355 at commit [`2c3b4cf`](https://github.com/apache/spark/commit/2c3b4cf22e4b7bcf109669cb7b56f39a82d8cf26).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-199967566
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r56136601
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
--- End diff --
It's not a major issue, but there is a large amount of code duplication in all the tests. I know you've already tried to pare it down, but if it's possible to further reduce it by extracting out some methods, that would be cleaner.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57796113
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -17,14 +17,116 @@
package org.apache.spark.ml.util
-import org.apache.spark.ml.Model
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor}
import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.regression.Regressor
+import org.apache.spark.ml.tree.impl.TreeTests
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.sql.{DataFrame, SQLContext}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
-object MLTestingUtils {
+object MLTestingUtils extends SparkFunSuite {
def checkCopy(model: Model[_]): Unit = {
val copied = model.copy(ParamMap.empty)
.asInstanceOf[Model[_]]
assert(copied.parent.uid == model.parent.uid)
assert(copied.parent == model.parent)
}
+
+ def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]](
--- End diff --
good point, will update
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197596347
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53368/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173315239
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49794/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165395169
**[Test build #47918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47918/consoleFull)** for PR 10355 at commit [`1027e83`](https://github.com/apache/spark/commit/1027e83d9add7a91adda7bba892b3fe7926405f1).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165893915
**[Test build #48023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48023/consoleFull)** for PR 10355 at commit [`1003291`](https://github.com/apache/spark/commit/10032913af7d8614ecc2c8a132f30f3e13297c3a).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195473729
**[Test build #52929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52929/consoleFull)** for PR 10355 at commit [`2230292`](https://github.com/apache/spark/commit/2230292fb7c31c6487b5c5cbfd5a5b4554db15f3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r56468850
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -347,6 +348,34 @@ class AFTSurvivalRegressionSuite
}
}
+ test("should support all NumericType labels") {
+ val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
--- End diff --
Thanks for the insight, I'll fix this later today.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-192701255
**[Test build #52519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52519/consoleFull)** for PR 10355 at commit [`dd18ddc`](https://github.com/apache/spark/commit/dd18ddc5ece97225c3b76716b49593ac7fa0be36).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197744650
@BenFradet I linked 2 additional sub-tasks to the [umbrella JIRA](https://issues.apache.org/jira/browse/SPARK-11107)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-167002913
Also, what about the other things expecting a `labelCol` of `DoubleType` like the different evaluators?
Should this be handled in a follow-up jira/pr?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201582825
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-192701356
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204478315
**[Test build #54707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54707/consoleFull)** for PR 10355 at commit [`718774b`](https://github.com/apache/spark/commit/718774b7401b86a99c2c8ae02014470b0d3a77a3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-202744058
Sure, thanks for your review.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165748572
**[Test build #47997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47997/consoleFull)** for PR 10355 at commit [`0c6feab`](https://github.com/apache/spark/commit/0c6feab71fecafd44953232c0f99534cc893e9e0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173315228
**[Test build #49794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49794/consoleFull)** for PR 10355 at commit [`2c3b4cf`](https://github.com/apache/spark/commit/2c3b4cf22e4b7bcf109669cb7b56f39a82d8cf26).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57241509
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
+ val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
+
+ val rf = new RandomForestClassifier().setFeaturesCol("features")
+
+ val expected = rf.setLabelCol("label")
+ .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
+ dfs.keys.filter(_ != DoubleType).foreach { t =>
+ TreeTests.checkEqual(expected,
+ rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
+ }
+ }
+
+ test("shouldn't support non NumericType labels") {
--- End diff --
Here's what I managed to make work:
```scala
def checkNumericTypes[M <: RegressionModel[_, M], T <: Regressor[_, _, M]](
regressor: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
val expected = regressor.fit(dfs(DoubleType))
val actuals = dfs.keys.filter(_ != DoubleType).map(t => regressor.fit(dfs(t)))
actuals.foreach(actual => check(expected, actual))
}
```
which could be used like:
```scala
MLTestingUtils.checkNumericTypes[LinearRegressionModel, LinearRegression](
new LinearRegression(), sqlContext) { (expected, actual) =>
assert(expected.intercept === actual.intercept)
assert(expected.coefficients === actual.coefficients)
}
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201018208
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201560326
**[Test build #54229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54229/consoleFull)** for PR 10355 at commit [`f8b7823`](https://github.com/apache/spark/commit/f8b7823e246c1ea9ee7729971fdc1d3a64ac8208).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203617375
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-194856518
@BenFradet I'll review - there seem to be merge conflicts that need resolving.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203103287
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165819097
I noticed [AFTSurvivalRegression](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L99) uses its own `validateAndTransformSchema` method.
Should I make it support other numeric types in this PR or in another one?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165429356
**[Test build #47921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47921/consoleFull)** for PR 10355 at commit [`47cc81b`](https://github.com/apache/spark/commit/47cc81bdd84146587ba169c80c8fc1ef382dafe3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165405712
**[Test build #47921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47921/consoleFull)** for PR 10355 at commit [`47cc81b`](https://github.com/apache/spark/commit/47cc81bdd84146587ba169c80c8fc1ef382dafe3).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r56465970
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -347,6 +348,34 @@ class AFTSurvivalRegressionSuite
}
}
+ test("should support all NumericType labels") {
+ val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
--- End diff --
I think you'll need to adjust the generated data here as it requires an additional 'censor' col.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47933490
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
val model = dt.fit(df)
}
+ test("DecisionTree should support all NumericType labels") {
+ val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
+ (0, Vectors.dense(0, 2, 3)),
+ (1, Vectors.dense(0, 3, 1)),
+ (0, Vectors.dense(0, 2, 2)),
+ (1, Vectors.dense(0, 3, 9)),
+ (0, Vectors.dense(0, 2, 6))
+ )).toDF("label", "features"), 2)
+
+ val dfWithFloatLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
+ (0f, Vectors.dense(0, 2, 3)),
+ (1f, Vectors.dense(0, 3, 1)),
+ (0f, Vectors.dense(0, 2, 2)),
+ (1f, Vectors.dense(0, 3, 9)),
+ (0f, Vectors.dense(0, 2, 6))
+ )).toDF("label", "features"), 2)
+
+ val dfWithLongLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
+ (0L, Vectors.dense(0, 2, 3)),
+ (1L, Vectors.dense(0, 3, 1)),
+ (0L, Vectors.dense(0, 2, 2)),
+ (1L, Vectors.dense(0, 3, 9)),
+ (0L, Vectors.dense(0, 2, 6))
+ )).toDF("label", "features"), 2)
+
+ val dt = new DecisionTreeClassifier()
+ .setLabelCol("label")
+ .setFeaturesCol("features")
+
+ dt.fit(dfWithIntLabels)
+ dt.fit(dfWithFloatLabels)
+ dt.fit(dfWithLongLabels)
+ }
--- End diff --
I'm not sure it's not clear what this test is checking for. It looks like it just checks that no errors are thrown during training the tree. Maybe we should train a tree with a data frame with `DoubleType` as the column and check equality between training trees with other label column types?
```scala
val doubleModel = dt.fit(dfWithDoubleLabels)
val intModel = dt.fit(dfWithIntLabels)
TreeTests.checkEqual(doubleModel, intModel)
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-200556843
Sorry for the long delay! I'll take a look now
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r55686938
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
@@ -151,7 +151,7 @@ abstract class NumericType extends AtomicType {
}
-private[sql] object NumericType extends AbstractDataType {
+private[spark] object NumericType extends AbstractDataType {
--- End diff --
Because it's now used in [Predictor.scala]{https://github.com/apache/spark/pull/10355/files#diff-5fe763c1d5d2d36c0154204028cd3585R56}
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173374837
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195463826
**[Test build #52929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52929/consoleFull)** for PR 10355 at commit [`2230292`](https://github.com/apache/spark/commit/2230292fb7c31c6487b5c5cbfd5a5b4554db15f3).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-196743025
@BenFradet I'm ok with doing the other things like evaluators in a different PR - however the only concern I'd have is it would temporarily break things like `CrossValidator` that use evaluators and a non-Double `labelCol` for the model. It's ok if you're able to do the follow up soon. Otherwise it may be worth thinking about just doing them here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195499950
**[Test build #52932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52932/consoleFull)** for PR 10355 at commit [`45582de`](https://github.com/apache/spark/commit/45582deb51a23893e61bc0155fb0c20fb5ac4e6f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173374354
**[Test build #49805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49805/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-200726627
@jkbradley yes, I'm currently trying to incorporate @sethah's comments and keep code duplication to a strict minimum.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197035874
**[Test build #53227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53227/consoleFull)** for PR 10355 at commit [`7d9835a`](https://github.com/apache/spark/commit/7d9835abd0f3bfac2ef66db01a4de4fdc7dceb48).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173343511
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49796/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201037272
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54088/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197039251
**[Test build #53227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53227/consoleFull)** for PR 10355 at commit [`7d9835a`](https://github.com/apache/spark/commit/7d9835abd0f3bfac2ef66db01a4de4fdc7dceb48).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203641203
**[Test build #54549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54549/consoleFull)** for PR 10355 at commit [`58290af`](https://github.com/apache/spark/commit/58290af3c888f8d6a600ca4676488d4b542e04e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203641373
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54549/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165720858
**[Test build #47996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47996/consoleFull)** for PR 10355 at commit [`5fd5069`](https://github.com/apache/spark/commit/5fd5069f784e067e3245e0db6b7d2de48aae84fe).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165517939
I assume since this is a WIP you are still going to add test cases for the other predictors? Additionally, since ShortType and DecimalType also extend NumericType, I think the tests should include those cases.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165396126
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57768696
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -17,14 +17,116 @@
package org.apache.spark.ml.util
-import org.apache.spark.ml.Model
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor}
import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.regression.Regressor
+import org.apache.spark.ml.tree.impl.TreeTests
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.sql.{DataFrame, SQLContext}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
-object MLTestingUtils {
+object MLTestingUtils extends SparkFunSuite {
def checkCopy(model: Model[_]): Unit = {
val copied = model.copy(ParamMap.empty)
.asInstanceOf[Model[_]]
assert(copied.parent.uid == model.parent.uid)
assert(copied.parent == model.parent)
}
+
+ def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]](
--- End diff --
Is there a reason to have two separate methods for predictor and estimator? I also tend to prefer _not_ separating the "accept" and "reject" checks, and instead have them in a single function call. This will reduce the complexity of having to decide which one to use (and why) and also reduce the number of function calls in future tests. I was able to get the tests working having a single method for accept/reject and estimator/predictor, like this:
```scala
def checkEstimatorNumericTypes[M <: Model[M], T <: Estimator[M]](
predictor: T,
isClassification: Boolean,
sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
val dfs = if (isClassification) {
genClassifDFWithNumericLabelCol(sqlContext)
} else {
genRegressionDFWithNumericLabelCol(sqlContext)
}
val expected = predictor.fit(dfs(DoubleType))
val actuals = dfs.keys.filter(_ != DoubleType).map(t => predictor.fit(dfs(t)))
actuals.foreach(actual => check(expected, actual))
val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext)
val thrown = intercept[IllegalArgumentException] {
predictor.fit(dfWithStringLabels)
}
assert(thrown.getMessage contains
"Column label must be of type NumericType but was actually of type StringType")
}
```
I had to change the fit method of OneVsRest to include a call to `transformSchema(dataset.schema)` so that it would fail properly on non-numeric label column. I think it's better to change the fit method so that all ML estimators provide consistent errors. Also note that the `isClassification` flag is needed because MLPClassifier does not inherit from Classifier, and so it is not easy to infer that it is a classification estimator.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57219723
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
+ val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
+
+ val rf = new RandomForestClassifier().setFeaturesCol("features")
+
+ val expected = rf.setLabelCol("label")
+ .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
+ dfs.keys.filter(_ != DoubleType).foreach { t =>
+ TreeTests.checkEqual(expected,
+ rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
+ }
+ }
+
+ test("shouldn't support non NumericType labels") {
--- End diff --
I wanted to be able to do something like:
```scala
def checkNumericTypes[T <: Regressor, M <: Model](
regressor: T, sqlContext: SQLContext)(asserts: Seq[(M, M) => Unit]): Unit = {
val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
val expected = regressor.fit(dfs(DoubleType))
dfs.keys.filter(_ != DoubleType).foreach { t =>
val actual = regressor.fit(dfs(t))
asserts.foreach(f => f(expected, actual))
}
}
```
But unfortunately that doesn't work since I'm not able to capture which model is being created by the regressor's `fit` method.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165519097
@sethah Yup, I plan to make it exhaustive but it'll take a bit of time.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r58168089
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions._
-import org.apache.spark.sql.types.{DataType, DoubleType, StructType}
+import org.apache.spark.sql.types.{DataType, DoubleType, NumericType, StructType}
--- End diff --
minor, but I believe the `NumericType` import is no longer necessary
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204275787
One very minor comment on imports, and think there are now merge conflicts with latest master that need fixing, subject to those LGTM.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-199709002
@sethah could you also take a final pass?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204279426
Will fix that, thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-200564217
**[Test build #53974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53974/consoleFull)** for PR 10355 at commit [`a88c4a3`](https://github.com/apache/spark/commit/a88c4a33f361c0967fa7e4f0ebf086f62a6460cb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195455908
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201017850
**[Test build #54085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54085/consoleFull)** for PR 10355 at commit [`18b3308`](https://github.com/apache/spark/commit/18b3308d4042c4eb64dedf77f59b659fe2a041e3).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-199967218
**[Test build #53797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53797/consoleFull)** for PR 10355 at commit [`b2f4f51`](https://github.com/apache/spark/commit/b2f4f518e41e463d679dd7f8266b150674a2e138).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165726765
**[Test build #47997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47997/consoleFull)** for PR 10355 at commit [`0c6feab`](https://github.com/apache/spark/commit/0c6feab71fecafd44953232c0f99534cc893e9e0).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165429667
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47921/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165426703
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47919/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173311071
**[Test build #49793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49793/consoleFull)** for PR 10355 at commit [`3c7b6c4`](https://github.com/apache/spark/commit/3c7b6c4b3cb80106a3e609f2788789682cf88150).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47933061
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
val model = dt.fit(df)
}
+ test("DecisionTree should support all NumericType labels") {
+ val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
--- End diff --
It might be less verbose to create the dataframe once, and then add the other label column types to the same data frame. Something like:
```scala
val dfWithTypes = df
.withColumn("shortLabel", df("labelIndex").cast(ShortType))
.withColumn("longLabel", df("labelIndex").cast(LongType))
.withColumn("intLabel", df("labelIndex").cast(IntegerType))
.withColumn("floatLabel", df("labelIndex").cast(FloatType))
.withColumn("decimalLabel", df("labelIndex").cast(DecimalType(10, 0)))
```
Then just change the label column between training. I'm not sure which way is better, but this would reduce copying the code ~5 times per test.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-199967576
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53797/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203617194
**[Test build #54547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54547/consoleFull)** for PR 10355 at commit [`788eb58`](https://github.com/apache/spark/commit/788eb585a69c0997428f5da1d70630fae475ff28).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197039258
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195455108
**[Test build #52928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52928/consoleFull)** for PR 10355 at commit [`2575fb7`](https://github.com/apache/spark/commit/2575fb724c0b6f00ccd0ea5f6ce069ee1076bad8).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173344090
Jenkins, retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173347952
**[Test build #49805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49805/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203627025
**[Test build #54549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54549/consoleFull)** for PR 10355 at commit [`58290af`](https://github.com/apache/spark/commit/58290af3c888f8d6a600ca4676488d4b542e04e7).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165913085
**[Test build #48024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48024/consoleFull)** for PR 10355 at commit [`46ca0a7`](https://github.com/apache/spark/commit/46ca0a7afa29bc39b0ee408f72f8c72fa1f91866).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165913300
cc @jkbradley
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201021316
**[Test build #54088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54088/consoleFull)** for PR 10355 at commit [`42d5e0f`](https://github.com/apache/spark/commit/42d5e0f28ac09c15cddb2af11c01df4b04503d4e).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-196742174
Surething
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165390849
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47917/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165848753
**[Test build #48009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48009/consoleFull)** for PR 10355 at commit [`67162db`](https://github.com/apache/spark/commit/67162db8c0d27a5ce02ed19190306362bd3de710).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197039263
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53227/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198057482
**[Test build #53458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53458/consoleFull)** for PR 10355 at commit [`e8a56d5`](https://github.com/apache/spark/commit/e8a56d5927b4652ac0b89fb3783a495a239d0eae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198113445
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53454/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-200559906
**[Test build #53974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53974/consoleFull)** for PR 10355 at commit [`a88c4a3`](https://github.com/apache/spark/commit/a88c4a33f361c0967fa7e4f0ebf086f62a6460cb).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195455913
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52928/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165733583
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47996/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57057748
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
assert(importances.toArray.forall(_ >= 0.0))
}
+ test("should support all NumericType labels") {
+ val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
+
+ val rf = new RandomForestClassifier().setFeaturesCol("features")
+
+ val expected = rf.setLabelCol("label")
+ .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
+ dfs.keys.filter(_ != DoubleType).foreach { t =>
+ TreeTests.checkEqual(expected,
+ rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
+ }
+ }
+
+ test("shouldn't support non NumericType labels") {
--- End diff --
Yup, good idea, I'll investigate.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-196741326
@BenFradet if it's not too much trouble, I think you can do `AFTSurvivalRegression` and `IsotonicRegression` here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203521461
**[Test build #54535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54535/consoleFull)** for PR 10355 at commit [`21e4afb`](https://github.com/apache/spark/commit/21e4afb9d734f81e47c73b16e1f29f76c46df6f5).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-200564243
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53974/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197537090
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198113442
Build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203103276
**[Test build #54458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54458/consoleFull)** for PR 10355 at commit [`cf005af`](https://github.com/apache/spark/commit/cf005af19eb52037bc32905c5d30595681f68152).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-192701357
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52519/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165390564
**[Test build #47917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47917/consoleFull)** for PR 10355 at commit [`c68ace1`](https://github.com/apache/spark/commit/c68ace1d7afe815ed252bacd0cebc576ef6e06b0).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197037079
Here's what's left to do:
- IsotonicRegression
- GeneralizedLinearRegression
- AFTSurvival
Otherwise, I'd prefer doing the evaluators and other estimators (like rformula and chiSqSelector) in other prs
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204676362
No problem, thanks for reviewing.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-199948244
**[Test build #53797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53797/consoleFull)** for PR 10355 at commit [`b2f4f51`](https://github.com/apache/spark/commit/b2f4f518e41e463d679dd7f8266b150674a2e138).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197596308
**[Test build #53368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53368/consoleFull)** for PR 10355 at commit [`3b424dc`](https://github.com/apache/spark/commit/3b424dc88211c83615ef8f16402879cc9eb45c2e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201582376
**[Test build #54229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54229/consoleFull)** for PR 10355 at commit [`f8b7823`](https://github.com/apache/spark/commit/f8b7823e246c1ea9ee7729971fdc1d3a64ac8208).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-200561614
@BenFradet Are you actively working on this? if so, I'll wait to review it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-201018209
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54085/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165552516
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165733582
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165894586
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48022/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197537016
**[Test build #53343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53343/consoleFull)** for PR 10355 at commit [`9feca44`](https://github.com/apache/spark/commit/9feca44e2796baaa9ccf5dfb03e2b9a0eabb731b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47932753
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
val model = dt.fit(df)
}
+ test("DecisionTree should support all NumericType labels") {
+ val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
+ (0, Vectors.dense(0, 2, 3)),
--- End diff --
It might be less verbose to create the dataframe once and then add different types of label columns. Something like:
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-204465968
**[Test build #54707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54707/consoleFull)** for PR 10355 at commit [`718774b`](https://github.com/apache/spark/commit/718774b7401b86a99c2c8ae02014470b0d3a77a3).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197537092
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53343/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r55680258
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
@@ -151,7 +151,7 @@ abstract class NumericType extends AtomicType {
}
-private[sql] object NumericType extends AbstractDataType {
+private[spark] object NumericType extends AbstractDataType {
--- End diff --
Why is this change required?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-195473838
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52929/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197521550
**[Test build #53343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53343/consoleFull)** for PR 10355 at commit [`9feca44`](https://github.com/apache/spark/commit/9feca44e2796baaa9ccf5dfb03e2b9a0eabb731b).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-197586274
**[Test build #53368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53368/consoleFull)** for PR 10355 at commit [`3b424dc`](https://github.com/apache/spark/commit/3b424dc88211c83615ef8f16402879cc9eb45c2e).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203660671
LGTM @MLnick ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165817271
**[Test build #48009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48009/consoleFull)** for PR 10355 at commit [`67162db`](https://github.com/apache/spark/commit/67162db8c0d27a5ce02ed19190306362bd3de710).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165390848
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173311100
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-173344031
Failing tests do not seem related to the changes introduced, triggering a new build.
`org.apache.spark.sql.hive.MetastoreDataSourcesSuite.insert into a table`
`org.apache.spark.sql.hive.MetastoreDataSourcesSuite.SPARK-8156:create table to specific database by 'use dbname'`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r55806405
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
@@ -151,7 +151,7 @@ abstract class NumericType extends AtomicType {
}
-private[sql] object NumericType extends AbstractDataType {
+private[spark] object NumericType extends AbstractDataType {
--- End diff --
I don't think it's required - see https://github.com/apache/spark/pull/9777/files#diff-a364fbaaec2088b49d5d1ceaea4a1c41R73 where OneHotEncoder uses `isInstanceOf[NumericType]` without making this object `private[spark]`. Here `NumericType` refers to the abstract class [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala#L144).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-198039870
**[Test build #53458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53458/consoleFull)** for PR 10355 at commit [`e8a56d5`](https://github.com/apache/spark/commit/e8a56d5927b4652ac0b89fb3783a495a239d0eae).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r56465357
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -49,10 +50,7 @@ private[ml] trait PredictorParams extends Params
validateParams()
// TODO: Support casting Array[Double] and Array[Float] to Vector when FeaturesType = Vector
SchemaUtils.checkColumnType(schema, $(featuresCol), featuresDataType)
- if (fitting) {
- // TODO: Allow other numeric types
- SchemaUtils.checkColumnType(schema, $(labelCol), DoubleType)
- }
+ if (fitting) SchemaUtils.checkNumericType(schema, $(labelCol))
--- End diff --
Spark style prefers to keep the braces format for if statements. Can you change it back please?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-203103291
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54458/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r47933679
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -100,6 +101,40 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext {
Utils.deleteRecursively(tempDir)
}
+ test("GradientBoostedTrees should support all NumericType labels") {
+ val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
--- End diff --
I'm wondering if there is some way to reuse the code here since we need to implement roughly the same test (train with all different label column types, check that they are the same as training with double type) for each different predictor. Maybe adding something in a helper function under `MLTestingUtils`?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165913205
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48024/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10355#issuecomment-165894584
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r57047541
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -27,4 +31,56 @@ object MLTestingUtils {
assert(copied.parent.uid == model.parent.uid)
assert(copied.parent == model.parent)
}
+
+ def genClassifDFWithNumericLabelCol(
+ sqlContext: SQLContext,
+ labelColName: String,
+ featuresColName: String
+ ): Map[NumericType, DataFrame] = {
--- End diff --
nit: Spark style seems to be to put the return type on the line above, if it will fit. So:
```scala
def genClassifDFWithNumericLabelCol(
sqlContext: SQLContext,
labelColName: String,
featuresColName: String): Map[NumericType, DataFrame] = {
```
[Example](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala#L168)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org