You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by BenFradet <gi...@git.apache.org> on 2015/12/17 10:10:16 UTC

[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

GitHub user BenFradet opened a pull request:

    https://github.com/apache/spark/pull/10355

    [SPARK-7425] [ML] [WIP] spark.ml Predictor should support other numeric types for label

    Currently, the Predictor abstraction expects the input labelCol type to be DoubleType, but we should support other numeric types. This will involve updating the PredictorParams.validateAndTransformSchema method.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BenFradet/spark SPARK-7425

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10355
    
----
commit 2b276a29cc6183e6b42f828631c23e74bcd4144f
Author: BenFradet <be...@gmail.com>
Date:   2015-12-13T15:41:57Z

    check label data type for numeric type instead of double

commit 7ef4ad4fce33f48e744fd0c9dbf30549eee76097
Author: BenFradet <be...@gmail.com>
Date:   2015-12-13T15:47:42Z

    added some cases to extractLabeledPoints, looking for a better way to handle this

commit e791ff6390284d6b0baae869de52f37bc0e38862
Author: BenFradet <be...@gmail.com>
Date:   2015-12-13T16:40:30Z

    Added a method to set the metadata on a dataframe

commit 83ffecba566f6a085fffa7dcf3194fa9f64edfc3
Author: BenFradet <be...@gmail.com>
Date:   2015-12-13T16:41:43Z

    unit tests for the decision tree classifier

commit 97dde27306984e3d78ca05d1cc99e47b18e48a8f
Author: BenFradet <be...@gmail.com>
Date:   2015-12-17T08:33:56Z

    used the sqlcontext provided with MLlibTestSparkContext

commit c68ace1d7afe815ed252bacd0cebc576ef6e06b0
Author: BenFradet <be...@gmail.com>
Date:   2015-12-17T09:06:46Z

    simpler version of extractLabeledPoints

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203641371
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203522113
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198057662
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-200564242
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r56200539
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    --- End diff --
    
    I was thinking more like create a util method (or use an existing one if there is one) to generate the required test datasets. It's not a big deal though, just a "nice to have"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197581060
  
    **[Test build #53366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53366/consoleFull)** for PR 10355 at commit [`2690512`](https://github.com/apache/spark/commit/269051247d92ae2a58fb2749a3ace062ad28f9cb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173322359
  
    **[Test build #49796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49796/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203606312
  
    **[Test build #54547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54547/consoleFull)** for PR 10355 at commit [`788eb58`](https://github.com/apache/spark/commit/788eb585a69c0997428f5da1d70630fae475ff28).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-202579805
  
    I'll take a look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-199709409
  
    Yup, I'll rebase tonight and start working on the related jiras.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173343330
  
    **[Test build #49796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49796/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201018203
  
    **[Test build #54085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54085/consoleFull)** for PR 10355 at commit [`18b3308`](https://github.com/apache/spark/commit/18b3308d4042c4eb64dedf77f59b659fe2a041e3).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165396129
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47918/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165909357
  
    **[Test build #48023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48023/consoleFull)** for PR 10355 at commit [`1003291`](https://github.com/apache/spark/commit/10032913af7d8614ecc2c8a132f30f3e13297c3a).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165552508
  
    **[Test build #47940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47940/consoleFull)** for PR 10355 at commit [`4ada320`](https://github.com/apache/spark/commit/4ada32096ec5839d2ef822cba5d261732e0aaced).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165913202
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173343509
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173315234
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203522117
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54535/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165390845
  
    **[Test build #47917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47917/consoleFull)** for PR 10355 at commit [`c68ace1`](https://github.com/apache/spark/commit/c68ace1d7afe815ed252bacd0cebc576ef6e06b0).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165733530
  
    **[Test build #47996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47996/consoleFull)** for PR 10355 at commit [`5fd5069`](https://github.com/apache/spark/commit/5fd5069f784e067e3245e0db6b7d2de48aae84fe).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195455900
  
    **[Test build #52928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52928/consoleFull)** for PR 10355 at commit [`2575fb7`](https://github.com/apache/spark/commit/2575fb724c0b6f00ccd0ea5f6ce069ee1076bad8).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195485937
  
    **[Test build #52932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52932/consoleFull)** for PR 10355 at commit [`45582de`](https://github.com/apache/spark/commit/45582deb51a23893e61bc0155fb0c20fb5ac4e6f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201037265
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173307035
  
    **[Test build #49793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49793/consoleFull)** for PR 10355 at commit [`3c7b6c4`](https://github.com/apache/spark/commit/3c7b6c4b3cb80106a3e609f2788789682cf88150).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10355


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204478499
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54707/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197623814
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53366/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165848903
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165748682
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47997/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57643353
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala ---
    @@ -61,6 +61,20 @@ private[spark] object SchemaUtils {
       }
     
       /**
    +    * Check whether the given schema contains a column of the numeric data type.
    --- End diff --
    
    indent 1 space less (can update IntelliJ code style settings: Editor -> Code Style -> Scala -> ScalaDoc tab -> uncheck Use scaladoc indent for leading asterisk


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57643358
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
    @@ -17,14 +17,116 @@
     
     package org.apache.spark.ml.util
     
    -import org.apache.spark.ml.Model
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor}
     import org.apache.spark.ml.param.ParamMap
    +import org.apache.spark.ml.regression.Regressor
    +import org.apache.spark.ml.tree.impl.TreeTests
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.sql.{DataFrame, SQLContext}
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types._
     
    -object MLTestingUtils {
    +object MLTestingUtils extends SparkFunSuite {
       def checkCopy(model: Model[_]): Unit = {
         val copied = model.copy(ParamMap.empty)
           .asInstanceOf[Model[_]]
         assert(copied.parent.uid == model.parent.uid)
         assert(copied.parent == model.parent)
       }
    +
    +  def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]](
    +      predictor: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
    +    val dfs = if (predictor.isInstanceOf[Regressor[_, _, _]]) {
    +      genRegressionDFWithNumericLabelCol(sqlContext)
    +    } else {
    +      genClassifDFWithNumericLabelCol(sqlContext)
    +    }
    +    val expected = predictor.fit(dfs(DoubleType))
    +    val actuals = dfs.keys.filter(_ != DoubleType).map(t => predictor.fit(dfs(t)))
    +    actuals.foreach(actual => check(expected, actual))
    +  }
    +
    +  def checkPredictorRejectNotNumericTypes(
    +      predictor: Predictor[_, _, _], sqlContext: SQLContext): Unit = {
    +    val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext)
    +    val thrown = intercept[IllegalArgumentException] {
    +      predictor.fit(dfWithStringLabels)
    +    }
    +    assert(thrown.getMessage contains
    +      "Column label must be of type NumericType but was actually of type StringType")
    +  }
    +
    +  def checkEstimatorAcceptAllNumericTypes[M <: Model[M], T <: Estimator[M]](
    +      estimator: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
    +    val dfs = genRegressionDFWithNumericLabelCol(sqlContext)
    +    val expected = estimator.fit(dfs(DoubleType))
    +    val actuals = dfs.keys.filter(_ != DoubleType).map(t => estimator.fit(dfs(t)))
    +    actuals.foreach(actual => check(expected, actual))
    +  }
    +
    +  def checkEstimatorRejectNotNumericTypes(
    +      predictor: Estimator[_], sqlContext: SQLContext): Unit = {
    +    val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext)
    +    val thrown = intercept[IllegalArgumentException] {
    +      predictor.fit(dfWithStringLabels)
    +    }
    +    assert(thrown.getMessage contains
    +      "Column label must be of type NumericType but was actually of type StringType")
    +  }
    +
    +  def genClassifDFWithNumericLabelCol(
    +      sqlContext: SQLContext,
    +      labelColName: String = "label",
    +      featuresColName: String = "features"): Map[NumericType, DataFrame] = {
    +    val df = sqlContext.createDataFrame(Seq(
    +      (0, Vectors.dense(0, 2, 3)),
    +      (1, Vectors.dense(0, 3, 1)),
    +      (0, Vectors.dense(0, 2, 2)),
    +      (1, Vectors.dense(0, 3, 9)),
    +      (0, Vectors.dense(0, 2, 6))
    +    )).toDF(labelColName, featuresColName)
    +
    +    val types =
    +      Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
    +    types.map(t => t -> df.select(col(labelColName).cast(t), col(featuresColName)))
    +      .map { case (t, d) => t -> TreeTests.setMetadata(d, 2, labelColName) }
    +      .toMap
    +  }
    +
    +  def genRegressionDFWithNumericLabelCol(
    +      sqlContext: SQLContext,
    +      labelColName: String = "label",
    +      featuresColName: String = "features",
    +      censorColName: String = "censor"): Map[NumericType, DataFrame] = {
    +    val df = sqlContext.createDataFrame(Seq(
    +      (0, Vectors.dense(0)),
    +      (1, Vectors.dense(1)),
    +      (2, Vectors.dense(2)),
    +      (3, Vectors.dense(3)),
    +      (4, Vectors.dense(4))
    +    )).toDF(labelColName, featuresColName)
    +
    +    val types =
    +      Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
    +    types
    +      .map(t => t -> df.select(col(labelColName).cast(t), col(featuresColName)))
    +      .map { case (t, d) =>
    +        t -> TreeTests.setMetadata(d, 2, labelColName).withColumn(censorColName, lit(0.0))
    --- End diff --
    
    setMetadata: classes should be 0, not 2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-192681503
  
    **[Test build #52519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52519/consoleFull)** for PR 10355 at commit [`dd18ddc`](https://github.com/apache/spark/commit/dd18ddc5ece97225c3b76716b49593ac7fa0be36).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165552039
  
    **[Test build #47940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47940/consoleFull)** for PR 10355 at commit [`4ada320`](https://github.com/apache/spark/commit/4ada32096ec5839d2ef822cba5d261732e0aaced).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165429666
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203522095
  
    **[Test build #54535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54535/consoleFull)** for PR 10355 at commit [`21e4afb`](https://github.com/apache/spark/commit/21e4afb9d734f81e47c73b16e1f29f76c46df6f5).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201037149
  
    **[Test build #54088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54088/consoleFull)** for PR 10355 at commit [`42d5e0f`](https://github.com/apache/spark/commit/42d5e0f28ac09c15cddb2af11c01df4b04503d4e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203617385
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54547/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195500352
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52932/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204623657
  
    LGTM
    I'll go ahead and merge this with master before a conflict happens
    Thanks for the PR!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173375255
  
    Pinging @jkbradley and @mengxr


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r56203460
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    --- End diff --
    
    OK, will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198113198
  
    **[Test build #53454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53454/consoleFull)** for PR 10355 at commit [`6b8b0cf`](https://github.com/apache/spark/commit/6b8b0cfa44e0c3050c17a4bcb7be9faeaae53748).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57245082
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala ---
    @@ -183,12 +183,12 @@ class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: S
        * Extract [[featuresCol]], [[labelCol]] and [[censorCol]] from input dataset,
        * and put it in an RDD with strong types.
        */
    -  protected[ml] def extractAFTPoints(dataset: DataFrame): RDD[AFTPoint] = {
    -    dataset.select($(featuresCol), $(labelCol), $(censorCol)).rdd.map {
    -      case Row(features: Vector, label: Double, censor: Double) =>
    -        AFTPoint(features, label, censor)
    -    }
    -  }
    +  protected[ml] def extractAFTPoints(dataset: DataFrame): RDD[AFTPoint] =
    --- End diff --
    
    I prefer to keep braces around multiline methods like this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197623583
  
    **[Test build #53366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53366/consoleFull)** for PR 10355 at commit [`2690512`](https://github.com/apache/spark/commit/269051247d92ae2a58fb2749a3ace062ad28f9cb).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-194861966
  
    Great! Thanks.
    I'll fix those conflicts later today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203102318
  
    **[Test build #54458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54458/consoleFull)** for PR 10355 at commit [`cf005af`](https://github.com/apache/spark/commit/cf005af19eb52037bc32905c5d30595681f68152).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57057621
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
    @@ -27,4 +31,56 @@ object MLTestingUtils {
         assert(copied.parent.uid == model.parent.uid)
         assert(copied.parent == model.parent)
       }
    +
    +  def genClassifDFWithNumericLabelCol(
    +    sqlContext: SQLContext,
    +    labelColName: String,
    +    featuresColName: String
    +  ): Map[NumericType, DataFrame] = {
    --- End diff --
    
    I'll modify the whole file then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165848904
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48009/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-193294590
  
    @mengxr @jkbradley is there any interest in this pr or should i close it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r47933370
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         val model = dt.fit(df)
       }
     
    +  test("DecisionTree should support all NumericType labels") {
    +    val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
    --- End diff --
    
    I think so too, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165389989
  
    This is still a wip but remarks are welcome.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198057663
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53458/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165909442
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48023/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57045359
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
    @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    +    val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
    +
    +    val rf = new RandomForestClassifier().setFeaturesCol("features")
    +
    +    val expected = rf.setLabelCol("label")
    +      .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
    +    dfs.keys.filter(_ != DoubleType).foreach { t =>
    +      TreeTests.checkEqual(expected,
    +        rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
    +    }
    +  }
    +
    +  test("shouldn't support non NumericType labels") {
    --- End diff --
    
    I still think the code duplication is a big concern. It might not be that hard to add some utility function that can clean this up. For example I was able to get this working:
    
    In _MLTestingUtils.scala_
    ```scala
    def checkRejectsNonNumericType(est: Predictor[_, _, _], sqlContext: SQLContext) = {
        val dfWithStringLabels =
          MLTestingUtils.generateDFWithStringLabelCol(sqlContext, est.getLabelCol, est.getFeaturesCol)
    
        val thrown = intercept[IllegalArgumentException] {
          est.fit(dfWithStringLabels)
        }
        assert(thrown.getMessage contains
          "Column label must be of type NumericType but was actually of type StringType")
      }
    ```
    
    Then in the actual test suites:
    ```scala
    test("shouldn't support non NumericType labels") {
        MLTestingUtils.checkRejectsNonNumericType(new RandomForestClassifier, sqlContext)
    }
    ```
    There will be some complication for some of the estimators like OneVsRest and IsotonicRegression (why doesn't it extend Predictor?) but I think it will be worth it to get this right since future estimators will have to implement this as well. It would be really nice to have something like we do for params where there is just a one-liner `ParamsSuite.checkParams(new RandomForestClassifier)`. For testing that all numeric types are supported, we could have a utility method that produces all actual and expected results, then check for equality inside the individual test suites, like:
    
    ```scala
    val models = MLTestingUtils.fitAllNumericTypes(new NaiveBayes, sqlContext)
        models.foreach { case (expected, actual) =>
          assert(expected.pi === actual.pi)
          assert(expected.theta === actual.theta)
        }
    ```
    
    I'm open to feedback on this, realizing it could be hard to generalize this for every case. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173375292
  
    Pinging @jkbradley and @mengxr


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195557577
  
    @MLnick you were right, thanks for spotting that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197596345
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198035588
  
    **[Test build #53454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53454/consoleFull)** for PR 10355 at commit [`6b8b0cf`](https://github.com/apache/spark/commit/6b8b0cfa44e0c3050c17a4bcb7be9faeaae53748).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165426246
  
    **[Test build #47919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47919/consoleFull)** for PR 10355 at commit [`b014357`](https://github.com/apache/spark/commit/b0143575b50567225a63dabfef560a5d35554077).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-202603261
  
    This looks to be in very good shape.  My main comment is about unit test time: the tests take a significant amount of time.  Wherever it is easy to do, could you set the algorithm parameters to make for faster tests?  E.g., use 1 iteration for LogisticRegression, use maxDepth 0 for trees, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204478494
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197755584
  
    @MLnick yup thanks, I'll get to those once I'm done with this one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195473833
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173311106
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49793/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-199708780
  
    Looks fine to me - latest test result indicates there may be merge conflicts?
    
    I'd like @jkbradley to take a quick pass too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201750481
  
    @jkbradley @sethah @MLnick I think it's in a pretty good state now, what do you guys think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165396122
  
    **[Test build #47918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47918/consoleFull)** for PR 10355 at commit [`1027e83`](https://github.com/apache/spark/commit/1027e83d9add7a91adda7bba892b3fe7926405f1).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165909440
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201582826
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54229/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165401216
  
    **[Test build #47919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47919/consoleFull)** for PR 10355 at commit [`b014357`](https://github.com/apache/spark/commit/b0143575b50567225a63dabfef560a5d35554077).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165552519
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47940/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165748680
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165426698
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r48050215
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
    @@ -160,6 +160,59 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext {
           require(m.getThreshold === 0.1, "copy should handle extra model params")
         }
       }
    +
    +  test("should support all NumericType labels") {
    +    val df = sqlContext.createDataFrame(Seq(
    +      (0, Vectors.dense(0, 2, 3)),
    +      (1, Vectors.dense(0, 3, 1)),
    +      (0, Vectors.dense(0, 2, 2)),
    +      (1, Vectors.dense(0, 3, 9)),
    +      (2, Vectors.dense(0, 2, 6))
    +    )).toDF("label", "features")
    +
    +    val types =
    +      Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
    +
    +    var dfWithTypes = df
    +    types.foreach(t => dfWithTypes = dfWithTypes.withColumn(t.toString, df("label").cast(t)))
    +
    +    val ovr = new OneVsRest()
    +      .setClassifier(new LogisticRegression)
    +      .setFeaturesCol("features")
    +
    +    val expected = ovr.setLabelCol(DoubleType.toString).fit(dfWithTypes)
    +    val expectedModels = expected.models.map(m => m.asInstanceOf[LogisticRegressionModel])
    +    types.filter(_ != DoubleType).foreach { t =>
    +      val actual = ovr.setLabelCol(t.toString).fit(dfWithTypes)
    +      val actualModels = actual.models.map(m => m.asInstanceOf[LogisticRegressionModel])
    +      assert(expectedModels.length === actualModels.length)
    +      expectedModels.zip(actualModels).foreach { case (e, a) =>
    +        assert(e.intercept === a.intercept)
    +        assert(e.coefficients.toArray === a.coefficients.toArray)
    +      }
    +    }
    +  }
    +
    +  test("shouldn't support non NumericType labels") {
    +    val dfWithStringLabels = sqlContext.createDataFrame(Seq(
    +      ("0", Vectors.dense(0, 2, 3)),
    +      ("1", Vectors.dense(0, 3, 1)),
    +      ("0", Vectors.dense(0, 2, 2)),
    +      ("1", Vectors.dense(0, 3, 9)),
    +      ("2", Vectors.dense(0, 2, 6))
    +    )).toDF("label", "features")
    +
    +    val ovr = new OneVsRest()
    +      .setClassifier(new LogisticRegression)
    +      .setLabelCol("label")
    +      .setFeaturesCol("features")
    +
    +    // thrown by AttributeFactory#fromStructField and not by Predictor#validateAndTransformSchema
    +    // because OneVsRest reimplements the fit method
    +    intercept[IllegalArgumentException] {
    --- End diff --
    
    This is caused by [AttributeFactory#fromStructField](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/attribute/attributes.scala#L130).
    
    I don't know what's the best way to handle this, input welcome.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195500349
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173374838
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49805/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165896927
  
    **[Test build #48024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48024/consoleFull)** for PR 10355 at commit [`46ca0a7`](https://github.com/apache/spark/commit/46ca0a7afa29bc39b0ee408f72f8c72fa1f91866).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r56199972
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    --- End diff --
    
    The code duplication is mostly related to the test datasets.
    What I can do is reduce their size to one or two observations, what do you think?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197623813
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173314806
  
    **[Test build #49794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49794/consoleFull)** for PR 10355 at commit [`2c3b4cf`](https://github.com/apache/spark/commit/2c3b4cf22e4b7bcf109669cb7b56f39a82d8cf26).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-199967566
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r56136601
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -334,6 +334,51 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    --- End diff --
    
    It's not a major issue, but there is a large amount of code duplication in all the tests. I know you've already tried to pare it down, but if it's possible to further reduce it by extracting out some methods, that would be cleaner.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57796113
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
    @@ -17,14 +17,116 @@
     
     package org.apache.spark.ml.util
     
    -import org.apache.spark.ml.Model
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor}
     import org.apache.spark.ml.param.ParamMap
    +import org.apache.spark.ml.regression.Regressor
    +import org.apache.spark.ml.tree.impl.TreeTests
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.sql.{DataFrame, SQLContext}
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types._
     
    -object MLTestingUtils {
    +object MLTestingUtils extends SparkFunSuite {
       def checkCopy(model: Model[_]): Unit = {
         val copied = model.copy(ParamMap.empty)
           .asInstanceOf[Model[_]]
         assert(copied.parent.uid == model.parent.uid)
         assert(copied.parent == model.parent)
       }
    +
    +  def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]](
    --- End diff --
    
    good point, will update


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197596347
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53368/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173315239
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49794/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165395169
  
    **[Test build #47918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47918/consoleFull)** for PR 10355 at commit [`1027e83`](https://github.com/apache/spark/commit/1027e83d9add7a91adda7bba892b3fe7926405f1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165893915
  
    **[Test build #48023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48023/consoleFull)** for PR 10355 at commit [`1003291`](https://github.com/apache/spark/commit/10032913af7d8614ecc2c8a132f30f3e13297c3a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195473729
  
    **[Test build #52929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52929/consoleFull)** for PR 10355 at commit [`2230292`](https://github.com/apache/spark/commit/2230292fb7c31c6487b5c5cbfd5a5b4554db15f3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r56468850
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -347,6 +348,34 @@ class AFTSurvivalRegressionSuite
         }
       }
     
    +  test("should support all NumericType labels") {
    +    val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
    --- End diff --
    
    Thanks for the insight, I'll fix this later today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-192701255
  
    **[Test build #52519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52519/consoleFull)** for PR 10355 at commit [`dd18ddc`](https://github.com/apache/spark/commit/dd18ddc5ece97225c3b76716b49593ac7fa0be36).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197744650
  
    @BenFradet I linked 2 additional sub-tasks to the [umbrella JIRA](https://issues.apache.org/jira/browse/SPARK-11107)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-167002913
  
    Also, what about the other things expecting a `labelCol` of `DoubleType` like the different evaluators?
    Should this be handled in a follow-up jira/pr?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201582825
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-192701356
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204478315
  
    **[Test build #54707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54707/consoleFull)** for PR 10355 at commit [`718774b`](https://github.com/apache/spark/commit/718774b7401b86a99c2c8ae02014470b0d3a77a3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-202744058
  
    Sure, thanks for your review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165748572
  
    **[Test build #47997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47997/consoleFull)** for PR 10355 at commit [`0c6feab`](https://github.com/apache/spark/commit/0c6feab71fecafd44953232c0f99534cc893e9e0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173315228
  
    **[Test build #49794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49794/consoleFull)** for PR 10355 at commit [`2c3b4cf`](https://github.com/apache/spark/commit/2c3b4cf22e4b7bcf109669cb7b56f39a82d8cf26).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57241509
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
    @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    +    val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
    +
    +    val rf = new RandomForestClassifier().setFeaturesCol("features")
    +
    +    val expected = rf.setLabelCol("label")
    +      .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
    +    dfs.keys.filter(_ != DoubleType).foreach { t =>
    +      TreeTests.checkEqual(expected,
    +        rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
    +    }
    +  }
    +
    +  test("shouldn't support non NumericType labels") {
    --- End diff --
    
    Here's what I managed to make work:
    
    ```scala
      def checkNumericTypes[M <: RegressionModel[_, M], T <: Regressor[_, _, M]](
          regressor: T, sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
        val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
        
        val expected = regressor.fit(dfs(DoubleType))
        val actuals = dfs.keys.filter(_ != DoubleType).map(t => regressor.fit(dfs(t)))
        actuals.foreach(actual => check(expected, actual))
      }
    ```
    
    which could be used like:
    
    ```scala
    MLTestingUtils.checkNumericTypes[LinearRegressionModel, LinearRegression](
        new LinearRegression(), sqlContext) { (expected, actual) =>
          assert(expected.intercept === actual.intercept)
          assert(expected.coefficients === actual.coefficients)
        }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201018208
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201560326
  
    **[Test build #54229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54229/consoleFull)** for PR 10355 at commit [`f8b7823`](https://github.com/apache/spark/commit/f8b7823e246c1ea9ee7729971fdc1d3a64ac8208).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203617375
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-194856518
  
    @BenFradet I'll review - there seem to be merge conflicts that need resolving.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203103287
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165819097
  
    I noticed [AFTSurvivalRegression](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L99) uses its own `validateAndTransformSchema` method.
    
    Should I make it support other numeric types in this PR or in another one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165429356
  
    **[Test build #47921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47921/consoleFull)** for PR 10355 at commit [`47cc81b`](https://github.com/apache/spark/commit/47cc81bdd84146587ba169c80c8fc1ef382dafe3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165405712
  
    **[Test build #47921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47921/consoleFull)** for PR 10355 at commit [`47cc81b`](https://github.com/apache/spark/commit/47cc81bdd84146587ba169c80c8fc1ef382dafe3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r56465970
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
    @@ -347,6 +348,34 @@ class AFTSurvivalRegressionSuite
         }
       }
     
    +  test("should support all NumericType labels") {
    +    val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
    --- End diff --
    
    I think you'll need to adjust the generated data here as it requires an additional 'censor' col.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r47933490
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         val model = dt.fit(df)
       }
     
    +  test("DecisionTree should support all NumericType labels") {
    +    val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
    +      (0, Vectors.dense(0, 2, 3)),
    +      (1, Vectors.dense(0, 3, 1)),
    +      (0, Vectors.dense(0, 2, 2)),
    +      (1, Vectors.dense(0, 3, 9)),
    +      (0, Vectors.dense(0, 2, 6))
    +    )).toDF("label", "features"), 2)
    +
    +    val dfWithFloatLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
    +      (0f, Vectors.dense(0, 2, 3)),
    +      (1f, Vectors.dense(0, 3, 1)),
    +      (0f, Vectors.dense(0, 2, 2)),
    +      (1f, Vectors.dense(0, 3, 9)),
    +      (0f, Vectors.dense(0, 2, 6))
    +    )).toDF("label", "features"), 2)
    +
    +    val dfWithLongLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
    +      (0L, Vectors.dense(0, 2, 3)),
    +      (1L, Vectors.dense(0, 3, 1)),
    +      (0L, Vectors.dense(0, 2, 2)),
    +      (1L, Vectors.dense(0, 3, 9)),
    +      (0L, Vectors.dense(0, 2, 6))
    +    )).toDF("label", "features"), 2)
    +
    +    val dt = new DecisionTreeClassifier()
    +      .setLabelCol("label")
    +      .setFeaturesCol("features")
    +
    +    dt.fit(dfWithIntLabels)
    +    dt.fit(dfWithFloatLabels)
    +    dt.fit(dfWithLongLabels)
    +  }
    --- End diff --
    
    I'm not sure it's not clear what this test is checking for. It looks like it just checks that no errors are thrown during training the tree. Maybe we should train a tree with a data frame with `DoubleType` as the column and check equality between training trees with other label column types?
    
    ```scala
    val doubleModel = dt.fit(dfWithDoubleLabels)
    val intModel = dt.fit(dfWithIntLabels)
    TreeTests.checkEqual(doubleModel, intModel)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-200556843
  
    Sorry for the long delay!  I'll take a look now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r55686938
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -151,7 +151,7 @@ abstract class NumericType extends AtomicType {
     }
     
     
    -private[sql] object NumericType extends AbstractDataType {
    +private[spark] object NumericType extends AbstractDataType {
    --- End diff --
    
    Because it's now used in [Predictor.scala]{https://github.com/apache/spark/pull/10355/files#diff-5fe763c1d5d2d36c0154204028cd3585R56}


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173374837
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195463826
  
    **[Test build #52929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52929/consoleFull)** for PR 10355 at commit [`2230292`](https://github.com/apache/spark/commit/2230292fb7c31c6487b5c5cbfd5a5b4554db15f3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-196743025
  
    @BenFradet I'm ok with doing the other things like evaluators in a different PR - however the only concern I'd have is it would temporarily break things like `CrossValidator` that use evaluators and a non-Double `labelCol` for the model. It's ok if you're able to do the follow up soon. Otherwise it may be worth thinking about just doing them here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195499950
  
    **[Test build #52932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52932/consoleFull)** for PR 10355 at commit [`45582de`](https://github.com/apache/spark/commit/45582deb51a23893e61bc0155fb0c20fb5ac4e6f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173374354
  
    **[Test build #49805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49805/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-200726627
  
    @jkbradley yes, I'm currently trying to incorporate @sethah's comments and keep code duplication to a strict minimum.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197035874
  
    **[Test build #53227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53227/consoleFull)** for PR 10355 at commit [`7d9835a`](https://github.com/apache/spark/commit/7d9835abd0f3bfac2ef66db01a4de4fdc7dceb48).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173343511
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49796/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201037272
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54088/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197039251
  
    **[Test build #53227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53227/consoleFull)** for PR 10355 at commit [`7d9835a`](https://github.com/apache/spark/commit/7d9835abd0f3bfac2ef66db01a4de4fdc7dceb48).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203641203
  
    **[Test build #54549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54549/consoleFull)** for PR 10355 at commit [`58290af`](https://github.com/apache/spark/commit/58290af3c888f8d6a600ca4676488d4b542e04e7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203641373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54549/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165720858
  
    **[Test build #47996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47996/consoleFull)** for PR 10355 at commit [`5fd5069`](https://github.com/apache/spark/commit/5fd5069f784e067e3245e0db6b7d2de48aae84fe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165517939
  
    I assume since this is a WIP you are still going to add test cases for the other predictors? Additionally, since ShortType and DecimalType also extend NumericType, I think the tests should include those cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165396126
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57768696
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
    @@ -17,14 +17,116 @@
     
     package org.apache.spark.ml.util
     
    -import org.apache.spark.ml.Model
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.ml.{Estimator, Model, PredictionModel, Predictor}
     import org.apache.spark.ml.param.ParamMap
    +import org.apache.spark.ml.regression.Regressor
    +import org.apache.spark.ml.tree.impl.TreeTests
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.sql.{DataFrame, SQLContext}
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types._
     
    -object MLTestingUtils {
    +object MLTestingUtils extends SparkFunSuite {
       def checkCopy(model: Model[_]): Unit = {
         val copied = model.copy(ParamMap.empty)
           .asInstanceOf[Model[_]]
         assert(copied.parent.uid == model.parent.uid)
         assert(copied.parent == model.parent)
       }
    +
    +  def checkPredictorAcceptAllNumericTypes[M <: PredictionModel[_, M], T <: Predictor[_, _, M]](
    --- End diff --
    
    Is there a reason to have two separate methods for predictor and estimator? I also tend to prefer _not_ separating the "accept" and "reject" checks, and instead have them in a single function call. This will reduce the complexity of having to decide which one to use (and why) and also reduce the number of function calls in future tests. I was able to get the tests working having a single method for accept/reject and estimator/predictor, like this:
    
    ```scala
      def checkEstimatorNumericTypes[M <: Model[M], T <: Estimator[M]](
          predictor: T,
          isClassification: Boolean,
          sqlContext: SQLContext)(check: (M, M) => Unit): Unit = {
        val dfs = if (isClassification) {
          genClassifDFWithNumericLabelCol(sqlContext)
        } else {
          genRegressionDFWithNumericLabelCol(sqlContext)
        }
        val expected = predictor.fit(dfs(DoubleType))
        val actuals = dfs.keys.filter(_ != DoubleType).map(t => predictor.fit(dfs(t)))
        actuals.foreach(actual => check(expected, actual))
    
        val dfWithStringLabels = generateDFWithStringLabelCol(sqlContext)
        val thrown = intercept[IllegalArgumentException] {
          predictor.fit(dfWithStringLabels)
        }
        assert(thrown.getMessage contains
          "Column label must be of type NumericType but was actually of type StringType")
      }
    ```
    
    I had to change the fit method of OneVsRest to include a call to `transformSchema(dataset.schema)` so that it would fail properly on non-numeric label column. I think it's better to change the fit method so that all ML estimators provide consistent errors. Also note that the `isClassification` flag is needed because MLPClassifier does not inherit from Classifier, and so it is not easy to infer that it is a classification estimator.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57219723
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
    @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    +    val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
    +
    +    val rf = new RandomForestClassifier().setFeaturesCol("features")
    +
    +    val expected = rf.setLabelCol("label")
    +      .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
    +    dfs.keys.filter(_ != DoubleType).foreach { t =>
    +      TreeTests.checkEqual(expected,
    +        rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
    +    }
    +  }
    +
    +  test("shouldn't support non NumericType labels") {
    --- End diff --
    
    I wanted to be able to do something like:
    
    ```scala
    def checkNumericTypes[T <: Regressor, M <: Model](
        regressor: T, sqlContext: SQLContext)(asserts: Seq[(M, M) => Unit]): Unit = {
      val dfs = MLTestingUtils.genRegressionDFWithNumericLabelCol(sqlContext, "label", "features")
        
      val expected = regressor.fit(dfs(DoubleType))
      dfs.keys.filter(_ != DoubleType).foreach { t =>
        val actual = regressor.fit(dfs(t))
        asserts.foreach(f => f(expected, actual))
      }
    }
    ```
    
    But unfortunately that doesn't work since I'm not able to capture which model is being created by the regressor's `fit` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165519097
  
    @sethah Yup, I plan to make it exhaustive but it'll take a bit of time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r58168089
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
    @@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.LabeledPoint
     import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.{DataFrame, Row}
     import org.apache.spark.sql.functions._
    -import org.apache.spark.sql.types.{DataType, DoubleType, StructType}
    +import org.apache.spark.sql.types.{DataType, DoubleType, NumericType, StructType}
    --- End diff --
    
    minor, but I believe the `NumericType` import is no longer necessary


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204275787
  
    One very minor comment on imports, and think there are now merge conflicts with latest master that need fixing, subject to those LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-199709002
  
    @sethah could you also take a final pass?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204279426
  
    Will fix that, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-200564217
  
    **[Test build #53974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53974/consoleFull)** for PR 10355 at commit [`a88c4a3`](https://github.com/apache/spark/commit/a88c4a33f361c0967fa7e4f0ebf086f62a6460cb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195455908
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201017850
  
    **[Test build #54085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54085/consoleFull)** for PR 10355 at commit [`18b3308`](https://github.com/apache/spark/commit/18b3308d4042c4eb64dedf77f59b659fe2a041e3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-199967218
  
    **[Test build #53797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53797/consoleFull)** for PR 10355 at commit [`b2f4f51`](https://github.com/apache/spark/commit/b2f4f518e41e463d679dd7f8266b150674a2e138).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165726765
  
    **[Test build #47997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47997/consoleFull)** for PR 10355 at commit [`0c6feab`](https://github.com/apache/spark/commit/0c6feab71fecafd44953232c0f99534cc893e9e0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165429667
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47921/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165426703
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47919/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173311071
  
    **[Test build #49793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49793/consoleFull)** for PR 10355 at commit [`3c7b6c4`](https://github.com/apache/spark/commit/3c7b6c4b3cb80106a3e609f2788789682cf88150).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r47933061
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         val model = dt.fit(df)
       }
     
    +  test("DecisionTree should support all NumericType labels") {
    +    val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
    --- End diff --
    
    It might be less verbose to create the dataframe once, and then add the other label column types to the same data frame. Something like:
    
    ```scala
    val dfWithTypes = df
          .withColumn("shortLabel", df("labelIndex").cast(ShortType))
          .withColumn("longLabel", df("labelIndex").cast(LongType))
          .withColumn("intLabel", df("labelIndex").cast(IntegerType))
          .withColumn("floatLabel", df("labelIndex").cast(FloatType))
          .withColumn("decimalLabel", df("labelIndex").cast(DecimalType(10, 0)))
    ```
    
    Then just change the label column between training. I'm not sure which way is better, but this would reduce copying the code ~5 times per test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-199967576
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53797/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203617194
  
    **[Test build #54547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54547/consoleFull)** for PR 10355 at commit [`788eb58`](https://github.com/apache/spark/commit/788eb585a69c0997428f5da1d70630fae475ff28).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197039258
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195455108
  
    **[Test build #52928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52928/consoleFull)** for PR 10355 at commit [`2575fb7`](https://github.com/apache/spark/commit/2575fb724c0b6f00ccd0ea5f6ce069ee1076bad8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173344090
  
    Jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173347952
  
    **[Test build #49805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49805/consoleFull)** for PR 10355 at commit [`2c55bb9`](https://github.com/apache/spark/commit/2c55bb9a0af4994113784a203836ae62035cd9c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203627025
  
    **[Test build #54549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54549/consoleFull)** for PR 10355 at commit [`58290af`](https://github.com/apache/spark/commit/58290af3c888f8d6a600ca4676488d4b542e04e7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165913085
  
    **[Test build #48024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48024/consoleFull)** for PR 10355 at commit [`46ca0a7`](https://github.com/apache/spark/commit/46ca0a7afa29bc39b0ee408f72f8c72fa1f91866).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165913300
  
    cc @jkbradley 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201021316
  
    **[Test build #54088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54088/consoleFull)** for PR 10355 at commit [`42d5e0f`](https://github.com/apache/spark/commit/42d5e0f28ac09c15cddb2af11c01df4b04503d4e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-196742174
  
    Surething


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165390849
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47917/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165848753
  
    **[Test build #48009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48009/consoleFull)** for PR 10355 at commit [`67162db`](https://github.com/apache/spark/commit/67162db8c0d27a5ce02ed19190306362bd3de710).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197039263
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53227/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198057482
  
    **[Test build #53458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53458/consoleFull)** for PR 10355 at commit [`e8a56d5`](https://github.com/apache/spark/commit/e8a56d5927b4652ac0b89fb3783a495a239d0eae).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198113445
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53454/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-200559906
  
    **[Test build #53974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53974/consoleFull)** for PR 10355 at commit [`a88c4a3`](https://github.com/apache/spark/commit/a88c4a33f361c0967fa7e4f0ebf086f62a6460cb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195455913
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52928/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165733583
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47996/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57057748
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
    @@ -178,6 +179,34 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         assert(importances.toArray.forall(_ >= 0.0))
       }
     
    +  test("should support all NumericType labels") {
    +    val dfs = MLTestingUtils.genClassifDFWithNumericLabelCol(sqlContext, "label", "features")
    +
    +    val rf = new RandomForestClassifier().setFeaturesCol("features")
    +
    +    val expected = rf.setLabelCol("label")
    +      .fit(TreeTests.setMetadata(dfs(DoubleType), 2, "label"))
    +    dfs.keys.filter(_ != DoubleType).foreach { t =>
    +      TreeTests.checkEqual(expected,
    +        rf.setLabelCol("label").fit(TreeTests.setMetadata(dfs(t), 2, "label")))
    +    }
    +  }
    +
    +  test("shouldn't support non NumericType labels") {
    --- End diff --
    
    Yup, good idea, I'll investigate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-196741326
  
    @BenFradet if it's not too much trouble, I think you can do `AFTSurvivalRegression` and `IsotonicRegression` here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203521461
  
    **[Test build #54535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54535/consoleFull)** for PR 10355 at commit [`21e4afb`](https://github.com/apache/spark/commit/21e4afb9d734f81e47c73b16e1f29f76c46df6f5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-200564243
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53974/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197537090
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198113442
  
    Build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203103276
  
    **[Test build #54458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54458/consoleFull)** for PR 10355 at commit [`cf005af`](https://github.com/apache/spark/commit/cf005af19eb52037bc32905c5d30595681f68152).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-192701357
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52519/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165390564
  
    **[Test build #47917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47917/consoleFull)** for PR 10355 at commit [`c68ace1`](https://github.com/apache/spark/commit/c68ace1d7afe815ed252bacd0cebc576ef6e06b0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197037079
  
    Here's what's left to do:
    - IsotonicRegression
    - GeneralizedLinearRegression
    - AFTSurvival
    
    Otherwise, I'd prefer doing the evaluators and other estimators (like rformula and chiSqSelector) in other prs


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204676362
  
    No problem, thanks for reviewing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-199948244
  
    **[Test build #53797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53797/consoleFull)** for PR 10355 at commit [`b2f4f51`](https://github.com/apache/spark/commit/b2f4f518e41e463d679dd7f8266b150674a2e138).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197596308
  
    **[Test build #53368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53368/consoleFull)** for PR 10355 at commit [`3b424dc`](https://github.com/apache/spark/commit/3b424dc88211c83615ef8f16402879cc9eb45c2e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201582376
  
    **[Test build #54229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54229/consoleFull)** for PR 10355 at commit [`f8b7823`](https://github.com/apache/spark/commit/f8b7823e246c1ea9ee7729971fdc1d3a64ac8208).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-200561614
  
    @BenFradet Are you actively working on this?  if so, I'll wait to review it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-201018209
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54085/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165552516
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165733582
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165894586
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48022/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197537016
  
    **[Test build #53343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53343/consoleFull)** for PR 10355 at commit [`9feca44`](https://github.com/apache/spark/commit/9feca44e2796baaa9ccf5dfb03e2b9a0eabb731b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r47932753
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -275,6 +274,40 @@ class DecisionTreeClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
         val model = dt.fit(df)
       }
     
    +  test("DecisionTree should support all NumericType labels") {
    +    val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
    +      (0, Vectors.dense(0, 2, 3)),
    --- End diff --
    
    It might be less verbose to create the dataframe once and then add different types of label columns. Something like:



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-204465968
  
    **[Test build #54707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54707/consoleFull)** for PR 10355 at commit [`718774b`](https://github.com/apache/spark/commit/718774b7401b86a99c2c8ae02014470b0d3a77a3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197537092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53343/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r55680258
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -151,7 +151,7 @@ abstract class NumericType extends AtomicType {
     }
     
     
    -private[sql] object NumericType extends AbstractDataType {
    +private[spark] object NumericType extends AbstractDataType {
    --- End diff --
    
    Why is this change required?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-195473838
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52929/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197521550
  
    **[Test build #53343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53343/consoleFull)** for PR 10355 at commit [`9feca44`](https://github.com/apache/spark/commit/9feca44e2796baaa9ccf5dfb03e2b9a0eabb731b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-197586274
  
    **[Test build #53368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53368/consoleFull)** for PR 10355 at commit [`3b424dc`](https://github.com/apache/spark/commit/3b424dc88211c83615ef8f16402879cc9eb45c2e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203660671
  
    LGTM  @MLnick ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165817271
  
    **[Test build #48009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48009/consoleFull)** for PR 10355 at commit [`67162db`](https://github.com/apache/spark/commit/67162db8c0d27a5ce02ed19190306362bd3de710).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165390848
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173311100
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-173344031
  
    Failing tests do not seem related to the changes introduced, triggering a new build.
    `org.apache.spark.sql.hive.MetastoreDataSourcesSuite.insert into a table`
    `org.apache.spark.sql.hive.MetastoreDataSourcesSuite.SPARK-8156:create table to specific database by 'use dbname'`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r55806405
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -151,7 +151,7 @@ abstract class NumericType extends AtomicType {
     }
     
     
    -private[sql] object NumericType extends AbstractDataType {
    +private[spark] object NumericType extends AbstractDataType {
    --- End diff --
    
    I don't think it's required - see https://github.com/apache/spark/pull/9777/files#diff-a364fbaaec2088b49d5d1ceaea4a1c41R73 where OneHotEncoder uses `isInstanceOf[NumericType]` without making this object `private[spark]`. Here `NumericType` refers to the abstract class [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala#L144).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-198039870
  
    **[Test build #53458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53458/consoleFull)** for PR 10355 at commit [`e8a56d5`](https://github.com/apache/spark/commit/e8a56d5927b4652ac0b89fb3783a495a239d0eae).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r56465357
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
    @@ -49,10 +50,7 @@ private[ml] trait PredictorParams extends Params
         validateParams()
         // TODO: Support casting Array[Double] and Array[Float] to Vector when FeaturesType = Vector
         SchemaUtils.checkColumnType(schema, $(featuresCol), featuresDataType)
    -    if (fitting) {
    -      // TODO: Allow other numeric types
    -      SchemaUtils.checkColumnType(schema, $(labelCol), DoubleType)
    -    }
    +    if (fitting) SchemaUtils.checkNumericType(schema, $(labelCol))
    --- End diff --
    
    Spark style prefers to keep the braces format for if statements. Can you change it back please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-203103291
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54458/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] [WIP] spark.ml Predictor sho...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r47933679
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
    @@ -100,6 +101,40 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext {
         Utils.deleteRecursively(tempDir)
       }
     
    +  test("GradientBoostedTrees should support all NumericType labels") {
    +    val dfWithIntLabels = TreeTests.setMetadata(sqlContext.createDataFrame(Seq(
    --- End diff --
    
    I'm wondering if there is some way to reuse the code here since we need to implement roughly the same test (train with all different label column types, check that they are the same as training with double type) for each different predictor. Maybe adding something in a helper function under `MLTestingUtils`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165913205
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48024/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10355#issuecomment-165894584
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r57047541
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
    @@ -27,4 +31,56 @@ object MLTestingUtils {
         assert(copied.parent.uid == model.parent.uid)
         assert(copied.parent == model.parent)
       }
    +
    +  def genClassifDFWithNumericLabelCol(
    +    sqlContext: SQLContext,
    +    labelColName: String,
    +    featuresColName: String
    +  ): Map[NumericType, DataFrame] = {
    --- End diff --
    
    nit: Spark style seems to be to put the return type on the line above, if it will fit. So:
    
    ```scala
    def genClassifDFWithNumericLabelCol(
        sqlContext: SQLContext,
        labelColName: String,
        featuresColName: String): Map[NumericType, DataFrame] = {
    ```
    [Example](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala#L168)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org