Posted to reviews@spark.apache.org by BenFradet <gi...@git.apache.org> on 2015/12/25 00:31:01 UTC

[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

GitHub user BenFradet opened a pull request:

    https://github.com/apache/spark/pull/10472

    [SPARK-9716] [ML] BinaryClassificationEvaluator should accept Double prediction column

    This PR aims to allow the prediction column of `BinaryClassificationEvaluator` to be of double type.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BenFradet/spark SPARK-9716

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10472
    
----
commit 2c427752ae28e512d3f076a14f46527946f27012
Author: BenFradet <be...@gmail.com>
Date:   2015-12-23T22:38:12Z

    new checkColumnType method checking for a variety of types

commit df58edc76e4b4c84ace4a7b3962e3175c4a5bfab
Author: BenFradet <be...@gmail.com>
Date:   2015-12-23T22:38:39Z

    binary classification evaluator now accepts raw prediction col of type vector and double

commit 8e80dfade54c81e7d951643aa67cc376665d8339
Author: BenFradet <be...@gmail.com>
Date:   2015-12-24T22:09:27Z

    removed overload disambiguation

commit 80fde3f9b6e124c80a7d5a2ee2a0363fdc216448
Author: BenFradet <be...@gmail.com>
Date:   2015-12-24T23:23:07Z

    binary classification evaluator evaluate method now handles both vector and double types

commit 725f6a5931ac763e9ca1fa0d9a9c590d6887900d
Author: BenFradet <be...@gmail.com>
Date:   2015-12-24T23:23:18Z

    test suite

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-170924106
  
    Ping @jkbradley, is that what you had in mind?




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10472#discussion_r49679366
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala ---
    @@ -29,6 +29,7 @@ import org.apache.spark.sql.types.DoubleType
     /**
      * :: Experimental ::
      * Evaluator for binary classification, which expects two input columns: rawPrediction and label.
    + * The rawPrediction column can be of type double or vector.
    --- End diff --
    
    How about a little more explicit:
    ```The rawPrediction column can be of type double (binary 0/1 prediction, or probability of label 1) or of type vector (length-2 vector of raw predictions, scores, or label probabilities).```
    
    Also, could you please update the Python class doc with the same text?
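To illustrate the two accepted column types described above, here is a minimal sketch of how a score for label 1 might be pulled out of either representation. It does not use Spark: a plain `Array[Double]` stands in for the vector type, and `RawPredictionSketch` and `positiveScore` are hypothetical names. Reading index 1 of the length-2 vector as the value for label 1 is an assumption, not something stated in this thread.

```scala
object RawPredictionSketch {
  /**
   * Returns the score for label 1 from either accepted representation.
   * A Double is used as-is (0/1 prediction or probability of label 1).
   * A length-2 array stands in for the vector type; index 1 is ASSUMED
   * to hold the value for label 1.
   */
  def positiveScore(rawPrediction: Any): Double = rawPrediction match {
    case d: Double => d
    case v: Array[Double] =>
      require(v.length == 2, s"expected a length-2 vector, got length ${v.length}")
      v(1)
    case other =>
      throw new IllegalArgumentException(s"unsupported rawPrediction value: $other")
  }
}
```

Any other input type is rejected with an `IllegalArgumentException`, which mirrors the idea of validating the column type before evaluating.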




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10472#discussion_r49029260
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala ---
    @@ -44,6 +44,23 @@ private[spark] object SchemaUtils {
       }
     
       /**
    +    * Check whether the given schema contains a column of one of the require data types.
    +    * @param colName  column name
    +    * @param dataTypes  required column data types
    +    */
    +  def checkColumnTypes(
    +      schema: StructType,
    +      colName: String,
    +      dataTypes: Seq[DataType],
    +      msg: String = ""): Unit = {
    +    val actualDataType = schema(colName).dataType
    +    val message = if (msg != null && msg.trim.length > 0) " " + msg else ""
    +    require(dataTypes.exists(actualDataType.equals),
    +      s"Column $colName must be of type equals to one of the following types " +
    --- End diff --
    
    "equals" --> "equal"
    Put colon after "types"
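With both wording fixes applied, the check might look roughly like the sketch below. It is self-contained so it can run without Spark on the classpath: `DataType`, `Field`, and `Schema` are hypothetical stand-ins for the Spark SQL types, and everything in the error message after "types: " is an assumption, since the diff above is cut off at that point.

```scala
object SchemaCheckSketch {
  // Hypothetical stand-ins for Spark SQL's DataType and StructType.
  sealed trait DataType
  case object DoubleType extends DataType
  case object VectorType extends DataType
  case object StringType extends DataType

  final case class Field(name: String, dataType: DataType)
  final case class Schema(fields: Seq[Field]) {
    // Look a field up by name, failing if it is absent.
    def apply(name: String): Field =
      fields.find(_.name == name).getOrElse(
        throw new IllegalArgumentException(s"Field $name does not exist."))
  }

  /** Fails with IllegalArgumentException unless colName has one of dataTypes. */
  def checkColumnTypes(
      schema: Schema,
      colName: String,
      dataTypes: Seq[DataType],
      msg: String = ""): Unit = {
    val actualDataType = schema(colName).dataType
    val message = if (msg != null && msg.trim.length > 0) " " + msg else ""
    // "equal" and the colon follow the review comments; the rest of the
    // message text is assumed for illustration.
    require(dataTypes.exists(actualDataType.equals),
      s"Column $colName must be of type equal to one of the following types: " +
      s"${dataTypes.mkString("[", ", ", "]")} but was actually of type $actualDataType.$message")
  }
}
```

A column of an accepted type passes silently, while any other type triggers `require`, which throws an `IllegalArgumentException` whose text starts with "requirement failed: ".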




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10472#discussion_r49691297
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala ---
    @@ -29,6 +29,7 @@ import org.apache.spark.sql.types.DoubleType
     /**
      * :: Experimental ::
      * Evaluator for binary classification, which expects two input columns: rawPrediction and label.
    + * The rawPrediction column can be of type double or vector.
    --- End diff --
    
    Yup, will do later in the day, thanks.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-167169451
  
    **[Test build #48318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48318/consoleFull)** for PR 10472 at commit [`725f6a5`](https://github.com/apache/spark/commit/725f6a5931ac763e9ca1fa0d9a9c590d6887900d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-171779233
  
    **[Test build #49412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49412/consoleFull)** for PR 10472 at commit [`0b625cf`](https://github.com/apache/spark/commit/0b625cf02862d4e61d774f9312cf1a879a7cafdb).




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169837135
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48977/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10472#discussion_r48993376
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala ---
    @@ -44,6 +44,23 @@ private[spark] object SchemaUtils {
       }
     
       /**
    +    * Check whether the given schema contains a column of one of the require data types.
    +    * @param colName  column name
    +    * @param dataTypes  required column data types
    +    */
    +  def checkColumnTypes(
    +      schema: StructType,
    +      colName: String,
    +      dataTypes: Seq[DataType],
    +      msg: String = ""): Unit = {
    +    val actualDataType = schema(colName).dataType
    +    val message = if (msg != null && msg.trim.length > 0) " " + msg else ""
    +    require(dataTypes.exists(actualDataType.equals(_)),
    --- End diff --
    
    You do not need the `_`:  `dataTypes.exists(actualDataType.equals)`
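The equivalence behind this suggestion can be checked in isolation. The sketch below is an illustration only: `Kind`, `DoubleKind`, and `VectorKind` are hypothetical stand-ins for the real data types. Passing `actual.equals` where a function is expected lets the compiler turn the method into a function value (eta-expansion), so both forms behave the same.

```scala
object EtaExpansionSketch {
  // Hypothetical stand-ins for the data types being compared.
  sealed trait Kind
  case object DoubleKind extends Kind
  case object VectorKind extends Kind

  val actual: Kind = DoubleKind
  val allowed: Seq[Kind] = Seq(DoubleKind, VectorKind)

  // Explicit placeholder: builds the function x => actual.equals(x).
  val withPlaceholder: Boolean = allowed.exists(actual.equals(_))

  // Method value: the compiler eta-expands actual.equals into the same function.
  val withMethodValue: Boolean = allowed.exists(actual.equals)
}
```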




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169573848
  
    **[Test build #48902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48902/consoleFull)** for PR 10472 at commit [`b481d67`](https://github.com/apache/spark/commit/b481d67b731e2e9f6cedd93d2873156f78e151aa).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-173015481
  
    Merging with master
    Thanks for the PR!




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169564256
  
    @jkbradley Updated, thanks for your review.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-171790005
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49412/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172939069
  
    LGTM.  I'll re-run tests since it's been a little while though.  Thanks!




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169573908
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48902/
    Test FAILed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-171789681
  
    **[Test build #49412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49412/consoleFull)** for PR 10472 at commit [`0b625cf`](https://github.com/apache/spark/commit/0b625cf02862d4e61d774f9312cf1a879a7cafdb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169827261
  
    **[Test build #48977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48977/consoleFull)** for PR 10472 at commit [`860861c`](https://github.com/apache/spark/commit/860861cb613a2d00a70e4eb699c25b2375c86eda).




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169788073
  
    **[Test build #48955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48955/consoleFull)** for PR 10472 at commit [`2cbc667`](https://github.com/apache/spark/commit/2cbc667e71c8397d78ff8df2c8f30b4c8a52449d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-167169466
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169464880
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169573906
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169464891
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48870/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169453187
  
    **[Test build #48870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48870/consoleFull)** for PR 10472 at commit [`0233c07`](https://github.com/apache/spark/commit/0233c074187059db7abb51d231063f99a86c60ce).




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172941376
  
    No problem.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169818842
  
    Thanks for the updates!  I just thought of one more item: Could you please update the class documentation (BinaryClassificationEvaluator.scala around line 31) to state the options for the "rawPredictionCol"?  That should be it.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169442981
  
    @thunterdb thanks for the review, will fix.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169836993
  
    **[Test build #48977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48977/consoleFull)** for PR 10472 at commit [`860861c`](https://github.com/apache/spark/commit/860861cb613a2d00a70e4eb699c25b2375c86eda).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169464364
  
    **[Test build #48870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48870/consoleFull)** for PR 10472 at commit [`0233c07`](https://github.com/apache/spark/commit/0233c074187059db7abb51d231063f99a86c60ce).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172986605
  
    **[Test build #49707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49707/consoleFull)** for PR 10472 at commit [`0b625cf`](https://github.com/apache/spark/commit/0b625cf02862d4e61d774f9312cf1a879a7cafdb).




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172999916
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49707/
    Test PASSed.




[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169483754
  
    cc @jkbradley 


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172999915
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169508852
  
    Just a couple of small comments.


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10472#discussion_r48993085
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala ---
    @@ -79,13 +79,14 @@ class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override va
       @Since("1.2.0")
       override def evaluate(dataset: DataFrame): Double = {
         val schema = dataset.schema
    -    SchemaUtils.checkColumnType(schema, $(rawPredictionCol), new VectorUDT)
    +    SchemaUtils.checkColumnTypes(schema, $(rawPredictionCol), Seq(DoubleType, new VectorUDT))
         SchemaUtils.checkColumnType(schema, $(labelCol), DoubleType)
     
         // TODO: When dataset metadata has been implemented, check rawPredictionCol vector length = 2.
         val scoreAndLabels = dataset.select($(rawPredictionCol), $(labelCol))
    -      .map { case Row(rawPrediction: Vector, label: Double) =>
    -        (rawPrediction(1), label)
    +      .map {
    +        case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
    +        case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
    --- End diff --
    
    There is a small performance loss because a conditional branch is introduced here, but I believe the cost of unpacking the row is much higher anyway.
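
The two-case match in the diff above can be illustrated without Spark. This is only a minimal sketch under stated assumptions: `RawPredictionMatch` and `scoreOf` are hypothetical names, and `Array[Double]` stands in for Spark's `Vector`, so it is not the code from the PR.

```scala
// Hypothetical stand-in for the Row pattern match in the diff; not Spark code.
object RawPredictionMatch {
  // Returns (score, label) for either kind of raw prediction.
  def scoreOf(rawPrediction: Any, label: Double): (Double, Double) =
    rawPrediction match {
      // Vector-like case: use the element at index 1, as rawPrediction(1) does in the diff.
      case v: Array[Double] if v.length >= 2 => (v(1), label)
      // Double case: the value is already the score.
      case d: Double => (d, label)
      // Anything else is rejected, similar in spirit to the schema check.
      case other =>
        throw new IllegalArgumentException(s"Unsupported raw prediction: $other")
    }
}
```

The extra case is the conditional branch mentioned in the comment: each row is tested against the vector pattern first and falls through to the double pattern only if that fails.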


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10472#discussion_r49029265
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluatorSuite.scala ---
    @@ -36,4 +37,35 @@ class BinaryClassificationEvaluatorSuite
           .setMetricName("areaUnderPR")
         testDefaultReadWrite(evaluator)
       }
    +
    +  test("should accept both vector and double raw prediction col") {
    +    val evaluator = new BinaryClassificationEvaluator()
    +      .setMetricName("areaUnderPR")
    +
    +    val vectorDF = sqlContext.createDataFrame(Seq(
    +      (0d, Vectors.dense(12, 2.5)),
    +      (1d, Vectors.dense(1, 3)),
    +      (0d, Vectors.dense(10, 2))
    +    )).toDF("label", "rawPrediction")
    +    assert(evaluator.evaluate(vectorDF) === 1.0)
    +
    +    val doubleDF = sqlContext.createDataFrame(Seq(
    +      (0d, 0d),
    +      (1d, 1d),
    +      (0d, 0d)
    +    )).toDF("label", "rawPrediction")
    +    assert(evaluator.evaluate(doubleDF) === 1.0)
    +
    +    val stringDF = sqlContext.createDataFrame(Seq(
    +      (0d, "0.0d"),
    +      (1d, "1.0d"),
    +      (0d, "0.0d")
    +    )).toDF("label", "rawPrediction")
    +    val thrown = intercept[IllegalArgumentException] {
    +      evaluator.evaluate(stringDF)
    +    }
    +    assert(thrown.getMessage contains "Column rawPrediction must be of type equals to one of the " +
    --- End diff --
    
    How about something like this to be more robust to formatting changes?
    ```
    assert(thrown.getMessage.replace("\n", "") contains ...)
    ```
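
A minimal, Spark-free sketch of that suggestion follows. `MessageCheck` and `messageContains` are hypothetical names, and the exception message is made up for illustration rather than taken from Spark.

```scala
// Hypothetical helper illustrating the newline-stripping check; not part of Spark or ScalaTest.
object MessageCheck {
  // Remove line breaks before matching, so that re-wrapping the message does not break the test.
  def messageContains(t: Throwable, expected: String): Boolean =
    t.getMessage.replace("\n", "").contains(expected)

  def main(args: Array[String]): Unit = {
    // Made-up message containing a line break in the middle.
    val thrown = new IllegalArgumentException("Column rawPrediction has an \nunsupported type")
    assert(messageContains(thrown, "has an unsupported type"))
  }
}
```

Note that the line break is removed rather than replaced with a space, so the expected string still depends on whether the original message has a space next to the break.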


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169418377
  
    @BenFradet thanks! Just a small comment.


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172999713
  
    **[Test build #49707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49707/consoleFull)** for PR 10472 at commit [`0b625cf`](https://github.com/apache/spark/commit/0b625cf02862d4e61d774f9312cf1a879a7cafdb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172986883
  
    **[Test build #2411 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2411/consoleFull)** for PR 10472 at commit [`0b625cf`](https://github.com/apache/spark/commit/0b625cf02862d4e61d774f9312cf1a879a7cafdb).


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-167169467
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48318/


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169788272
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48955/


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-167166797
  
    **[Test build #48318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48318/consoleFull)** for PR 10472 at commit [`725f6a5`](https://github.com/apache/spark/commit/725f6a5931ac763e9ca1fa0d9a9c590d6887900d).


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169581093
  
    Forgot to change the test suite, will fix later today.


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-172982448
  
    Jenkins, retest this please


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10472


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-173000926
  
    **[Test build #2411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2411/consoleFull)** for PR 10472 at commit [`0b625cf`](https://github.com/apache/spark/commit/0b625cf02862d4e61d774f9312cf1a879a7cafdb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169565680
  
    **[Test build #48902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48902/consoleFull)** for PR 10472 at commit [`b481d67`](https://github.com/apache/spark/commit/b481d67b731e2e9f6cedd93d2873156f78e151aa).


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169837133
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-169775897
  
    **[Test build #48955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48955/consoleFull)** for PR 10472 at commit [`2cbc667`](https://github.com/apache/spark/commit/2cbc667e71c8397d78ff8df2c8f30b4c8a52449d).


[GitHub] spark pull request: [SPARK-9716] [ML] BinaryClassificationEvaluato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10472#issuecomment-171790000
  
    Merged build finished. Test PASSed.

