Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2016/02/22 14:51:33 UTC

[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/11303

    [SPARK-13435] [MLlib] Add Weighted Cohen's kappa to MulticlassMetrics

    JIRA: https://issues.apache.org/jira/browse/SPARK-13435
    
    ## What changes were proposed in this pull request?
    
    Add the missing Weighted Cohen's kappa to MulticlassMetrics.
    
    
    ## How was this patch tested?
    
    Unit tests and manual tests were done.
    The correctness of the calculation was verified on several small datasets and compared against two Python packages: sklearn and ml_metrics.
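
    For context, the weighted kappa being added can be sketched outside Spark. The following is a minimal Python illustration of the three weighting types the patch proposes ("default", "linear", "quadratic"); the function name and structure are illustrative, not the PR's Scala API, and its results can be cross-checked against sklearn as described above.

```python
# A framework-free sketch of (weighted) Cohen's kappa over a confusion
# matrix, mirroring the three weighting types proposed in this PR.
# This is an illustration, not Spark code.

def kappa(confusion, weights="default"):
    """confusion: square list-of-lists of counts; rows = label, cols = prediction."""
    n = len(confusion)
    if weights == "default":      # standard kappa: any disagreement weighs 1
        w = lambda i, j: 0.0 if i == j else 1.0
    elif weights == "linear":     # penalty grows linearly with class distance
        w = lambda i, j: float(abs(i - j))
    elif weights == "quadratic":  # penalty grows quadratically
        w = lambda i, j: float((i - j) ** 2)
    else:
        raise ValueError(f"kappa only supports {{linear, quadratic, default}}, got {weights}")

    total = sum(sum(row) for row in confusion)
    row_sum = [sum(row) for row in confusion]
    col_sum = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # weighted observed disagreement vs. disagreement expected by chance
    observed = sum(w(i, j) * confusion[i][j] for i in range(n) for j in range(n))
    expected = sum(w(i, j) * row_sum[i] * col_sum[j] / total
                   for i in range(n) for j in range(n))
    return 1.0 - observed / expected
```

    For example, `kappa([[20, 5], [10, 15]])` gives 0.4, and a diagonal confusion matrix (perfect agreement) gives 1.0.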
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark kappa

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11303
    
----
commit fe9a88a5137c0476e9c07d7d7819a1a6df2587fa
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-02-22T09:00:33Z

    add kappa

commit 19a307ee367c4d574599370d6fd55f2a31c76bfe
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-02-22T09:12:30Z

    update kappa

commit 0b19e0bdb96f3002eb0d0f8ce8800470f2f9f382
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-02-22T09:27:22Z

    update kappa

commit 5bf863b07d32ccb86a4a6abd7be22b472a0ada53
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-02-22T10:57:44Z

    update kappa

commit 52f3b9bc14ac5904e638b22a45df722745d2afc8
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-02-22T11:24:30Z

    update kappa

commit 6c4f0341aaff8e32112b79a98007f7318916963b
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-02-22T13:25:57Z

    update kappa

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #11303: [SPARK-13435] [MLlib] Add Weighted Cohen's kappa ...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng closed the pull request at:

    https://github.com/apache/spark/pull/11303




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53640285
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
       /**
    -   * Returns f1-measure for a given label (category)
    -   * @param label the label.
    -   */
    +    * Returns f1-measure for a given label (category)
    +    *
    +    * @param label the label.
    +    */
       @Since("1.1.0")
       def fMeasure(label: Double): Double = fMeasure(label, 1.0)
     
       /**
    -   * Returns precision
    -   */
    +    * Returns precision
    +    */
       @Since("1.1.0")
       lazy val precision: Double = tpByClass.values.sum.toDouble / labelCount
     
       /**
    -   * Returns recall
    -   * (equals to precision for multiclass classifier
    -   * because sum of all false positives is equal to sum
    -   * of all false negatives)
    -   */
    +    * Returns recall
    +    * (equals to precision for multiclass classifier
    +    * because sum of all false positives is equal to sum
    +    * of all false negatives)
    +    */
       @Since("1.1.0")
       lazy val recall: Double = precision
     
       /**
    -   * Returns f-measure
    -   * (equals to precision and recall because precision equals recall)
    -   */
    +    * Returns f-measure
    +    * (equals to precision and recall because precision equals recall)
    +    */
       @Since("1.1.0")
       lazy val fMeasure: Double = precision
     
       /**
    -   * Returns weighted true positive rate
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted true positive rate
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedTruePositiveRate: Double = weightedRecall
     
       /**
    -   * Returns weighted false positive rate
    -   */
    +    * Returns weighted false positive rate
    +    */
       @Since("1.1.0")
       lazy val weightedFalsePositiveRate: Double = labelCountByClass.map { case (category, count) =>
         falsePositiveRate(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged recall
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted averaged recall
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedRecall: Double = labelCountByClass.map { case (category, count) =>
         recall(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged precision
    -   */
    +    * Returns weighted averaged precision
    +    */
       @Since("1.1.0")
       lazy val weightedPrecision: Double = labelCountByClass.map { case (category, count) =>
         precision(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f-measure
    -   * @param beta the beta parameter.
    -   */
    +    * Returns weighted averaged f-measure
    +    *
    +    * @param beta the beta parameter.
    +    */
       @Since("1.1.0")
       def weightedFMeasure(beta: Double): Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, beta) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f1-measure
    -   */
    +    * Returns weighted averaged f1-measure
    +    */
       @Since("1.1.0")
       lazy val weightedFMeasure: Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, 1.0) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns the sequence of labels in ascending order
    -   */
    +    * Returns the sequence of labels in ascending order
    +    */
       @Since("1.1.0")
       lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted
    +
    +
    +  /**
    +    * Returns unweighted Cohen's Kappa
    +    * Cohen's kappa coefficient is a statistic which measures inter-rater
    +    * agreement for qualitative (categorical) items. It is generally thought
    +    * to be a more robust measure than simple percent agreement calculation,
    +    * since kappa takes into account the agreement occurring by chance.
    +    * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +    * generally considered good agreement; zero or lower means no agreement
    +    * (practically random labels).
    +    */
    +  @Since("1.6.0")
    +  def kappa(): Double = {
    +    kappa("default")
    +  }
    +
    +  /**
    +    * Returns Cohen's Kappa with built-in weighted types
    +    *
    +    * @param weights the weighted type. "default" means no weighted;
    +    *                "linear" means linear weighted;
    +    *                "quadratic" means quadratic weighted.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      case "linear" =>
    +        (i: Int, j: Int) => Math.abs(i - j).toDouble
    +      case "quadratic" =>
    +        (i: Int, j: Int) => (i - j).toDouble * (i - j)
    +      case t =>
    +        throw new IllegalArgumentException(
    +          s"kappa only supports {linear, quadratic, default} but got type ${t}.")
    +    }
    +
    +    kappa(func)
    +  }
    +
    +
    +  /**
    +    * Returns Cohen's Kappa with user-defined weight matrix
    +    *
    +    * @param weights the weight matrix, must be of the same shape with Confusion Matrix.
    +    *                Note: Each Element in it must be no less than zero.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: Matrix): Double = {
    --- End diff --
    
    I don't think this is that practical to expose; the matrix would in most cases need to be far too large. At least, this doesn't seem like the right thing to expose in a public API
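
    To make the size concern concrete: any built-in weighting type is equivalent to a dense k x k weight matrix over the k distinct labels, so exposing the matrix form adds expressiveness but the caller must materialize k*k entries. The sketch below is a hypothetical Python illustration of that equivalence, not the PR's actual API.

```python
# Materialize the k x k weight matrix implied by a built-in weighting type.
# Illustrative only: shows why the matrix form grows quadratically with the
# number of distinct labels.

def weight_matrix(k, kind="linear"):
    if kind == "default":
        return [[0.0 if i == j else 1.0 for j in range(k)] for i in range(k)]
    if kind == "linear":
        return [[float(abs(i - j)) for j in range(k)] for i in range(k)]
    if kind == "quadratic":
        return [[float((i - j) ** 2) for j in range(k)] for i in range(k)]
    raise ValueError(kind)

# For 3 labels the linear matrix is tiny, but it always holds k*k entries.
print(weight_matrix(3, "linear"))
# [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
```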




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53727121
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
       /**
    -   * Returns f1-measure for a given label (category)
    -   * @param label the label.
    -   */
    +    * Returns f1-measure for a given label (category)
    +    *
    +    * @param label the label.
    +    */
       @Since("1.1.0")
       def fMeasure(label: Double): Double = fMeasure(label, 1.0)
     
       /**
    -   * Returns precision
    -   */
    +    * Returns precision
    +    */
       @Since("1.1.0")
       lazy val precision: Double = tpByClass.values.sum.toDouble / labelCount
     
       /**
    -   * Returns recall
    -   * (equals to precision for multiclass classifier
    -   * because sum of all false positives is equal to sum
    -   * of all false negatives)
    -   */
    +    * Returns recall
    +    * (equals to precision for multiclass classifier
    +    * because sum of all false positives is equal to sum
    +    * of all false negatives)
    +    */
       @Since("1.1.0")
       lazy val recall: Double = precision
     
       /**
    -   * Returns f-measure
    -   * (equals to precision and recall because precision equals recall)
    -   */
    +    * Returns f-measure
    +    * (equals to precision and recall because precision equals recall)
    +    */
       @Since("1.1.0")
       lazy val fMeasure: Double = precision
     
       /**
    -   * Returns weighted true positive rate
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted true positive rate
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedTruePositiveRate: Double = weightedRecall
     
       /**
    -   * Returns weighted false positive rate
    -   */
    +    * Returns weighted false positive rate
    +    */
       @Since("1.1.0")
       lazy val weightedFalsePositiveRate: Double = labelCountByClass.map { case (category, count) =>
         falsePositiveRate(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged recall
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted averaged recall
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedRecall: Double = labelCountByClass.map { case (category, count) =>
         recall(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged precision
    -   */
    +    * Returns weighted averaged precision
    +    */
       @Since("1.1.0")
       lazy val weightedPrecision: Double = labelCountByClass.map { case (category, count) =>
         precision(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f-measure
    -   * @param beta the beta parameter.
    -   */
    +    * Returns weighted averaged f-measure
    +    *
    +    * @param beta the beta parameter.
    +    */
       @Since("1.1.0")
       def weightedFMeasure(beta: Double): Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, beta) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f1-measure
    -   */
    +    * Returns weighted averaged f1-measure
    +    */
       @Since("1.1.0")
       lazy val weightedFMeasure: Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, 1.0) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns the sequence of labels in ascending order
    -   */
    +    * Returns the sequence of labels in ascending order
    +    */
       @Since("1.1.0")
       lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted
    +
    +
    +  /**
    +    * Returns unweighted Cohen's Kappa
    +    * Cohen's kappa coefficient is a statistic which measures inter-rater
    +    * agreement for qualitative (categorical) items. It is generally thought
    +    * to be a more robust measure than simple percent agreement calculation,
    +    * since kappa takes into account the agreement occurring by chance.
    +    * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +    * generally considered good agreement; zero or lower means no agreement
    +    * (practically random labels).
    +    */
    +  @Since("1.6.0")
    +  def kappa(): Double = {
    +    kappa("default")
    +  }
    +
    +  /**
    +    * Returns Cohen's Kappa with built-in weighted types
    +    *
    +    * @param weights the weighted type. "default" means no weighted;
    +    *                "linear" means linear weighted;
    +    *                "quadratic" means quadratic weighted.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      case "linear" =>
    +        (i: Int, j: Int) => Math.abs(i - j).toDouble
    +      case "quadratic" =>
    +        (i: Int, j: Int) => (i - j).toDouble * (i - j)
    --- End diff --
    
    ok, I will fix it




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-207897628
  
    cc @jkbradley @yanboliang 




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187677827
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51766/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187687258
  
    @srowen yes, the style problem was caused by IntelliJ's default reformatting style for Scala, which doesn't match Spark's.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53776227
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -211,4 +211,119 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
        */
       @Since("1.1.0")
       lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted
    +
    +  /**
    +   * Returns unweighted Cohen's Kappa
    +   * Cohen's kappa coefficient is a statistic which measures inter-rater
    +   * agreement for qualitative (categorical) items. It is generally thought
    +   * to be a more robust measure than simple percent agreement calculation,
    +   * since kappa takes into account the agreement occurring by chance.
    +   * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +   * generally considered good agreement; zero or lower means no agreement
    +   * (practically random labels).
    +   */
    +  @Since("2.0.0")
    +  def kappa(): Double = {
    +    kappa("default")
    +  }
    +
    +  /**
    +   * Returns Cohen's Kappa with built-in weighted types
    +   * @param weights the weighted type. "default" means no weighted;
    +   *                "linear" means linear weighted;
    +   *                "quadratic" means quadratic weighted.
    +   */
    +  @Since("2.0.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      // standard kappa without weighting
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      // linear weighted kappa
    +      case "linear" =>
    +        (i: Int, j: Int) =>
    +          math.abs(i - j).toDouble
    +      // quadratic weighted kappa
    +      case "quadratic" =>
    +        (i: Int, j: Int) => {
    +          val d = i - j
    +          d.toDouble * d
    +        }
    +      // unknown weighting type
    +      case t =>
    +        throw new IllegalArgumentException(
    +          s"kappa only supports weighting type {linear, quadratic, default} but got type ${t}.")
    +    }
    +
    +    kappa(func)
    +  }
    +
    +
    +  /**
    +   * Returns Cohen's Kappa with user-defined weight matrix
    +   * @param weights the weight matrix, must be of the same shape with Confusion Matrix.
    +   *                Note: Each Element in it must be no less than zero.
    +   */
    +  @Since("2.0.0")
    +  def kappa(weights: Matrix): Double = {
    --- End diff --
    
    ok, this needs more discussion.
    I still think it is appropriate to view the weights as a matrix.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187677825
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187677623
  
    **[Test build #51766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51766/consoleFull)** for PR 11303 at commit [`4272aca`](https://github.com/apache/spark/commit/4272aca6f68466bbad3a3b8d23e665b9144fb02a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   * (equals to precision for multiclass classifier`




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53640075
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
       /**
    -   * Returns f1-measure for a given label (category)
    -   * @param label the label.
    -   */
    +    * Returns f1-measure for a given label (category)
    +    *
    +    * @param label the label.
    +    */
       @Since("1.1.0")
       def fMeasure(label: Double): Double = fMeasure(label, 1.0)
     
       /**
    -   * Returns precision
    -   */
    +    * Returns precision
    +    */
       @Since("1.1.0")
       lazy val precision: Double = tpByClass.values.sum.toDouble / labelCount
     
       /**
    -   * Returns recall
    -   * (equals to precision for multiclass classifier
    -   * because sum of all false positives is equal to sum
    -   * of all false negatives)
    -   */
    +    * Returns recall
    +    * (equals to precision for multiclass classifier
    +    * because sum of all false positives is equal to sum
    +    * of all false negatives)
    +    */
       @Since("1.1.0")
       lazy val recall: Double = precision
     
       /**
    -   * Returns f-measure
    -   * (equals to precision and recall because precision equals recall)
    -   */
    +    * Returns f-measure
    +    * (equals to precision and recall because precision equals recall)
    +    */
       @Since("1.1.0")
       lazy val fMeasure: Double = precision
     
       /**
    -   * Returns weighted true positive rate
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted true positive rate
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedTruePositiveRate: Double = weightedRecall
     
       /**
    -   * Returns weighted false positive rate
    -   */
    +    * Returns weighted false positive rate
    +    */
       @Since("1.1.0")
       lazy val weightedFalsePositiveRate: Double = labelCountByClass.map { case (category, count) =>
         falsePositiveRate(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged recall
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted averaged recall
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedRecall: Double = labelCountByClass.map { case (category, count) =>
         recall(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged precision
    -   */
    +    * Returns weighted averaged precision
    +    */
       @Since("1.1.0")
       lazy val weightedPrecision: Double = labelCountByClass.map { case (category, count) =>
         precision(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f-measure
    -   * @param beta the beta parameter.
    -   */
    +    * Returns weighted averaged f-measure
    +    *
    +    * @param beta the beta parameter.
    +    */
       @Since("1.1.0")
       def weightedFMeasure(beta: Double): Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, beta) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f1-measure
    -   */
    +    * Returns weighted averaged f1-measure
    +    */
       @Since("1.1.0")
       lazy val weightedFMeasure: Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, 1.0) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns the sequence of labels in ascending order
    -   */
    +    * Returns the sequence of labels in ascending order
    +    */
       @Since("1.1.0")
       lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted
    +
    +
    +  /**
    +    * Returns unweighted Cohen's Kappa
    +    * Cohen's kappa coefficient is a statistic which measures inter-rater
    +    * agreement for qualitative (categorical) items. It is generally thought
    +    * to be a more robust measure than simple percent agreement calculation,
    +    * since kappa takes into account the agreement occurring by chance.
    +    * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +    * generally considered good agreement; zero or lower means no agreement
    +    * (practically random labels).
    +    */
    +  @Since("1.6.0")
    +  def kappa(): Double = {
    +    kappa("default")
    +  }
    +
    +  /**
    +    * Returns Cohen's Kappa with built-in weighted types
    +    *
    +    * @param weights the weighted type. "default" means no weighted;
    +    *                "linear" means linear weighted;
    +    *                "quadratic" means quadratic weighted.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      case "linear" =>
    +        (i: Int, j: Int) => Math.abs(i - j).toDouble
    --- End diff --
    
    Nit: `math.abs`




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187395032
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53767196
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -211,4 +211,119 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
        */
       @Since("1.1.0")
       lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted
    +
    +  /**
    +   * Returns unweighted Cohen's Kappa
    +   * Cohen's kappa coefficient is a statistic which measures inter-rater
    +   * agreement for qualitative (categorical) items. It is generally thought
    +   * to be a more robust measure than simple percent agreement calculation,
    +   * since kappa takes into account the agreement occurring by chance.
    +   * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +   * generally considered good agreement; zero or lower means no agreement
    +   * (practically random labels).
    +   */
    +  @Since("2.0.0")
    +  def kappa(): Double = {
    +    kappa("default")
    +  }
    +
    +  /**
    +   * Returns Cohen's Kappa with built-in weighted types
    +   * @param weights the weighted type. "default" means no weighted;
    +   *                "linear" means linear weighted;
    +   *                "quadratic" means quadratic weighted.
    +   */
    +  @Since("2.0.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      // standard kappa without weighting
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      // linear weighted kappa
    +      case "linear" =>
    +        (i: Int, j: Int) =>
    +          math.abs(i - j).toDouble
    +      // quadratic weighted kappa
    +      case "quadratic" =>
    +        (i: Int, j: Int) => {
    +          val d = i - j
    +          d.toDouble * d
    +        }
    +      // unknown weighting type
    +      case t =>
    +        throw new IllegalArgumentException(
    +          s"kappa only supports weighting type {linear, quadratic, default} but got type ${t}.")
    +    }
    +
    +    kappa(func)
    +  }
    +
    +
    +  /**
    +   * Returns Cohen's Kappa with user-defined weight matrix
    +   * @param weights the weight matrix, must be of the same shape with Confusion Matrix.
    +   *                Note: Each Element in it must be no less than zero.
    +   */
    +  @Since("2.0.0")
    +  def kappa(weights: Matrix): Double = {
    --- End diff --
    
    I still don't think we should expose this. It can be private.
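For context on what such a private helper would compute: with a user-supplied weight matrix, kappa generalizes to kappa_w = 1 - sum(w * O) / sum(w * E), where O is the confusion matrix normalized to probabilities and E is the chance-agreement matrix built from O's marginals. A minimal pure-Python sketch (function name and structure are illustrative, not Spark's API; the confusion matrix is built from the preds/labels example used elsewhere in this thread):

```python
def kappa_with_weights(confusion, weights):
    """Cohen's kappa for an arbitrary nonnegative weight matrix
    (illustrative sketch, not the PR's Scala implementation)."""
    k = len(confusion)
    total = float(sum(sum(row) for row in confusion))
    # observed agreement matrix, normalized to probabilities
    obs = [[confusion[i][j] / total for j in range(k)] for i in range(k)]
    # chance-agreement matrix from the marginals: E[i][j] = row[i] * col[j]
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = sum(weights[i][j] * obs[i][j] for i in range(k) for j in range(k))
    den = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - num / den

# Confusion matrix (rows: prediction, columns: label) for the thread's
# example preds/labels; linear weights w[i][j] = |i - j| recover the
# "linear" built-in type.
conf = [[2, 1, 0],
        [1, 3, 0],
        [1, 0, 1]]
w_linear = [[0, 1, 2],
            [1, 0, 1],
            [2, 1, 0]]
print(kappa_with_weights(conf, w_linear))  # ≈ 0.419355 (exactly 13/31)
```

With the 0/1 weight matrix (zeros on the diagonal, ones elsewhere), this reduces algebraically to the familiar unweighted form (po - pe) / (1 - pe).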




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53740489
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
       /**
    -   * Returns f1-measure for a given label (category)
    -   * @param label the label.
    -   */
    +    * Returns f1-measure for a given label (category)
    +    *
    +    * @param label the label.
    +    */
       @Since("1.1.0")
       def fMeasure(label: Double): Double = fMeasure(label, 1.0)
     
       /**
    -   * Returns precision
    -   */
    +    * Returns precision
    +    */
       @Since("1.1.0")
       lazy val precision: Double = tpByClass.values.sum.toDouble / labelCount
     
       /**
    -   * Returns recall
    -   * (equals to precision for multiclass classifier
    -   * because sum of all false positives is equal to sum
    -   * of all false negatives)
    -   */
    +    * Returns recall
    +    * (equals to precision for multiclass classifier
    +    * because sum of all false positives is equal to sum
    +    * of all false negatives)
    +    */
       @Since("1.1.0")
       lazy val recall: Double = precision
     
       /**
    -   * Returns f-measure
    -   * (equals to precision and recall because precision equals recall)
    -   */
    +    * Returns f-measure
    +    * (equals to precision and recall because precision equals recall)
    +    */
       @Since("1.1.0")
       lazy val fMeasure: Double = precision
     
       /**
    -   * Returns weighted true positive rate
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted true positive rate
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedTruePositiveRate: Double = weightedRecall
     
       /**
    -   * Returns weighted false positive rate
    -   */
    +    * Returns weighted false positive rate
    +    */
       @Since("1.1.0")
       lazy val weightedFalsePositiveRate: Double = labelCountByClass.map { case (category, count) =>
         falsePositiveRate(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged recall
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted averaged recall
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedRecall: Double = labelCountByClass.map { case (category, count) =>
         recall(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged precision
    -   */
    +    * Returns weighted averaged precision
    +    */
       @Since("1.1.0")
       lazy val weightedPrecision: Double = labelCountByClass.map { case (category, count) =>
         precision(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f-measure
    -   * @param beta the beta parameter.
    -   */
    +    * Returns weighted averaged f-measure
    +    *
    +    * @param beta the beta parameter.
    +    */
       @Since("1.1.0")
       def weightedFMeasure(beta: Double): Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, beta) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f1-measure
    -   */
    +    * Returns weighted averaged f1-measure
    +    */
       @Since("1.1.0")
       lazy val weightedFMeasure: Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, 1.0) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns the sequence of labels in ascending order
    -   */
    +    * Returns the sequence of labels in ascending order
    +    */
       @Since("1.1.0")
       lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted
    +
    +
    +  /**
    +    * Returns unweighted Cohen's Kappa
    +    * Cohen's kappa coefficient is a statistic which measures inter-rater
    +    * agreement for qualitative (categorical) items. It is generally thought
    +    * to be a more robust measure than simple percent agreement calculation,
    +    * since kappa takes into account the agreement occurring by chance.
    +    * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +    * generally considered good agreement; zero or lower means no agreement
    +    * (practically random labels).
    +    */
    +  @Since("1.6.0")
    +  def kappa(): Double = {
    +    kappa("default")
    +  }
    +
    +  /**
    +    * Returns Cohen's Kappa with built-in weighted types
    +    *
    +    * @param weights the weighted type. "default" means no weighted;
    +    *                "linear" means linear weighted;
    +    *                "quadratic" means quadratic weighted.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      case "linear" =>
    +        (i: Int, j: Int) => Math.abs(i - j).toDouble
    --- End diff --
    
    OK, I have fixed it.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53726515
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -25,18 +25,19 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.DataFrame
     
     /**
    - * ::Experimental::
    - * Evaluator for multiclass classification.
    - *
    - * @param predictionAndLabels an RDD of (prediction, label) pairs.
    - */
    +  * ::Experimental::
    --- End diff --
    
    Sorry, this was caused by my IDE's reformatting. I will undo those changes.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53740507
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
        [... whitespace-only scaladoc reflow elided; identical to the diff quoted earlier in this thread ...]
    +  /**
    +    * Returns unweighted Cohen's Kappa
    +    * Cohen's kappa coefficient is a statistic which measures inter-rater
    +    * agreement for qualitative (categorical) items. It is generally thought
    +    * to be a more robust measure than simple percent agreement calculation,
    +    * since kappa takes into account the agreement occurring by chance.
    +    * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +    * generally considered good agreement; zero or lower means no agreement
    +    * (practically random labels).
    +    */
    +  @Since("1.6.0")
    --- End diff --
    
    OK, I have fixed it.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53639837
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -25,18 +25,19 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.DataFrame
     
     /**
    - * ::Experimental::
    - * Evaluator for multiclass classification.
    - *
    - * @param predictionAndLabels an RDD of (prediction, label) pairs.
    - */
    +  * ::Experimental::
    --- End diff --
    
    Nit: undo these space changes




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53736887
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala ---
    @@ -51,6 +51,9 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext {
         val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0)
         val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1)
         val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2)
    +    val unweighted_kappa = 0.47058823529411764
    --- End diff --
    
    The three numbers were obtained via sklearn and ml_metrics:
    
    In [4]: from sklearn.metrics import cohen_kappa_score
    
    In [5]: from ml_metrics import quadratic_weighted_kappa, linear_weighted_kappa, kappa
    
    In [6]: preds = [0, 0, 0, 1, 1, 1, 1, 2, 2]
    
    In [7]: labels = [0, 1, 0, 0, 1, 1, 1, 2, 0]
    
    In [8]: cohen_kappa_score(preds, labels)
    Out[8]: 0.47058823529411781
    
    In [9]: quadratic_weighted_kappa(preds, labels)
    Out[9]: 0.3571428571428571
    
    In [10]: linear_weighted_kappa(preds, labels)
    Out[10]: 0.4193548387096774
    
    In [11]: kappa(preds, labels)
    Out[11]: 0.47058823529411764
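
Those three reference values can also be reproduced from scratch in a few lines of pure Python. This is a hedged sketch of the same computation (the function name and structure are illustrative, not the PR's Scala code):

```python
def weighted_kappa(preds, labels, weight="default"):
    """Cohen's kappa with the same built-in weighting types discussed
    in this thread: "default" (0/1), "linear" (|i-j|), "quadratic" ((i-j)^2)."""
    k = max(max(preds), max(labels)) + 1
    n = len(preds)
    # observed agreement matrix, normalized to probabilities
    obs = [[0.0] * k for _ in range(k)]
    for p, l in zip(preds, labels):
        obs[p][l] += 1.0 / n
    # chance-agreement matrix comes from the marginals
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    if weight == "default":
        w = lambda i, j: 0.0 if i == j else 1.0
    elif weight == "linear":
        w = lambda i, j: float(abs(i - j))
    elif weight == "quadratic":
        w = lambda i, j: float((i - j) ** 2)
    else:
        raise ValueError("weight must be default, linear or quadratic")
    num = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    den = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - num / den

preds = [0, 0, 0, 1, 1, 1, 1, 2, 2]
labels = [0, 1, 0, 0, 1, 1, 1, 2, 0]
print(weighted_kappa(preds, labels))                # ≈ 0.470588 (exactly 8/17)
print(weighted_kappa(preds, labels, "linear"))      # ≈ 0.419355 (exactly 13/31)
print(weighted_kappa(preds, labels, "quadratic"))   # ≈ 0.357143 (exactly 5/14)
```

The unweighted case is equivalent to (po - pe) / (1 - pe): with 0/1 weights, 1 - sum(w*O)/sum(w*E) = 1 - (1 - po)/(1 - pe).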




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53640132
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
        [... whitespace-only scaladoc reflow elided; identical to the diff quoted earlier in this thread ...]
    +  /**
    +    * Returns Cohen's Kappa with built-in weighted types
    +    *
    +    * @param weights the weighted type. "default" means no weighted;
    +    *                "linear" means linear weighted;
    +    *                "quadratic" means quadratic weighted.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      case "linear" =>
    +        (i: Int, j: Int) => Math.abs(i - j).toDouble
    +      case "quadratic" =>
    +        (i: Int, j: Int) => (i - j).toDouble * (i - j)
    --- End diff --
    
    Also a tiny nit, but this needlessly computes the difference twice.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53639940
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
        [... whitespace-only scaladoc reflow elided; identical to the diff quoted earlier in this thread ...]
    +  /**
    +    * Returns unweighted Cohen's Kappa
    +    * Cohen's kappa coefficient is a statistic which measures inter-rater
    +    * agreement for qualitative (categorical) items. It is generally thought
    +    * to be a more robust measure than simple percent agreement calculation,
    +    * since kappa takes into account the agreement occurring by chance.
    +    * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +    * generally considered good agreement; zero or lower means no agreement
    +    * (practically random labels).
    +    */
    +  @Since("1.6.0")
    --- End diff --
    
    This can't be right, since 1.6.0 is already out; it should be 2.0.0.
    Your scaladoc indentation is off, and you shouldn't automatically reformat the rest of the file. Most of this PR is whitespace changes.




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53767167
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala ---
    @@ -51,6 +51,9 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext {
         val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0)
         val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1)
         val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2)
    +    val unweighted_kappa = 0.47058823529411764
    --- End diff --
    
    Sounds good to me




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53736779
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -129,86 +135,199 @@ class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Doubl
       }
     
       /**
    -   * Returns f1-measure for a given label (category)
    -   * @param label the label.
    -   */
    +    * Returns f1-measure for a given label (category)
    +    *
    +    * @param label the label.
    +    */
       @Since("1.1.0")
       def fMeasure(label: Double): Double = fMeasure(label, 1.0)
     
       /**
    -   * Returns precision
    -   */
    +    * Returns precision
    +    */
       @Since("1.1.0")
       lazy val precision: Double = tpByClass.values.sum.toDouble / labelCount
     
       /**
    -   * Returns recall
    -   * (equals to precision for multiclass classifier
    -   * because sum of all false positives is equal to sum
    -   * of all false negatives)
    -   */
    +    * Returns recall
    +    * (equals to precision for multiclass classifier
    +    * because sum of all false positives is equal to sum
    +    * of all false negatives)
    +    */
       @Since("1.1.0")
       lazy val recall: Double = precision
     
       /**
    -   * Returns f-measure
    -   * (equals to precision and recall because precision equals recall)
    -   */
    +    * Returns f-measure
    +    * (equals to precision and recall because precision equals recall)
    +    */
       @Since("1.1.0")
       lazy val fMeasure: Double = precision
     
       /**
    -   * Returns weighted true positive rate
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted true positive rate
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedTruePositiveRate: Double = weightedRecall
     
       /**
    -   * Returns weighted false positive rate
    -   */
    +    * Returns weighted false positive rate
    +    */
       @Since("1.1.0")
       lazy val weightedFalsePositiveRate: Double = labelCountByClass.map { case (category, count) =>
         falsePositiveRate(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged recall
    -   * (equals to precision, recall and f-measure)
    -   */
    +    * Returns weighted averaged recall
    +    * (equals to precision, recall and f-measure)
    +    */
       @Since("1.1.0")
       lazy val weightedRecall: Double = labelCountByClass.map { case (category, count) =>
         recall(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged precision
    -   */
    +    * Returns weighted averaged precision
    +    */
       @Since("1.1.0")
       lazy val weightedPrecision: Double = labelCountByClass.map { case (category, count) =>
         precision(category) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f-measure
    -   * @param beta the beta parameter.
    -   */
    +    * Returns weighted averaged f-measure
    +    *
    +    * @param beta the beta parameter.
    +    */
       @Since("1.1.0")
       def weightedFMeasure(beta: Double): Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, beta) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns weighted averaged f1-measure
    -   */
    +    * Returns weighted averaged f1-measure
    +    */
       @Since("1.1.0")
       lazy val weightedFMeasure: Double = labelCountByClass.map { case (category, count) =>
         fMeasure(category, 1.0) * count.toDouble / labelCount
       }.sum
     
       /**
    -   * Returns the sequence of labels in ascending order
    -   */
    +    * Returns the sequence of labels in ascending order
    +    */
       @Since("1.1.0")
       lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted
    +
    +
    +  /**
    +    * Returns unweighted Cohen's Kappa
    +    * Cohen's kappa coefficient is a statistic which measures inter-rater
    +    * agreement for qualitative (categorical) items. It is generally thought
    +    * to be a more robust measure than simple percent agreement calculation,
    +    * since kappa takes into account the agreement occurring by chance.
    +    * The kappa score is a number between -1 and 1. Scores above 0.8 are
    +    * generally considered good agreement; zero or lower means no agreement
    +    * (practically random labels).
    +    */
    +  @Since("1.6.0")
    +  def kappa(): Double = {
    +    kappa("default")
    +  }
    +
    +  /**
    +    * Returns Cohen's Kappa with built-in weighted types
    +    *
    +    * @param weights the weighting type. "default" means unweighted;
    +    *                "linear" means linear weighting;
    +    *                "quadratic" means quadratic weighting.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: String): Double = {
    +
    +    val func = weights match {
    +      case "default" =>
    +        (i: Int, j: Int) => {
    +          if (i == j) {
    +            0.0
    +          } else {
    +            1.0
    +          }
    +        }
    +      case "linear" =>
    +        (i: Int, j: Int) => Math.abs(i - j).toDouble
    +      case "quadratic" =>
    +        (i: Int, j: Int) => (i - j).toDouble * (i - j)
    +      case t =>
    +        throw new IllegalArgumentException(
    +          s"kappa only supports {linear, quadratic, default} but got type ${t}.")
    +    }
    +
    +    kappa(func)
    +  }
    +
    +
    +  /**
    +    * Returns Cohen's Kappa with user-defined weight matrix
    +    *
    +    * @param weights the weight matrix, must be of the same shape with Confusion Matrix.
    +    *                Note: Each Element in it must be no less than zero.
    +    */
    +  @Since("1.6.0")
    +  def kappa(weights: Matrix): Double = {
    --- End diff --
    
    I tend to keep this API, since the input matrix is small: its size is only nClass * nClass.
    Also, the relevant statistical references (such as http://www.real-statistics.com/reliability/weighted-cohens-kappa/) define a general weight matrix, so keeping this overload may make the kappa score more comprehensive.
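For reference, the weighted kappa in the diff above can be written as kappa = 1 - sum_ij(w_ij * O_ij) / sum_ij(w_ij * E_ij), where O is the normalized confusion matrix and E is its expected counterpart under rater independence. The following is a hypothetical standalone Python sketch, not the Spark API under review (the helper names `weighted_kappa` and `make_weights` are invented for illustration); it mirrors the three built-in weighting schemes from the patch plus the user-defined weight matrix overload being discussed here.

```python
def weighted_kappa(confusion, weights):
    """Weighted Cohen's kappa: 1 - sum(w * observed) / sum(w * expected)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(confusion[i]) for i in range(n)]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted expected disagreement
    for i in range(n):
        for j in range(n):
            observed = confusion[i][j] / total
            expected = row_sums[i] * col_sums[j] / (total * total)
            num += weights[i][j] * observed
            den += weights[i][j] * expected
    return 1.0 - num / den

def make_weights(n, kind="default"):
    """The three built-in schemes from the patch, as explicit matrices."""
    if kind == "default":  # unweighted: 0 on the diagonal, 1 elsewhere
        return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    if kind == "linear":
        return [[abs(i - j) for j in range(n)] for i in range(n)]
    if kind == "quadratic":
        return [[(i - j) ** 2 for j in range(n)] for i in range(n)]
    raise ValueError(f"kappa only supports linear, quadratic, default; got {kind}")
```

Note that for two classes linear and quadratic weights coincide, and the "default" scheme reduces to classic unweighted kappa.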




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187659571
  
    Jenkins, test this please




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53640590
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala ---
    @@ -51,6 +51,9 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext {
         val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0)
         val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1)
         val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2)
    +    val unweighted_kappa = 0.47058823529411764
    --- End diff --
    
    Out of curiosity, did you obtain these values from something like R? That's always great as a double-check when possible.
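An independent reimplementation is indeed an easy double-check. The hypothetical Python sketch below computes unweighted kappa directly from (prediction, label) pairs, matching the shape of Spark's `predictionAndLabels` input, via the textbook formula kappa = (p_o - p_e) / (1 - p_e); the sample pairs in the usage note are illustrative and are not the suite's actual fixture data.

```python
def kappa_from_pairs(pairs):
    """Unweighted Cohen's kappa from an iterable of (prediction, label) pairs."""
    # Build a confusion matrix indexed as conf[label][prediction].
    labels = sorted({x for pair in pairs for x in pair})
    idx = {lab: k for k, lab in enumerate(labels)}
    n = len(labels)
    conf = [[0] * n for _ in range(n)]
    for pred, actual in pairs:
        conf[idx[actual]][idx[pred]] += 1
    total = len(pairs)
    # Observed agreement: fraction of pairs on the diagonal.
    p_o = sum(conf[i][i] for i in range(n)) / total
    # Expected agreement under independence: sum of row-share * column-share.
    p_e = sum(
        sum(conf[i]) * sum(conf[k][i] for k in range(n))
        for i in range(n)
    ) / total ** 2
    return (p_o - p_e) / (1 - p_e)
```

For example, with 20 agreements on class 1, 15 on class 0, and 15 disagreements out of 50 pairs, p_o = 0.7 and p_e = 0.5, giving kappa = 0.4.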




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11303#discussion_r53741198
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -25,18 +25,19 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.DataFrame
     
     /**
    - * ::Experimental::
    - * Evaluator for multiclass classification.
    - *
    - * @param predictionAndLabels an RDD of (prediction, label) pairs.
    - */
    +  * ::Experimental::
    --- End diff --
    
    It took me half a day to reformat the code...
    I use IntelliJ IDEA CE as my main IDE; where can I find the correct IDEA code style config for Spark?




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187662058
  
    **[Test build #51766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51766/consoleFull)** for PR 11303 at commit [`4272aca`](https://github.com/apache/spark/commit/4272aca6f68466bbad3a3b8d23e665b9144fb02a).




[GitHub] spark pull request: [SPARK-13435] [MLlib] Add Weighted Cohen's kap...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/11303#issuecomment-187659785
  
    Any thoughts @mengxr ?
    
    As for code formatting, I don't think IJ will change lines you don't touch. However you can edit the style settings in the editor preferences, and just make it write the scaladoc without the extra level of indent.
    
    I wouldn't worry too much about the style -- just follow what you see around your change and undo changes that aren't related

