Posted to dev@spark.apache.org by Maciej Szymkiewicz <ms...@gmail.com> on 2016/12/05 23:30:02 UTC

[MLLIB] RankingMetrics.precisionAt

Hi,

Could I ask for a fresh pair of eyes on this piece of code:

https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80

  @Since("1.2.0")
  def precisionAt(k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    predictionAndLabels.map { case (pred, lab) =>
      val labSet = lab.toSet

      if (labSet.nonEmpty) {
        val n = math.min(pred.length, k)
        var i = 0
        var cnt = 0
        while (i < n) {
          if (labSet.contains(pred(i))) {
            cnt += 1
          }
          i += 1
        }
        cnt.toDouble / k
      } else {
        logWarning("Empty ground truth set, check input data")
        0.0
      }
    }.mean()
  }


Am I the only one who thinks this doesn't do what it claims? Just for
reference:

  * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
  * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
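
For comparison, this is a minimal sketch of what I would expect "average
precision at k" to compute for a single (prediction, label) pair,
following the ml_metrics reference above (the function name and
standalone signature are mine, just for illustration, and I left out
ml_metrics' handling of duplicate predictions):

  // ap@k for one ranking: sum precision at each hit, then normalise
  def averagePrecisionAt(pred: Array[Int], lab: Set[Int], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    if (lab.isEmpty) {
      0.0
    } else {
      val n = math.min(pred.length, k)
      var hits = 0
      var score = 0.0
      var i = 0
      while (i < n) {
        if (lab.contains(pred(i))) {
          hits += 1
          score += hits.toDouble / (i + 1)  // precision at cut-off i + 1
        }
        i += 1
      }
      score / math.min(lab.size, k)
    }
  }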

-- 
Best,
Maciej


Re: [MLLIB] RankingMetrics.precisionAt

Posted by Maciej Szymkiewicz <ms...@gmail.com>.
This sounds much better.

A follow-up question is whether we should provide MAP@k, which I believe
is a more widely used metric.
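
For the sake of discussion, a rough sketch of how that could look next to
the existing precisionAt, assuming it lives inside RankingMetrics so that
predictionAndLabels and logWarning are in scope (the method name is
hypothetical, not an existing API):

  def meanAveragePrecisionAt(k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    predictionAndLabels.map { case (pred, lab) =>
      val labSet = lab.toSet

      if (labSet.nonEmpty) {
        val n = math.min(pred.length, k)
        var i = 0
        var cnt = 0
        var precSum = 0.0
        while (i < n) {
          if (labSet.contains(pred(i))) {
            cnt += 1
            precSum += cnt.toDouble / (i + 1)  // precision at cut-off i + 1
          }
          i += 1
        }
        // ap@k for this user: normalise by min(|labels|, k), not by k
        precSum / math.min(labSet.size, k)
      } else {
        logWarning("Empty ground truth set, check input data")
        0.0
      }
    }.mean()
  }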


On 12/06/2016 09:52 PM, Sean Owen wrote:
> As I understand, this might best be called "mean precision@k", not
> "mean average precision, up to k".
>
> On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz
> <mszymkiewicz@gmail.com> wrote:
>
>     Thank you Sean.
>
>     Maybe I am just confused about the language. When I read that it
>     returns "the average precision at the first k ranking positions" I
>     somehow expect there will be ap@k in there and that the final output
>     would be MAP@k, not average precision at the k-th position.
>
>     I guess I have not had enough sleep.
>
>     On 12/06/2016 02:45 AM, Sean Owen wrote:
>>     I read it again and that looks like it implements mean
>>     precision@k as I would expect. What is the issue?
>>
>>     On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz
>>     <mszymkiewicz@gmail.com> wrote:
>>
>>         Hi,
>>
>>         Could I ask for a fresh pair of eyes on this piece of code:
>>
>>         https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>>
>>           @Since("1.2.0")
>>           def precisionAt(k: Int): Double = {
>>             require(k > 0, "ranking position k should be positive")
>>             predictionAndLabels.map { case (pred, lab) =>
>>               val labSet = lab.toSet
>>
>>               if (labSet.nonEmpty) {
>>                 val n = math.min(pred.length, k)
>>                 var i = 0
>>                 var cnt = 0
>>                 while (i < n) {
>>                   if (labSet.contains(pred(i))) {
>>                     cnt += 1
>>                   }
>>                   i += 1
>>                 }
>>                 cnt.toDouble / k
>>               } else {
>>                 logWarning("Empty ground truth set, check input data")
>>                 0.0
>>               }
>>             }.mean()
>>           }
>>
>>
>>         Am I the only one who thinks this doesn't do what it claims?
>>         Just for reference:
>>
>>           * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>>           * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>>
>>         -- 
>>         Best,
>>         Maciej
>>
>
>     -- 
>     Maciej Szymkiewicz
>

-- 
Maciej Szymkiewicz


Re: [MLLIB] RankingMetrics.precisionAt

Posted by Sean Owen <so...@cloudera.com>.
As I understand, this might best be called "mean precision@k", not "mean
average precision, up to k".
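
To make the difference concrete with a toy example (my own numbers, not
from the Spark tests): for pred = [a, x, b], labels = {a, b} and k = 3,
precisionAt gives 2/3, whereas ap@3 as defined in ml_metrics would be
(1/1 + 2/3) / min(2, 3) = 5/6.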

On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz <ms...@gmail.com>
wrote:

> Thank you Sean.
>
> Maybe I am just confused about the language. When I read that it returns "the
> average precision at the first k ranking positions" I somehow expect there
> will be ap@k in there and that the final output would be MAP@k, not average
> precision at the k-th position.
>
> I guess I have not had enough sleep.
> On 12/06/2016 02:45 AM, Sean Owen wrote:
>
> I read it again and that looks like it implements mean precision@k as I
> would expect. What is the issue?
>
> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz <ms...@gmail.com>
> wrote:
>
> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
>
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
>   @Since("1.2.0")
>   def precisionAt(k: Int): Double = {
>     require(k > 0, "ranking position k should be positive")
>     predictionAndLabels.map { case (pred, lab) =>
>       val labSet = lab.toSet
>
>       if (labSet.nonEmpty) {
>         val n = math.min(pred.length, k)
>         var i = 0
>         var cnt = 0
>         while (i < n) {
>           if (labSet.contains(pred(i))) {
>             cnt += 1
>           }
>           i += 1
>         }
>         cnt.toDouble / k
>       } else {
>         logWarning("Empty ground truth set, check input data")
>         0.0
>       }
>     }.mean()
>   }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just for
> reference:
>
>
>    -
>    https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>    -
>    https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> --
> Best,
> Maciej
>
>
> --
> Maciej Szymkiewicz
>
>

Re: [MLLIB] RankingMetrics.precisionAt

Posted by Maciej Szymkiewicz <ms...@gmail.com>.
Thank you Sean.

Maybe I am just confused about the language. When I read that it returns
"the average precision at the first k ranking positions" I somehow
expect there will be ap@k in there and that the final output would be
MAP@k, not average precision at the k-th position.

I guess I have not had enough sleep.

On 12/06/2016 02:45 AM, Sean Owen wrote:
> I read it again and that looks like it implements mean precision@k as
> I would expect. What is the issue?
>
> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz <mszymkiewicz@gmail.com> wrote:
>
>     Hi,
>
>     Could I ask for a fresh pair of eyes on this piece of code:
>
>     https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
>       @Since("1.2.0")
>       def precisionAt(k: Int): Double = {
>         require(k > 0, "ranking position k should be positive")
>         predictionAndLabels.map { case (pred, lab) =>
>           val labSet = lab.toSet
>
>           if (labSet.nonEmpty) {
>             val n = math.min(pred.length, k)
>             var i = 0
>             var cnt = 0
>             while (i < n) {
>               if (labSet.contains(pred(i))) {
>                 cnt += 1
>               }
>               i += 1
>             }
>             cnt.toDouble / k
>           } else {
>             logWarning("Empty ground truth set, check input data")
>             0.0
>           }
>         }.mean()
>       }
>
>
>     Am I the only one who thinks this doesn't do what it claims? Just
>     for reference:
>
>       * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>       * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
>     -- 
>     Best,
>     Maciej
>

-- 
Maciej Szymkiewicz


Re: [MLLIB] RankingMetrics.precisionAt

Posted by Sean Owen <so...@cloudera.com>.
I read it again and that looks like it implements mean precision@k as I
would expect. What is the issue?

On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz <ms...@gmail.com>
wrote:

> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
>
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
>   @Since("1.2.0")
>   def precisionAt(k: Int): Double = {
>     require(k > 0, "ranking position k should be positive")
>     predictionAndLabels.map { case (pred, lab) =>
>       val labSet = lab.toSet
>
>       if (labSet.nonEmpty) {
>         val n = math.min(pred.length, k)
>         var i = 0
>         var cnt = 0
>         while (i < n) {
>           if (labSet.contains(pred(i))) {
>             cnt += 1
>           }
>           i += 1
>         }
>         cnt.toDouble / k
>       } else {
>         logWarning("Empty ground truth set, check input data")
>         0.0
>       }
>     }.mean()
>   }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just for
> reference:
>
>
>    -
>    https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>    -
>    https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> --
> Best,
> Maciej
>
>