Posted to dev@spark.apache.org by Maciej Szymkiewicz <ms...@gmail.com> on 2016/12/05 23:30:02 UTC
[MLLIB] RankingMetrics.precisionAt
Hi,
Could I ask for a fresh pair of eyes on this piece of code:
https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
@Since("1.2.0")
def precisionAt(k: Int): Double = {
require(k > 0, "ranking position k should be positive")
predictionAndLabels.map { case (pred, lab) =>
val labSet = lab.toSet
if (labSet.nonEmpty) {
val n = math.min(pred.length, k)
var i = 0
var cnt = 0
while (i < n) {
if (labSet.contains(pred(i))) {
cnt += 1
}
i += 1
}
cnt.toDouble / k
} else {
logWarning("Empty ground truth set, check input data")
0.0
}
}.mean()
}
Am I the only one who thinks this doesn't do what it claims? Just for
reference:
* https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
* https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
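For comparison, here is a minimal Scala sketch of ap@k along the lines of
the ml_metrics reference above (apAtK is a hypothetical helper, not an
MLlib API; it assumes pred contains no duplicates, which ml_metrics
handles by skipping repeats):

def apAtK[T](pred: Seq[T], lab: Set[T], k: Int): Double = {
  if (lab.isEmpty) 0.0
  else {
    var hits = 0
    var score = 0.0
    pred.take(k).zipWithIndex.foreach { case (p, i) =>
      if (lab.contains(p)) {
        hits += 1
        score += hits.toDouble / (i + 1) // precision at this hit's rank
      }
    }
    score / math.min(lab.size, k) // normalization used by ml_metrics
  }
}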
--
Best,
Maciej
Re: [MLLIB] RankingMetrics.precisionAt
Posted by Maciej Szymkiewicz <ms...@gmail.com>.
This sounds much better.
A follow-up question is whether we should provide MAP@k, which I believe
is a more widely used metric.
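For illustration, MAP@k on top of the existing class would just average
ap@k over queries; a sketch, reusing the hypothetical apAtK helper from
my first message and the class's predictionAndLabels field:

// Hypothetical method, not part of MLlib.
def meanAveragePrecisionAt(k: Int): Double = {
  require(k > 0, "ranking position k should be positive")
  predictionAndLabels.map { case (pred, lab) =>
    apAtK(pred, lab.toSet, k)
  }.mean()
}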
On 12/06/2016 09:52 PM, Sean Owen wrote:
> As I understand, this might best be called "mean precision@k", not
> "mean average precision, up to k".
>
> On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz <mszymkiewicz@gmail.com> wrote:
>
> Thank you Sean.
>
> Maybe I am just confused about the language. When I read that it
> returns "the average precision at the first k ranking positions", I
> somehow expect there to be an ap@k in there, and that the final
> output would be MAP@k, not average precision at the k-th position.
>
> I guess I am just short on sleep.
>
> On 12/06/2016 02:45 AM, Sean Owen wrote:
>> I read it again and that looks like it implements mean
>> precision@k as I would expect. What is the issue?
>>
>> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz <mszymkiewicz@gmail.com> wrote:
>>
>> Hi,
>>
>> Could I ask for a fresh pair of eyes on this piece of code:
>>
>> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>>
>> @Since("1.2.0")
>> def precisionAt(k: Int): Double = {
>> require(k > 0, "ranking position k should be positive")
>> predictionAndLabels.map { case (pred, lab) =>
>> val labSet = lab.toSet
>>
>> if (labSet.nonEmpty) {
>> val n = math.min(pred.length, k)
>> var i = 0
>> var cnt = 0
>> while (i < n) {
>> if (labSet.contains(pred(i))) {
>> cnt += 1
>> }
>> i += 1
>> }
>> cnt.toDouble / k
>> } else {
>> logWarning("Empty ground truth set, check input data")
>> 0.0
>> }
>> }.mean()
>> }
>>
>>
>> Am I the only one who thinks this doesn't do what it claims?
>> Just for reference:
>>
>> * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>> * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>>
>> --
>> Best,
>> Maciej
>>
>
> --
> Maciej Szymkiewicz
>
--
Maciej Szymkiewicz
Re: [MLLIB] RankingMetrics.precisionAt
Posted by Sean Owen <so...@cloudera.com>.
As I understand, this might best be called "mean precision@k", not "mean
average precision, up to k".
On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz <ms...@gmail.com>
wrote:
> Thank you Sean.
>
> Maybe I am just confused about the language. When I read that it returns
> "the average precision at the first k ranking positions", I somehow expect
> there to be an ap@k in there, and that the final output would be MAP@k,
> not average precision at the k-th position.
>
> I guess I am just short on sleep.
> On 12/06/2016 02:45 AM, Sean Owen wrote:
>
> I read it again and that looks like it implements mean precision@k as I
> would expect. What is the issue?
>
> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz <ms...@gmail.com>
> wrote:
>
> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
>
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
> @Since("1.2.0")
> def precisionAt(k: Int): Double = {
> require(k > 0, "ranking position k should be positive")
> predictionAndLabels.map { case (pred, lab) =>
> val labSet = lab.toSet
>
> if (labSet.nonEmpty) {
> val n = math.min(pred.length, k)
> var i = 0
> var cnt = 0
> while (i < n) {
> if (labSet.contains(pred(i))) {
> cnt += 1
> }
> i += 1
> }
> cnt.toDouble / k
> } else {
> logWarning("Empty ground truth set, check input data")
> 0.0
> }
> }.mean()
> }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just for
> reference:
>
>
> - https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
> - https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> --
> Best,
> Maciej
>
>
> --
> Maciej Szymkiewicz
>
>
Re: [MLLIB] RankingMetrics.precisionAt
Posted by Maciej Szymkiewicz <ms...@gmail.com>.
Thank you Sean.
Maybe I am just confused about the language. When I read that it returns
"the average precision at the first k ranking positions", I somehow
expect there to be an ap@k in there, and that the final output would be
MAP@k, not average precision at the k-th position.
I guess I am just short on sleep.
On 12/06/2016 02:45 AM, Sean Owen wrote:
> I read it again and that looks like it implements mean precision@k as
> I would expect. What is the issue?
>
> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz <mszymkiewicz@gmail.com> wrote:
>
> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
> @Since("1.2.0")
> def precisionAt(k: Int): Double = {
> require(k > 0, "ranking position k should be positive")
> predictionAndLabels.map { case (pred, lab) =>
> val labSet = lab.toSet
>
> if (labSet.nonEmpty) {
> val n = math.min(pred.length, k)
> var i = 0
> var cnt = 0
> while (i < n) {
> if (labSet.contains(pred(i))) {
> cnt += 1
> }
> i += 1
> }
> cnt.toDouble / k
> } else {
> logWarning("Empty ground truth set, check input data")
> 0.0
> }
> }.mean()
> }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just
> for reference:
>
> * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
> * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> --
> Best,
> Maciej
>
--
Maciej Szymkiewicz
Re: [MLLIB] RankingMetrics.precisionAt
Posted by Sean Owen <so...@cloudera.com>.
I read it again and that looks like it implements mean precision@k as I
would expect. What is the issue?
On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz <ms...@gmail.com>
wrote:
> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
>
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
> @Since("1.2.0")
> def precisionAt(k: Int): Double = {
> require(k > 0, "ranking position k should be positive")
> predictionAndLabels.map { case (pred, lab) =>
> val labSet = lab.toSet
>
> if (labSet.nonEmpty) {
> val n = math.min(pred.length, k)
> var i = 0
> var cnt = 0
> while (i < n) {
> if (labSet.contains(pred(i))) {
> cnt += 1
> }
> i += 1
> }
> cnt.toDouble / k
> } else {
> logWarning("Empty ground truth set, check input data")
> 0.0
> }
> }.mean()
> }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just for
> reference:
>
>
> - https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
> - https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> --
> Best,
> Maciej
>
>