Posted to issues@spark.apache.org by "Pablo J. Villacorta (JIRA)" <ji...@apache.org> on 2018/12/12 23:51:00 UTC
[jira] [Updated] (SPARK-26351) Documented formula of precision at k does not match the actual code

     [ https://issues.apache.org/jira/browse/SPARK-26351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pablo J. Villacorta updated SPARK-26351:
----------------------------------------
    Description: 
The documented formula for *precision @ k*, which measures the quality of recommendations:

[https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#ranking-systems]

says that j goes from 0 to *min(|D|, k)*, but the code uses a different bound:

[https://github.com/apache/spark/blob/a63e7b2a212bab94d080b00cf1c5f397800a276a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L65]

 
{code:java}
val n = math.min(pred.length, k){code}
 

The Spark documentation defines the following notation:

D_i is the set of ground-truth relevant documents for user i.

R_i is the set of recommended documents (i.e. predictions) for user i.

According to the code, the documentation should instead say that j goes from 0 to *min(|R_i|, k)*.
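
To illustrate why the bound matters, here is a minimal sketch for a single user. It is not the Spark implementation: the object {{PrecisionAtKSketch}}, the helper {{precisionAtK}}, and the sample data are all hypothetical, and the upper limit of the inner sum is passed in explicitly so both variants can be compared.

```scala
object PrecisionAtKSketch {
  // Hypothetical helper, not part of Spark: precision at k for one user,
  // counting relevant items among the first `bound` predictions.
  def precisionAtK(pred: Seq[Int], labels: Set[Int], k: Int, bound: Int): Double = {
    require(k > 0, "k must be positive")
    // take() tolerates a bound larger than pred.length
    val hits = pred.take(bound).count(labels.contains)
    hits.toDouble / k
  }

  // Made-up sample data: |R_i| = 5 recommendations, |D_i| = 2 relevant documents.
  val pred: Seq[Int] = Seq(10, 20, 30, 40, 50)
  val labels: Set[Int] = Set(10, 50)
  val k = 5

  // Bound as in the quoted code: min(pred.length, k) = 5, so both hits are counted.
  val byCode: Double = precisionAtK(pred, labels, k, math.min(pred.length, k))

  // Bound as currently documented: min(|D_i|, k) = 2, so the hit at position 4 is missed.
  val byDocs: Double = precisionAtK(pred, labels, k, math.min(labels.size, k))
}
```

With this sample data the two bounds give different results (0.4 with the code's bound, 0.2 with the documented one), because a relevant document appears at a position beyond |D_i|.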

  was:
The formula of the *precision @ k* for measuring the quality of the recommendations:

[https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#ranking-systems]

says that j goes from 0 to *min(|D|, k)* , but according to the code, 

[https://github.com/apache/spark/blob/a63e7b2a212bab94d080b00cf1c5f397800a276a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L65]

 
{code:java}
val n = math.min(pred.length, k){code}
 

The notation of Spark documentation defines

D~i~ as the set of ground truth relevant documents for user i

R~i~ as the set of recommended documents (i.e. predictions) given for user i .

According to the code, the documentation should say j goes from 0 to *min( |R~i~|, k )*


> Documented formula of precision at k does not match the actual code
> -------------------------------------------------------------------
>
>                 Key: SPARK-26351
>                 URL: https://issues.apache.org/jira/browse/SPARK-26351
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 2.4.0
>            Reporter: Pablo J. Villacorta
>            Priority: Major


