Posted to issues@flink.apache.org by "Domokos Miklós Kelen (JIRA)" <ji...@apache.org> on 2016/09/29 15:22:20 UTC
[jira] [Created] (FLINK-4713) Implementing ranking evaluation scores for recommender systems
Domokos Miklós Kelen created FLINK-4713:
-------------------------------------------
Summary: Implementing ranking evaluation scores for recommender systems
Key: FLINK-4713
URL: https://issues.apache.org/jira/browse/FLINK-4713
Project: Flink
Issue Type: New Feature
Components: Machine Learning Library
Reporter: Domokos Miklós Kelen
Follow-up work to [4712|https://issues.apache.org/jira/browse/FLINK-4712] includes implementing ranking recommendation evaluation metrics (such as precision@k, recall@k, ndcg@k), [similar to Spark's implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems]. It would be beneficial if we could design the API such that it fits into the proposed evaluation framework (see [2157|https://issues.apache.org/jira/browse/FLINK-2157]).
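For reference, the per-user form of precision@k and recall@k can be sketched in a few lines of plain Scala. This is only an illustration of the metric definitions (following the Spark convention of dividing precision@k by k); the object and method names are hypothetical, not existing FlinkML API:

```scala
// Hypothetical helper, illustrating the per-user metric definitions only.
object RankingMetricsSketch {
  // precision@k: number of relevant items among the top-k recommendations, divided by k
  def precisionAtK(recommended: Seq[Int], relevant: Set[Int], k: Int): Double = {
    val topK = recommended.take(k)
    if (k == 0) 0.0
    else topK.count(relevant.contains).toDouble / k
  }

  // recall@k: number of relevant items among the top-k, divided by the
  // total number of relevant items for this user
  def recallAtK(recommended: Seq[Int], relevant: Set[Int], k: Int): Double = {
    if (relevant.isEmpty) 0.0
    else recommended.take(k).count(relevant.contains).toDouble / relevant.size
  }
}
```

Both metrics need only the user's recommendation list and relevant-item set, which is why the per-user grouping of the input matters for the API design below.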
In its current form, this would mean generalizing the PredictionType type parameter of the Score class to allow for {{Array[Int]}} or {{Array[(Int, Double)]}}, and outputting the recommendations in the form {{DataSet[(Int, Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}}, i.e. (user, array of items), possibly including the predicted scores as well.
However, calculating, for example, nDCG for a given user u requires access to all of the (u, item, relevance) records in the test dataset, which means we would need to put this information into the second element of the {{DataSet[(PredictionType, PredictionType)]}} input of the scorer function as PredictionType={{Array[(Int, Double)]}}. This is problematic, as this Array could be arbitrarily long.
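To make the per-user requirement concrete, here is a minimal nDCG@k sketch in plain Scala (hypothetical helper, log-base-2 discount, graded relevance). Note that the normalization term needs every (item, relevance) test record of the user, which is exactly the data that would have to be packed into the second array:

```scala
// Hypothetical sketch of nDCG@k for a single user; not FlinkML API.
object NdcgSketch {
  // DCG@k over a ranked list of relevance scores: sum of rel_i / log2(i + 2)
  private def dcg(relevances: Seq[Double], k: Int): Double =
    relevances.take(k).zipWithIndex.map { case (rel, i) =>
      rel / (math.log(i + 2) / math.log(2))
    }.sum

  // nDCG@k: DCG of the recommended order, normalized by the DCG of the
  // ideal (relevance-sorted) order. Computing the ideal order requires
  // ALL of the user's (item, relevance) test records.
  def ndcgAtK(recommended: Seq[Int], relevance: Map[Int, Double], k: Int): Double = {
    val actual = dcg(recommended.map(item => relevance.getOrElse(item, 0.0)), k)
    val ideal  = dcg(relevance.values.toSeq.sorted(Ordering[Double].reverse), k)
    if (ideal == 0.0) 0.0 else actual / ideal
  }
}
```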
Another option is to further rework the proposed evaluation framework to allow us to implement this properly, with inputs in the form of {{recommendations : DataSet[(Int,Int,Int)]}} (user, item, rank) and {{test : DataSet[(Int,Int,Double)]}} (user, item, relevance). This way, the scores could be implemented such that they can be calculated in a distributed way.
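The advantage of the (user, item, rank) / (user, item, relevance) formulation is that each metric decomposes into a join followed by a per-user aggregation, both of which map onto distributed primitives (join, groupBy, reduce). A sketch of mean precision@k in this style, using plain Scala collections in place of {{DataSet}} operations (hypothetical names, and using relevance > 0 as the relevance threshold):

```scala
// Hypothetical sketch: join-based mean precision@k, written so that each
// step corresponds to a distributed primitive (filter, join, groupBy, reduce).
object DistributedStyleScores {
  def meanPrecisionAtK(recommendations: Seq[(Int, Int, Int)], // (user, item, rank)
                       test: Seq[(Int, Int, Double)],          // (user, item, relevance)
                       k: Int): Double = {
    // (user, item) pairs considered relevant in the test set
    val relevant: Set[(Int, Int)] =
      test.collect { case (u, i, rel) if rel > 0.0 => (u, i) }.toSet

    // "join" the top-k recommendations against the relevant pairs,
    // then count hits per user
    val hitsPerUser: Map[Int, Int] = recommendations
      .filter { case (_, _, rank) => rank <= k }
      .groupBy { case (u, _, _) => u }
      .map { case (u, recs) =>
        u -> recs.count { case (_, item, _) => relevant.contains((u, item)) }
      }

    // average precision@k over all users that received recommendations
    val users = recommendations.map(_._1).distinct
    if (users.isEmpty) 0.0
    else users.map(u => hitsPerUser.getOrElse(u, 0).toDouble / k).sum / users.size
  }
}
```

In actual Flink code the groupBy/count step would become a {{groupBy(0)}} plus aggregation on a {{DataSet}}, so no per-user array of unbounded length ever needs to be materialized.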
The third option is to implement the scorer functions outside the evaluation framework.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)