Posted to issues@flink.apache.org by "Domokos Miklós Kelen (JIRA)" <ji...@apache.org> on 2016/09/29 15:22:20 UTC

[jira] [Created] (FLINK-4713) Implementing ranking evaluation scores for recommender systems

Domokos Miklós Kelen created FLINK-4713:
-------------------------------------------

             Summary: Implementing ranking evaluation scores for recommender systems
                 Key: FLINK-4713
                 URL: https://issues.apache.org/jira/browse/FLINK-4713
             Project: Flink
          Issue Type: New Feature
          Components: Machine Learning Library
            Reporter: Domokos Miklós Kelen


Follow-up work to [4712|https://issues.apache.org/jira/browse/FLINK-4712] includes implementing ranking recommendation evaluation metrics (such as precision@k, recall@k, nDCG@k), [similar to Spark's implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems]. It would be beneficial to design the API so that it can be included in the proposed evaluation framework (see [2157|https://issues.apache.org/jira/browse/FLINK-2157]).
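
To pin down the definitions, here is a minimal sketch of the three metrics in plain Scala collections (no Flink, binary relevance assumed); the object and method names are illustrative, not a proposed API:

```scala
// Illustrative definitions of the ranking metrics for a single user,
// over plain collections. Binary relevance: an item is either relevant or not.
object RankingMetrics {

  // precision@k: fraction of the top-k recommended items that are relevant
  def precisionAtK(recommended: Seq[Int], relevant: Set[Int], k: Int): Double =
    recommended.take(k).count(relevant.contains).toDouble / k

  // recall@k: fraction of all relevant items that appear in the top-k
  def recallAtK(recommended: Seq[Int], relevant: Set[Int], k: Int): Double =
    recommended.take(k).count(relevant.contains).toDouble / relevant.size

  // nDCG@k: DCG of the recommendation list, normalized by the DCG of an
  // ideal ordering (all relevant items first)
  def ndcgAtK(recommended: Seq[Int], relevant: Set[Int], k: Int): Double = {
    def dcg(hits: Seq[Boolean]): Double =
      hits.zipWithIndex.map { case (hit, i) =>
        // position i (0-based) gets a log2(i + 2) discount
        if (hit) 1.0 / (math.log(i + 2) / math.log(2)) else 0.0
      }.sum
    val actual = dcg(recommended.take(k).map(relevant.contains))
    val ideal  = dcg(Seq.fill(math.min(k, relevant.size))(true))
    if (ideal == 0.0) 0.0 else actual / ideal
  }
}
```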

In its current form, this would mean generalizing the PredictionType type parameter of the Score class to allow for {{Array[Int]}} or {{Array[(Int, Double)]}}, and outputting the recommendations in the form {{DataSet[(Int, Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}}, meaning (user, array of items), possibly including the predicted scores as well.
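
As a rough illustration of what that generalization could look like, the sketch below assumes a simplified Score-like trait over local collections; the actual Score class proposed in FLINK-2157 is DataSet-based and may differ:

```scala
// Hypothetical, simplified stand-in for the evaluation framework's Score
// class; the real signature (operating on DataSets) may differ.
trait Score[PredictionType] {
  def evaluate(pairs: Seq[(PredictionType, PredictionType)]): Double
}

// With PredictionType generalized to Array[Int], a precision@k score can be
// expressed over per-user (recommended items, relevant items) pairs.
class PrecisionAtK(k: Int) extends Score[Array[Int]] {
  def evaluate(pairs: Seq[(Array[Int], Array[Int])]): Double = {
    val perUser = pairs.map { case (recommended, relevant) =>
      recommended.take(k).count(relevant.contains).toDouble / k
    }
    perUser.sum / perUser.size  // mean over users
  }
}
```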

However, calculating, for example, nDCG for a given user u requires access to all of the (u, item, relevance) records in the test dataset, which means we would need to put this information into the second element of the {{DataSet[(PredictionType, PredictionType)]}} input of the scorer function as PredictionType={{Array[(Int, Double)]}}. This is problematic, as this array could be arbitrarily long.

Another option is to further rework the proposed evaluation framework to allow us to implement this properly, with inputs in the form {{recommendations : DataSet[(Int,Int,Int)]}} (user, item, rank) and {{test : DataSet[(Int,Int,Double)]}} (user, item, relevance). This way, the scores could be implemented such that they can be calculated in a distributed way.
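
The data flow of this option can be sketched with plain Scala collections standing in for the DataSets (a Flink implementation would use a join on (user, item) followed by a grouped aggregation); the function and threshold are illustrative:

```scala
// Sketch of option two for precision@k: join recommendations with the test
// set on (user, item), then aggregate hits per user. Every intermediate
// record is small, so nothing requires an arbitrarily long array per user.
def precisionAtK(
    recommendations: Seq[(Int, Int, Int)],  // (user, item, rank)
    test: Seq[(Int, Int, Double)],          // (user, item, relevance)
    k: Int): Double = {
  // items considered relevant (positive relevance is an assumed threshold)
  val relevant = test.collect { case (u, i, rel) if rel > 0 => (u, i) }.toSet
  // per-user hit counts among the top-k ranked items
  val hits = recommendations
    .filter { case (u, i, rank) => rank <= k && relevant((u, i)) }
    .groupBy(_._1)
    .map { case (u, hs) => u -> hs.size }
  val users = recommendations.map(_._1).distinct
  users.map(u => hits.getOrElse(u, 0).toDouble / k).sum / users.size
}
```

Because the join and the per-user aggregation both key on small tuples, this shape maps naturally onto distributed DataSet operations, avoiding the arbitrarily long per-user arrays of the first approach.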

The third option is to implement the scorer functions outside the evaluation framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)