Posted to dev@flink.apache.org by "Vasia Kalavri (JIRA)" <ji...@apache.org> on 2015/07/31 18:19:04 UTC
[jira] [Created] (FLINK-2452) Add a playcount threshold to the MusicProfiles examples
Vasia Kalavri created FLINK-2452:
------------------------------------
Summary: Add a playcount threshold to the MusicProfiles examples
Key: FLINK-2452
URL: https://issues.apache.org/jira/browse/FLINK-2452
Project: Flink
Issue Type: Improvement
Components: Gelly
Affects Versions: 0.10
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
Priority: Minor
In the MusicProfiles example, when creating the user-user similarity graph, an edge is created between any two users who have listened to the same song, even if only once. Depending on the input data, this can produce a projection graph with many more edges than the original user-song graph.
To make this computation more efficient, this issue proposes adding a user-defined parameter that filters out songs that a user has listened to only a few times. Essentially, the parameter is a playcount threshold above which a user is considered to like a song.
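As a rough illustration of the proposal, the sketch below applies a playcount threshold before building the user-user edges. It uses plain Java collections rather than the Gelly API, and the names PlaycountThresholdSketch, Listen, filterByThreshold and similarityEdges are hypothetical; none of them come from the MusicProfiles example. It also assumes "above" means strictly greater than the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch in plain Java; this is NOT the Gelly MusicProfiles code.
public class PlaycountThresholdSketch {

    /** One (user, song, playcount) record, standing in for a weighted user-song edge. */
    public static final class Listen {
        final String user;
        final String song;
        final int playcount;

        public Listen(String user, String song, int playcount) {
            this.user = user;
            this.song = song;
            this.playcount = playcount;
        }
    }

    /** Keeps only records whose playcount is strictly above the threshold (assumed semantics). */
    public static List<Listen> filterByThreshold(List<Listen> listens, int threshold) {
        List<Listen> kept = new ArrayList<>();
        for (Listen l : listens) {
            if (l.playcount > threshold) {
                kept.add(l);
            }
        }
        return kept;
    }

    /** Builds one "u1-u2" edge for every pair of users sharing at least one song. */
    public static Set<String> similarityEdges(List<Listen> listens) {
        // Group the distinct users of each song, sorted so each pair has one canonical form.
        Map<String, TreeSet<String>> usersBySong = new HashMap<>();
        for (Listen l : listens) {
            usersBySong.computeIfAbsent(l.song, s -> new TreeSet<>()).add(l.user);
        }
        Set<String> edges = new TreeSet<>();
        for (TreeSet<String> users : usersBySong.values()) {
            List<String> sorted = new ArrayList<>(users);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    edges.add(sorted.get(i) + "-" + sorted.get(j));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Listen> listens = Arrays.asList(
            new Listen("alice", "song1", 40),
            new Listen("bob", "song1", 35),
            new Listen("carol", "song1", 1),
            new Listen("bob", "song2", 2),
            new Listen("carol", "song2", 50));

        // Without a threshold, every shared song yields an edge.
        System.out.println(similarityEdges(listens));
        // With a threshold of 30, only songs each user played more than 30 times count.
        System.out.println(similarityEdges(filterByThreshold(listens, 30)));
    }
}
```

On this made-up input, the unfiltered projection has three edges, while the filtered one has a single edge between alice and bob. Filtering before the pairing step is what keeps the projection small, since the number of candidate pairs grows quadratically with the number of users per song.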
For reference, with a threshold value of 30, the whole Last.fm dataset is analyzed on my laptop in a few minutes, while no threshold results in a runtime of several hours.
There are many solutions to this problem, but since this is just an example (not a library method), I think that keeping it simple is important.
Thanks to [~andralungu] for spotting the inefficiency!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)