Posted to user@mahout.apache.org by Karl Wettin <ka...@gmail.com> on 2009/06/12 12:36:52 UTC

Tastify

Hi all,

I'm experimenting with Mahout by connecting it to Spotify
<http://spotify.com/>, a service that streams music over the net. It would
be really cool if you people could help me build a ground truth. The data
will of course be released to the public.

At first I tried to use playlists I scraped from the net as
recommendation profiles. I'm not sure whether my assumption that a user
always likes everything they put in their own small, non-collaborative
playlists was wrong, or whether something else made that strategy fail,
so I have started from scratch with only real user preferences. For now
it's something like 10 users and 500 preferences, so don't expect it to
produce great results.

http://tastify.kodapan.se:8081/

It will register your account the first time you log in, and it will show
you in clear text the password you choose, so choose something silly.

You can use plain-text queries or Spotify URIs in the search form. Start
with spotify:user:karl.wettin:playlist:1LOXpeOdStzoavRodI4zXZ to
connect with an already existing neighborhood. But please also try to
add some ratings to tracks not available in that playlist, preferably
ten or more.

Finally hit "Our recommendations" in order to get some results.

I have a handful of invites to Spotify if you don't have an account.
An account is not needed to use the Tastify service though, only if you
want to play the music.


Beware of the GUI. There are lots of bugs, so please report them if you
see them. The service will go up and down now and then. Try to log in
again if you get an exception.




        karl

Re: Tastify

Posted by Ted Dunning <te...@gmail.com>.

Hierarchical modeling techniques work well on structures like this if you
have good resolution of your meta-data.  Resolving and disambiguating artist
and track names can be difficult unless you have total control over the
meta-data source.

The basic idea is that you model an artist as a distribution over "concept
space", which is just a fancy name for latent variables you don't plan to
understand.  Then an album is sampled from the artist's distribution and is
itself a distribution, and finally a track is sampled from the album.  This is similar
to the way that in LDA, documents and words are distributions over your
latent concept variables.  Specific meanings are chosen at each point in a
document and the word you observe is chosen based on the concept at that
point.

Since you only observe which word appears in which document, you have to
reverse-engineer what the latent concepts might have been by getting a
compromise between the word and document distributions.

In your case, you have a simpler generative model, but similar techniques
should apply.
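
A minimal sketch of the artist -> album -> track generative story described
above, for illustration only. It assumes Dirichlet distributions at every
level, five latent concepts and a concentration of 50; none of these choices
come from the thread, and the names sample_dirichlet and sample_child are
hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_child(parent, concentration, rng):
    """Sample a child distribution centred on its parent distribution.

    A larger concentration keeps the child closer to the parent.
    """
    # The small epsilon keeps every gamma shape parameter strictly positive.
    alpha = [concentration * p + 1e-6 for p in parent]
    return sample_dirichlet(alpha, rng)


rng = random.Random(42)
n_concepts = 5  # assumed size of the latent "concept space"

# Artist: a distribution over latent concepts (flat prior, an assumption).
artist = sample_dirichlet([1.0] * n_concepts, rng)
# Album: sampled around the artist's distribution.
album = sample_child(artist, 50.0, rng)
# Track: sampled around the album's distribution.
track = sample_child(album, 50.0, rng)

print("artist:", [round(p, 3) for p in artist])
print("album: ", [round(p, 3) for p in album])
print("track: ", [round(p, 3) for p in track])
```

Fitting such a model means running this story in reverse: given observed
user preferences for tracks, infer the latent distributions that most
plausibly produced them.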

On Sat, Jun 13, 2009 at 8:53 AM, Karl Wettin <ka...@gmail.com> wrote:

>
> I hope that some semi-sophisticated Album, Track and ArtistSimilarity can
> be used to improve the results.
>
> Perhaps it's a good idea to have Playlist, Album and Artist implemented as
> Item too.

-- 
Ted Dunning, CTO
DeepDyve

Re: Tastify

Posted by Karl Wettin <ka...@gmail.com>.

On 12 Jun 2009, at 16:15, Ted Dunning wrote:
> On Fri, Jun 12, 2009 at 3:36 AM, Karl Wettin <ka...@gmail.com> wrote:
>
>> At first I tried to use playlists I scraped from the net as
>> recommendation profiles.
>
> Do you have raw play events, or do you have progress events as well?
>
> The single biggest improvement you can make with this kind of system is to
> quantify engagement somehow.  Play starts are often a very poor surrogate
> for preference while more engaged events such as 30 second progress can be
> much better (or not, music consumption can be a bit strange).

No events, at least not for now.

I do however have a rather nice domain model to navigate:

 >
 >         /---------\
 >         |         |
 >         |   +similarArtists
 >         |         |
 >         |         V*
 >         \------[Artist]--------\
 >                 /  |1          |
 >    [Genre]<----/   |           |
 >           *        |           |*
 >                   *|           V
 >   [Item]<|- - -[Track]------[Album]
 >                   ^*       *
 >                   |
 >                   |
 >                   |
 >               [Playlist]
 >
(It's supposed to be a UML class diagram.)

I hope that some semi-sophisticated Album, Track and ArtistSimilarity  
can be used to improve the results.

Perhaps it's a good idea to have Playlist, Album and Artist  
implemented as Item too.
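
A rough sketch of how the domain model above could feed a track similarity,
for illustration only. It is plain Python, not Mahout's API. The weights
0.8, 0.6 and 0.3 are made-up assumptions, and each track is simplified to
belong to exactly one album, which is narrower than the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artist:
    name: str
    # Names of artists considered similar (the +similarArtists association).
    similar_artists: frozenset = frozenset()


@dataclass(frozen=True)
class Album:
    title: str
    artist: Artist


@dataclass(frozen=True)
class Track:
    title: str
    artist: Artist
    album: Album


def track_similarity(a, b):
    """Return a similarity in [0, 1] based on how closely two tracks are related."""
    if a == b:
        return 1.0
    if a.album == b.album:
        return 0.8  # assumed weight: same album
    if a.artist == b.artist:
        return 0.6  # assumed weight: same artist, different album
    if (b.artist.name in a.artist.similar_artists
            or a.artist.name in b.artist.similar_artists):
        return 0.3  # assumed weight: artists linked as similar
    return 0.0


# Hypothetical example data.
artist_a = Artist("Artist A", frozenset({"Artist B"}))
artist_b = Artist("Artist B")
artist_c = Artist("Artist C")
album_a1 = Album("First", artist_a)
album_a2 = Album("Second", artist_a)
album_b1 = Album("Debut", artist_b)
album_c1 = Album("Other", artist_c)

t1 = Track("One", artist_a, album_a1)
t2 = Track("Two", artist_a, album_a1)
t3 = Track("Three", artist_a, album_a2)
t4 = Track("Four", artist_b, album_b1)
t5 = Track("Five", artist_c, album_c1)

for other in (t2, t3, t4, t5):
    print(t1.title, "vs", other.title, "=", track_similarity(t1, other))
```

A similarity like this gives the recommender something to work with when
two tracks share no raters yet, which matters with only about 10 users.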



        karl

Re: Tastify

Posted by Ted Dunning <te...@gmail.com>.

Karl,

Do you have raw play events, or do you have progress events as well?

The single biggest improvement you can make with this kind of system is to
quantify engagement somehow.  Play starts are often a very poor surrogate
for preference while more engaged events such as 30 second progress can be
much better (or not, music consumption can be a bit strange).
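
As a small illustration of quantifying engagement, the sketch below counts
a play toward a preference only once it reaches 30 seconds of progress.
The PlayEvent record, its field names and the simple counting scheme are
assumptions made for the example, not something from Tastify or Mahout.

```python
from dataclasses import dataclass


@dataclass
class PlayEvent:
    user_id: str
    track_id: str
    seconds_played: float  # progress reached before the play stopped


def engagement_weight(seconds_played, threshold=30.0):
    """Return 1.0 for an engaged play, 0.0 for a bare play start."""
    return 1.0 if seconds_played >= threshold else 0.0


def aggregate_preferences(events, threshold=30.0):
    """Sum engagement weights per (user, track) pair."""
    prefs = {}
    for event in events:
        key = (event.user_id, event.track_id)
        weight = engagement_weight(event.seconds_played, threshold)
        prefs[key] = prefs.get(key, 0.0) + weight
    return prefs


# Hypothetical events: u1 skips t1 once, then plays it through twice.
events = [
    PlayEvent("u1", "t1", 4.0),
    PlayEvent("u1", "t1", 180.0),
    PlayEvent("u1", "t1", 95.0),
    PlayEvent("u1", "t2", 12.0),
]

prefs = aggregate_preferences(events)
print(prefs)
```

Whether a hard threshold, a graded weight, or something else works best
would have to be checked against real listening data.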

On Fri, Jun 12, 2009 at 3:36 AM, Karl Wettin <ka...@gmail.com> wrote:

> At first I tried to use playlists I scraped from the net as recommendation
> profiles.