Posted to user@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2011/03/03 19:23:23 UTC

Example code for use with Yahoo's KDD Cup

For those interested in recommenders -- the registration for Yahoo's KDD Cup
opened a few days ago: http://kddcup.yahoo.com/registration.php

The contest starts March 15, but sample data sets are available now, as are
the input/output format and contest requirements:
http://kddcup.yahoo.com/datasets.php

I have written some code to help people play with these data sets and put it
in mahout-examples/. See package org.apache.mahout.cf.taste.example.kddcup .
(You need to check out the latest code from Subversion, of course; this
doesn't exist in 0.4.)

Key features:

*KDDCupDataModel*: ingests the training data into memory as a DataModel.
This won't work with the full data set unless you have a huge amount of RAM,
so there's a sampling rate parameter that lets you run on only a percentage
of the data.
*DataFileIterator*: easily iterates over the validation/test/training file
format.
*KDDCupRecommender*: a simple delegating recommender; plug in your own
implementation to try it.
*track1.Track1RecommenderEvaluatorRunner*: runs an RMSE evaluation on your
implementation.
*track1.Track1Runner*: outputs the result in the contest format.
*track2.Track2Runner*: likewise, for track 2's contest.
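
As a rough illustration of two ideas from the list above (running on a sampled
fraction of the data, and scoring predictions by RMSE), here is a minimal,
self-contained sketch in plain Java. It is not the Mahout code and does not
use the Mahout API: the class name KddCupSketch and the methods sample() and
rmse() are hypothetical, and the way KDDCupDataModel actually applies its
sampling rate may well differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not part of mahout-examples.
final class KddCupSketch {

  private KddCupSketch() {
  }

  /**
   * Keeps each record independently with probability samplingRate.
   * Assumption: this is one simple way to "run on a percentage of the data";
   * the real KDDCupDataModel may sample differently.
   */
  static <T> List<T> sample(Iterable<T> records, double samplingRate, Random random) {
    if (samplingRate < 0.0 || samplingRate > 1.0) {
      throw new IllegalArgumentException("samplingRate must be in [0,1]");
    }
    List<T> kept = new ArrayList<>();
    for (T record : records) {
      // nextDouble() is uniform in [0,1), so a rate of 1.0 keeps everything
      // and a rate of 0.0 keeps nothing.
      if (random.nextDouble() < samplingRate) {
        kept.add(record);
      }
    }
    return kept;
  }

  /** Root mean squared error between predicted and actual ratings. */
  static double rmse(double[] predicted, double[] actual) {
    if (predicted.length != actual.length || predicted.length == 0) {
      throw new IllegalArgumentException("arrays must be non-empty and of equal length");
    }
    double sumOfSquares = 0.0;
    for (int i = 0; i < predicted.length; i++) {
      double diff = predicted[i] - actual[i];
      sumOfSquares += diff * diff;
    }
    return Math.sqrt(sumOfSquares / predicted.length);
  }

  public static void main(String[] args) {
    // Made-up ratings, purely to exercise the two methods.
    List<Integer> ratings = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      ratings.add(i % 101);
    }
    List<Integer> subset = sample(ratings, 0.1, new Random(42L));
    System.out.println("kept " + subset.size() + " of " + ratings.size());
    System.out.println("RMSE = "
        + rmse(new double[] {50, 70, 90}, new double[] {60, 70, 80}));
  }
}
```

Per-record sampling like this gives only approximately the requested
fraction, and it thins out every user's ratings rather than dropping whole
users; which of those is preferable depends on the recommender being tried.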

Nothing here takes advantage of the song/album/artist info given in track
1's data, nor of the time information. That's an exercise for the reader.

It should work well, but I haven't thoroughly tested it yet.

Sean