Posted to user@mahout.apache.org by Danny Bickson <da...@gmail.com> on 2011/05/14 23:34:51 UTC

To all the recommendation people..

Another interesting collaborative filtering contest with a big prize of $1M.
See http://overstockreclabprize.com/

- Danny Bickson

Re: To all the recommendation people..

Posted by Grant Ingersoll <gs...@apache.org>.
I think it would be cool to have a purely open source submission.  I wonder if their terms & conditions allow it.  If anything, it makes for another nice example in our examples dir.

-G

On May 14, 2011, at 5:34 PM, Danny Bickson wrote:

> Another interesting collaborative filtering contest with a big prize of 1M $.
> See http://overstockreclabprize.com/
> 
> - Danny Bickson



Re: To all the recommendation people..

Posted by Jake Mannix <ja...@gmail.com>.
Egads.  RecLab's got their own clone of Hadoop in there, with basically the
same-signature Java APIs as Hadoop 0.18...  And they even talk about writing
a "CooccurrenceReducer" in the tutorial!

On Sat, May 14, 2011 at 3:32 PM, Jake Mannix <ja...@gmail.com> wrote:

> You're allowed to be an individual, or a team not associated with an
> academic institution, according to what I'm reading on that page...
>
>
> On Sat, May 14, 2011 at 3:13 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
>> Ah, never mind.  Academics only.  :-(
>>
>>
>> On May 14, 2011, at 5:34 PM, Danny Bickson wrote:
>>
>> > Another interesting collaborative filtering contest with a big prize of
>> 1M $.
>> > See http://overstockreclabprize.com/
>> >
>> > - Danny Bickson
>>
>>
>>
>

Re: To all the recommendation people..

Posted by Grant Ingersoll <gs...@apache.org>.
More contests at: http://challenge.gov/NIH/132-nlm-show-off-your-apps-innovative-uses-of-nlm-information


On May 15, 2011, at 10:25 PM, Alex Kozlov wrote:

> On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <ja...@gmail.com> wrote:
> 
>> Due to the whole Netflix data lawsuit, the training data is synthetic,
>> which
>> puts the contestants at a disadvantage, and another interesting fact:
>> runtime
>> performance is at issue: your code will be run *live*, with your model
>> being
>> used to produce recommendations with a hard timeout of 50ms - if you
>> miss this more than 20% of the time, you fail to progress to the end of
>> the semi-final round.
>> 
> 
> If the dataset is synthetic (and I assume not random), is the goal just to
> guess the model that generated the dataset?  Assuming it performs well, how
> far is the 'synthetic' model from the actual customer behavior, so that there
> are no 'surprises' when it runs 'live'?
> 
> Potentially, there are more avenues for a lawsuit than in the Netflix case
> since money is involved (just a thought).
> 
> Alex K

--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org


Re: To all the recommendation people..

Posted by Alex Kozlov <al...@cloudera.com>.
On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <ja...@gmail.com> wrote:

> Due to the whole Netflix data lawsuit, the training data is synthetic,
> which
> puts the contestants at a disadvantage, and another interesting fact:
> runtime
> performance is at issue: your code will be run *live*, with your model
> being
> used to produce recommendations with a hard timeout of 50ms - if you
> miss this more than 20% of the time, you fail to progress to the end of
> the semi-final round.
>

If the dataset is synthetic (and I assume not random), is the goal just to
guess the model that generated the dataset?  Assuming it performs well, how
far is the 'synthetic' model from the actual customer behavior, so that there
are no 'surprises' when it runs 'live'?

Potentially, there are more avenues for a lawsuit than in the Netflix case
since money is involved (just a thought).

Alex K

Re: To all the recommendation people..

Posted by Jake Mannix <ja...@gmail.com>.
It's actually a pretty interesting challenge, once you get past the
constraints of their API: you're optimizing explicitly for revenue-per-session,
and you take as input past sessions, which include the kinds of practical
things you'd like: each session is keyed by a userId (which will naturally
include repeat customers), products have prices, and there are categoryId
labels already.
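
To make that concrete, here's a minimal sketch of the kind of thing you're
handed and the metric you're chasing; the class and field names below are
made up for illustration, not their actual API.

import java.util.List;
import java.util.Map;

public class RevenuePerSession {

  // Hypothetical shape of one logged session -- invented field names.
  public static class Session {
    String userId;          // repeat customers show up under the same id
    List<String> purchases; // productIds actually bought in the session
  }

  // The target metric: attributed revenue divided by sessions served.
  // priceOf maps productId -> price, taken from the product catalog.
  public static double score(List<Session> sessions, Map<String, Double> priceOf) {
    double revenue = 0.0;
    for (Session s : sessions) {
      for (String productId : s.purchases) {
        Double price = priceOf.get(productId);
        if (price != null) {
          revenue += price;
        }
      }
    }
    return sessions.isEmpty() ? 0.0 : revenue / sessions.size();
  }
}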

Due to the whole Netflix data lawsuit, the training data is synthetic, which
puts the contestants at a disadvantage.  Another interesting fact: runtime
performance is at issue.  Your code will be run *live*, with your model being
used to produce recommendations under a hard timeout of 50ms; if you miss
this more than 20% of the time, you fail to progress to the end of the
semi-final round.
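
In practice that 50ms budget means time-boxing the model and keeping a cheap
fallback around, something along these lines (recommend() and popularItems()
are placeholders for your own code, not their API):

import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeBoxedRecommender {

  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  public List<String> recommendWithin50ms(final String userId) {
    // Run the expensive model on a worker thread...
    Future<List<String>> answer = pool.submit(new Callable<List<String>>() {
      public List<String> call() {
        return recommend(userId);
      }
    });
    try {
      // ...and only wait 50ms for it.
      return answer.get(50, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      answer.cancel(true);
      return popularItems(); // cheap precomputed fallback, so we still answer
    } catch (Exception e) {
      return popularItems();
    }
  }

  // Placeholders: the real model lookup and a precomputed popularity list.
  private List<String> recommend(String userId) { return popularItems(); }
  private List<String> popularItems() { return Collections.emptyList(); }
}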

You're allowed to use open-source, Apache-licensed code (and are in fact
*required* to license your code under the ASL to compete), but their APIs,
while extraordinarily similar to Hadoop and Mahout/Taste, are fixed, so you
can't just do a drop-in replacement.
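
The practical upshot is a thin adapter layer: wrap a Taste Recommender behind
whatever entry point they require.  Only the Taste calls below are real;
recommendProducts() is an invented stand-in for their actual interface.

import java.util.ArrayList;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class TasteAdapter {

  private final Recommender delegate;

  public TasteAdapter(Recommender delegate) {
    this.delegate = delegate;
  }

  // Map the contest-side call onto Taste's Recommender.recommend(long, int).
  public List<Long> recommendProducts(long userId, int howMany) {
    try {
      List<Long> productIds = new ArrayList<Long>();
      for (RecommendedItem item : delegate.recommend(userId, howMany)) {
        productIds.add(item.getItemID());
      }
      return productIds;
    } catch (TasteException e) {
      return new ArrayList<Long>(); // fail soft; this runs live
    }
  }
}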

On Sat, May 14, 2011 at 6:45 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Ah, you are right.  Read too quickly.
>
> On May 14, 2011, at 6:32 PM, Jake Mannix wrote:
>
> > You're allowed to be an individual, or a team not associated with an
> > academic institution, according to what I'm reading on that page...
> >
> > On Sat, May 14, 2011 at 3:13 PM, Grant Ingersoll <gsingers@apache.org>
> > wrote:
> >
> >> Ah, never mind.  Academics only.  :-(
> >>
> >>
> >> On May 14, 2011, at 5:34 PM, Danny Bickson wrote:
> >>
> >>> Another interesting collaborative filtering contest with a big prize of
> >> 1M $.
> >>> See http://overstockreclabprize.com/
> >>>
> >>> - Danny Bickson
> >>
> >>
> >>
>
>
>

Re: To all the recommendation people..

Posted by Grant Ingersoll <gs...@apache.org>.
Ah, you are right.  Read too quickly.

On May 14, 2011, at 6:32 PM, Jake Mannix wrote:

> You're allowed to be an individual, or a team not associated with an
> academic institution, according to what I'm reading on that page...
> 
> On Sat, May 14, 2011 at 3:13 PM, Grant Ingersoll <gs...@apache.org> wrote:
> 
>> Ah, never mind.  Academics only.  :-(
>> 
>> 
>> On May 14, 2011, at 5:34 PM, Danny Bickson wrote:
>> 
>>> Another interesting collaborative filtering contest with a big prize of
>> 1M $.
>>> See http://overstockreclabprize.com/
>>> 
>>> - Danny Bickson
>> 
>> 
>> 



Re: To all the recommendation people..

Posted by Jake Mannix <ja...@gmail.com>.
You're allowed to be an individual, or a team not associated with an
academic institution, according to what I'm reading on that page...

On Sat, May 14, 2011 at 3:13 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Ah, never mind.  Academics only.  :-(
>
>
> On May 14, 2011, at 5:34 PM, Danny Bickson wrote:
>
> > Another interesting collaborative filtering contest with a big prize of
> 1M $.
> > See http://overstockreclabprize.com/
> >
> > - Danny Bickson
>
>
>

Re: To all the recommendation people..

Posted by Grant Ingersoll <gs...@apache.org>.
Ah, never mind.  Academics only.  :-(    


On May 14, 2011, at 5:34 PM, Danny Bickson wrote:

> Another interesting collaborative filtering contest with a big prize of 1M $.
> See http://overstockreclabprize.com/
> 
> - Danny Bickson