Posted to dev@mahout.apache.org by deneche abdelhakim <a_...@yahoo.fr> on 2008/09/03 10:28:51 UTC

Re : FYI Cloud Computing Resources

I came across the following competition

http://www.netflixprize.com/index


It's about recommender systems, so I think it's a good fit for Taste. The training dataset consists of more than 100M ratings.


----- Original Message ----
From: Josh Myer <jo...@joshisanerd.com>
To: mahout-dev@lucene.apache.org
Sent: Wednesday, 30 July 2008, 18:19:25
Subject: Re: FYI Cloud Computing Resources

On Wed, Jul 30, 2008 at 11:26:29AM -0400, Grant Ingersoll wrote:
> http://research.yahoo.com/node/2328
> 
> It _MAY_ (stressed, emphasized, etc.) be possible for Mahouters (or  
> are we just Mahouts?) to get some access to these resources.  One big  
> question is where can we get some fairly large data sets (large, but  
> not super large, I think, but am not sure)
> 
> If you have ideas, etc. please let us know.
> 

It's worth plugging (theinfo), http://theinfo.org/.  It's a project to
collect references to datasets, and may help here.  Unfortunately, it
seems to be laggy at the moment.  I'll poke Aaron about that =)

HtH,
-- 
Josh Myer
josh@joshisanerd.com



      

Re: Re : FYI Cloud Computing Resources

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 3, 2008, at 4:34 AM, Sean Owen wrote:

> Yeah, it's almost over, unfortunately. :) I tried this a while ago with
> a slope-one recommender, and was only just about able to match Netflix's
> current performance. I published some support code for people who
> wanted to play with it, but removed it from Mahout's copy as legacy
> code.

Hmm, probably useful to keep the code around, even if it's just used  
as a sample of how to do things w/ Taste.  I imagine the Netflix data  
will live on for quite some time.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ








Re: Re : FYI Cloud Computing Resources

Posted by Sean Owen <sr...@gmail.com>.
Yeah, it's almost over, unfortunately. :) I tried this a while ago with
a slope-one recommender, and was only just about able to match Netflix's
current performance. I published some support code for people who
wanted to play with it, but removed it from Mahout's copy as legacy
code.
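
For readers unfamiliar with the term, here is a minimal sketch of a
weighted slope-one predictor in Python. It is not the Taste
implementation and not the support code mentioned above; the function
names and the dict-of-dicts rating layout are assumptions made for
illustration only.

```python
from collections import defaultdict


def train_slope_one(ratings):
    """Compute average pairwise rating differences between items.

    ratings: dict mapping user -> dict mapping item -> rating
    (a hypothetical in-memory layout, chosen for this sketch).
    Returns (diffs, counts), where diffs[i][j] is the mean of
    (rating of i - rating of j) over users who rated both items,
    and counts[i][j] is the number of such users.
    """
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diffs[i][j] += r_i - r_j
                    counts[i][j] += 1
    # Turn the summed differences into averages.
    for i in diffs:
        for j in diffs[i]:
            diffs[i][j] /= counts[i][j]
    return diffs, counts


def predict(user_ratings, item, diffs, counts):
    """Weighted slope-one estimate of one user's rating for `item`.

    Returns None when the user shares no co-rated item with `item`.
    """
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        c = counts.get(item, {}).get(j, 0)
        if j != item and c > 0:
            # Each co-rated item votes with (its rating + mean diff),
            # weighted by how many users support that diff.
            numerator += (diffs[item][j] + r_j) * c
            denominator += c
    return numerator / denominator if denominator else None


# Toy data (made up for this sketch, not Netflix data).
toy = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
diffs, counts = train_slope_one(toy)
lucy_a = predict(toy["lucy"], "A", diffs, counts)
```

The appeal of the approach is that training is just a pass over
co-rated item pairs, although storing a difference for every item pair
can get expensive on a dataset of this size.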

I didn't really have time to investigate more. Some of the insights
that have fallen out from the competition are pretty great. For
example: one person took advantage of a sort of "memory effect" in
ratings. People tend at times to over-rate movies and at other times
to under-rate them. So if you correct for this -- that is, a 5-star
rating in a sequence of 5-star ratings may not be as meaningful as a
5-star rating in the middle of several 2-star ratings -- you get much
better performance.

This nugget of knowledge may be specific to Netflix, not sure. But it
was interesting.
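
One way to picture the correction described above is to measure each
rating against the user's neighbouring ratings. The sketch below is
only a guess at the general idea, not the competitor's actual method;
the function name, the window size, and the use of a plain mean are
all assumptions.

```python
def context_adjusted(ratings, window=2):
    """Express each rating relative to the ratings around it.

    ratings: one user's ratings in chronological order (assumed input).
    window: how many neighbours to look at on each side (arbitrary).
    Returns a list of offsets: rating minus the mean of its neighbours.
    """
    adjusted = []
    for idx, r in enumerate(ratings):
        lo = max(0, idx - window)
        hi = min(len(ratings), idx + window + 1)
        # Neighbouring ratings, excluding the rating itself.
        neighbours = ratings[lo:idx] + ratings[idx + 1:hi]
        baseline = sum(neighbours) / len(neighbours) if neighbours else r
        adjusted.append(r - baseline)
    return adjusted


streak = context_adjusted([5, 5, 5, 5, 5])    # a run of 5-star ratings
standout = context_adjusted([2, 2, 5, 2, 2])  # one 5 among 2-star ratings
```

With this adjustment, a 5 inside a run of 5s gets an offset of zero,
while a 5 surrounded by 2s gets an offset of +3, which matches the
intuition that the second rating carries more signal.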
