
Posted to user@mahout.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2015/11/04 00:20:59 UTC

Haters get Love too

A colleague of mine just built a MAP@k precision evaluator for the Mahout-based cooccurrence recommender we’ve been working on, and we ran it on some data scraped from rottentomatoes.com. They have “fresh” and “rotten” reviews tied to reviewer ids.
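
For readers who have not met MAP@k, here is a minimal Python sketch of the metric. It is not the evaluator mentioned above; the function names and the choice to normalise by min(number of relevant items, k) are assumptions made for illustration, and other conventions exist.

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: sum of precision@rank at each hit within the
    top k, divided by min(len(relevant), k). Assumes `recommended` is a
    ranked list without duplicates."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def map_at_k(recommendations, held_out, k):
    """Mean of AP@k over every user that has at least one held-out item.
    Both arguments are hypothetical dicts keyed by user id."""
    users = [u for u in held_out if held_out[u]]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(u, []), held_out[u], k)
        for u in users
    )
    return total / len(users)
```

Here `held_out` would hold the “fresh” items withheld from training for each reviewer, and `recommendations` the ranked lists the recommender returns for the same reviewers.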

A fair bit of discussion has gone on about how to use negative preferences. We have been saying that negative preferences might be predictive of positive preferences and the cross-cooccurrence code in the new SimilarityAnalysis.cooccurrence method can make the data usable.
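
As a toy illustration of the cross-cooccurrence idea (not Mahout's implementation, which works on distributed matrices), the sketch below counts how many users pair a primary-action item with a secondary-action item. The function name and the dict-of-sets input layout are assumptions.

```python
from collections import Counter


def cross_cooccurrence(primary, secondary):
    """Count, for each (primary item, secondary item) pair, the users who
    took the primary action on the first and the secondary action on the
    second. Inputs are hypothetical dicts mapping user id -> set of items."""
    counts = Counter()
    for user, primary_items in primary.items():
        secondary_items = secondary.get(user, ())
        for p in primary_items:
            for s in secondary_items:
                counts[(p, s)] += 1
    return counts
```

Passing the same dict for both arguments gives ordinary cooccurrence counts (including each item paired with itself), so the cross term is the same computation applied to two different actions.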

We took the RT data for two “actions”: “fresh” as the primary, the best indicator of preference, and “rotten” as the secondary indicator. We found that MAP using only “fresh” was bettered by almost 20% when we included “rotten” as the secondary cross-cooccurrence action. For the strict out there: we did not directly isolate the two actions, which is work remaining, so some of the lift might be due to just having more data. But it’s a really good first step, because more data doesn't always translate to better performance, and anyway it’s data you wouldn’t have otherwise.

This opens up a new way to compare all sorts of other user signals, some long considered to be unusable by recommenders. Gender, location, category preferences are now fair game for testing.

BTW we used this recommender, which is based on Mahout Samsara’s matrix math, cooccurrence and LLR: https://github.com/pferrel/scala-parallel-universal-recommendation
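
For context on the LLR mentioned above, the following is a small Python sketch of the log-likelihood ratio test on a 2x2 contingency table, written in the entropy form. It is an illustration under assumptions, not code taken from the recommender; the parameter names k11..k22 are just labels for the four cell counts.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalised entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of counts:
    k11 = both events, k12 = first only, k21 = second only, k22 = neither."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))
```

A score near zero means the two events look independent; larger scores mark item pairs whose cooccurrence is unlikely to be chance, which is the sort of pair worth keeping as an indicator.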

Re: Haters get Love too

Posted by Ted Dunning <te...@gmail.com>.

No. Not entirely surprising, but it is *really* nice to get some public
results on this.

The treatment of the negatives as a separate cross term instead of just
lumping them together is a very significant difference.


On Tue, Nov 3, 2015 at 3:42 PM, Peter Jaumann <pe...@gmail.com>
wrote:

> Fascinating!!! Not too surprising really!!!

Re: Haters get Love too

Posted by Peter Jaumann <pe...@gmail.com>.

Fascinating!!! Not too surprising really!!!
On Nov 3, 2015 6:31 PM, "Suneel Marthi" <sm...@apache.org> wrote:

> Thanks Pat, very interesting indeed.

Re: Haters get Love too

Posted by Suneel Marthi <sm...@apache.org>.

Thanks Pat, very interesting indeed.


Re: Haters get Love too

Posted by Ted Dunning <te...@gmail.com>.

On Tue, Nov 3, 2015 at 3:20 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> For the strict out there we did not directly isolate the two actions,
> which is work remaining so some of the lift might be due to just having
> more data but it’s a really good first step because more data doesn't
> always translate to better performance and anyway it’s data you wouldn’t
> have otherwise.
>

I am pretty strict, but I really don't think that it matters that the lift
might be due to more data.

The ability to use that additional data is the key advance.