Posted to user@mahout.apache.org by Will C <wi...@infomofo.com> on 2012/05/06 19:48:24 UTC

Re: Recommendation scores from LogLikelihood Similarity recommender

So I've taken another try at using recommendation values.  However, rather
than something a user explicitly rates on a scale of 0-5, I am using a
user's activity.  Certain activities of a user toward an item are negative,
and certain are positive.

If I have users 1 and 2 and 3, and product X, and their preferences are as
follows:

1, X, -1
2, X, 1
3, X, 10

Clearly 2 and 3 are closer than 2 and 1, because they both like product X,
just to varying degrees.  However, most distance algorithms I've tried
incorrectly show 1 and 2 as closer, because the difference between their
values is smaller.

Am I approaching this wrong?  Other than switching to boolean preferences,
is there a better way to approach this?

-Will
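
To make the arithmetic in the example above concrete, here is a minimal
sketch, assuming a plain Euclidean distance over the single co-rated item X.
The function name is hypothetical and this is not Mahout code; it only shows
why a magnitude-based distance puts users 1 and 2 closer together.

```python
import math

# Preferences for product X, taken from the example above: user -> value.
prefs = {1: -1.0, 2: 1.0, 3: 10.0}

def euclidean_distance(a, b):
    """Distance between two users over their one shared item (X)."""
    return math.sqrt((a - b) ** 2)

d_12 = euclidean_distance(prefs[1], prefs[2])  # dislike vs. mild like
d_23 = euclidean_distance(prefs[2], prefs[3])  # mild like vs. strong like

# The distance only sees the gap in magnitude, not the sign of the opinion.
print("distance(1, 2) =", d_12)
print("distance(2, 3) =", d_23)
```

The gap between -1 and 1 is 2, while the gap between 1 and 10 is 9, so any
measure built purely on differences will rank 1 and 2 as the nearer pair.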

On Mon, Apr 16, 2012 at 2:35 PM, Will C <wi...@infomofo.com> wrote:

> Thanks for clearing that up.
>
>
> On Mon, Apr 16, 2012 at 2:02 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> In the case of no ratings, the value you observe is *not* a predicted
>> rating. After all, they are all 1.0 and so can't be used for ranking.
>> The result is actually a sum of similarities, which is why it can be
>> arbitrarily large. It is not supposed to be in [0,1] or anything like
>> that.
>>
>> On Sun, Apr 15, 2012 at 5:47 PM, Will C <wi...@infomofo.com> wrote:
>> > I have a boolean input dataset, with user, item, and preference.  Each
>> > preference is a 1.0 if it exists.  Based on this dataset I had used a
>> > Tanimoto Similarity and tried both Boolean Pref User and Item
>> > Recommenders.
>> >
>> > After reading Mahout in Action and several threads on stack overflow, I
>> > saw that the LogLikelihood Similarity model was recommended for boolean
>> > dataset recommenders.
>> >
>> > However, the scores I get for the recommended items using the
>> > LogLikelihood similarity are sometimes much greater than 1.0, even though
>> > none of the input scores are higher than that.  I saw scores of 11.0 being
>> > returned for some users' recommendations.
>> >
>> > This is making it very hard for me to use the scoring and estimation
>> > functions.  I have switched back to Tanimoto for now, but am I doing
>> > something wrong, or am I incorrect in expecting the recommended scores
>> > and estimated preferences to be in the 0-1.0 range for this dataset?
>>
>
>
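
Sean's point in the quoted reply above, that the boolean-data score is a sum
of similarities rather than a predicted rating, can be illustrated with a
small sketch. The similarity values and item names below are made up, and the
function is a hypothetical stand-in, not Mahout's implementation.

```python
# Hypothetical item-item similarities, each individually within [0, 1].
similarity = {
    ("A", "D"): 0.9,
    ("B", "D"): 0.8,
    ("C", "D"): 0.7,
}

def boolean_score(user_items, candidate):
    """Sum the candidate's similarity to every item the user already has."""
    return sum(similarity.get((item, candidate), 0.0) for item in user_items)

# A user with three items that all resemble D gets a score above 1.0,
# even though no single similarity (and no input preference) exceeds 1.0.
score = boolean_score(["A", "B", "C"], "D")
print("score for D:", score)
```

Under this reading the score is useful for ranking candidates against each
other, but it grows with the number of similar items a user has, so it should
not be expected to land in the 0-1.0 range.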

Re: Recommendation scores from LogLikelihood Similarity recommender

Posted by Will C <wi...@infomofo.com>.

Heh, you're reading my mind.

I tried the cosine similarity and had exactly the problem with sparse
rating recommendations that you mentioned.  I'm switching back to the
boolean data set and just having a minimum action threshold to cross, and I
was just in the process of moving my logic around to handle negative
actions as a filter.

Thanks for the quick responses!

-Will
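
The plan described above (a minimum action threshold for the boolean data
set, with negative actions applied as a filter) might look roughly like the
sketch below. The event kinds, the threshold value, and all names are
assumptions for illustration only; none of them come from the thread or from
Mahout.

```python
from collections import defaultdict

MIN_ACTIONS = 2             # hypothetical threshold of positive actions
NEGATIVE_KINDS = {"hide"}   # hypothetical set of negative activity types

# Hypothetical activity log: (user, item, kind of activity).
events = [
    ("u1", "A", "view"), ("u1", "A", "buy"),
    ("u1", "B", "view"),
    ("u1", "C", "hide"),
]

counts = defaultdict(int)   # positive actions per (user, item)
blocked = defaultdict(set)  # items each user reacted negatively to
for user, item, kind in events:
    if kind in NEGATIVE_KINDS:
        blocked[user].add(item)
    else:
        counts[(user, item)] += 1

# Boolean preferences: pairs that crossed the minimum action threshold.
boolean_prefs = {pair for pair, n in counts.items() if n >= MIN_ACTIONS}

def filter_recommendations(user, recommended):
    """Drop recommended items the user has acted negatively toward."""
    hidden = blocked.get(user, set())
    return [item for item in recommended if item not in hidden]

shown = filter_recommendations("u1", ["C", "D"])
print(boolean_prefs, shown)
```

Here only the boolean pairs would be fed to the recommender, and the filter
runs afterwards on whatever list it returns.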

On Sun, May 6, 2012 at 3:53 PM, Ted Dunning <te...@gmail.com> wrote:

> As Sean points out, cosine should pick up on this.  You will have the usual
> problems with small counts that any rating-based system has.
>
> And in spite of your last comment, I would strongly recommend that you test
> a boolean approach wherein *any* action is considered positive and another
> where you consider only your positive actions and ignore your negative
> actions.  If necessary, consider the negative actions at the presentation
> tier.
>
> On Sun, May 6, 2012 at 10:48 AM, Will C <wi...@infomofo.com> wrote:
>
> > So I've taken another try at using recommendations values.  However,
> > unlike something that a user is explicitly rating on a scale of 0-5. I am
> > using a user's activity.  Certain activities of a user toward an item are
> > negative, and certain are positive.
> >
> > If I have users 1 and 2 and 3, and product X, and their preferences are
> > as follows:
> >
> > 1, X, -1
> > 2, X, 1
> > 3, X, 10
> >
> > Clearly 2 and 3 are closer than 2 and 1, because they both like product
> > X, just to varying degrees.  However, most distance algorithms I've tried
> > are incorrectly showing 1 and 2 closer because their difference is less.
> >
> > Am I approaching this wrong?  Other than switching to boolean
> > preferences, is there a better way to approach this?
> >
>

Re: Recommendation scores from LogLikelihood Similarity recommender

Posted by Ted Dunning <te...@gmail.com>.

As Sean points out, cosine should pick up on this.  You will have the usual
problems with small counts that any rating-based system has.

And in spite of your last comment, I would strongly recommend that you test
a boolean approach wherein *any* action is considered positive and another
where you consider only your positive actions and ignore your negative
actions.  If necessary, consider the negative actions at the presentation
tier.
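
The two boolean variants suggested above could be derived from a signed
activity log along these lines. This is only a sketch: the first three rows
are the preferences from the question, the fourth row and all variable names
are hypothetical.

```python
# Signed activity: (user, item, value). Negative values mean dislike.
events = [
    (1, "X", -1.0),
    (2, "X", 1.0),
    (3, "X", 10.0),
    (3, "Y", -2.0),  # hypothetical extra row, not from the thread
]

# Variant 1: any action at all counts as a boolean preference.
any_action = {(user, item) for user, item, _ in events}

# Variant 2: only positive actions count; negative ones are ignored.
positive_only = {(user, item) for user, item, value in events if value > 0}

# Negative actions are kept aside so the presentation tier can use them,
# for example to hide an item the user has already rejected.
negatives = {(user, item) for user, item, value in events if value < 0}

print(len(any_action), len(positive_only), len(negatives))
```

Each variant would then be evaluated separately as input to a boolean
recommender, which is the comparison being recommended here.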

On Sun, May 6, 2012 at 10:48 AM, Will C <wi...@infomofo.com> wrote:

> So I've taken another try at using recommendations values.  However, unlike
> something that a user is explicitly rating on a scale of 0-5. I am using a
> user's activity.  Certain activities of a user toward an item are negative,
> and certain are positive.
>
> If I have users 1 and 2 and 3, and product X, and their preferences are as
> follows:
>
> 1, X, -1
> 2, X, 1
> 3, X, 10
>
> Clearly 2 and 3 are closer than 2 and 1, because they both like product X,
> just to varying degrees.  However, most distance algorithms I've tried are
> incorrectly showing 1 and 2 closer because their difference is less.
>
> Am I approaching this wrong?  Other than switching to boolean preferences,
> is there a better way to approach this?
>

Re: Recommendation scores from LogLikelihood Similarity recommender

Posted by Sean Owen <sr...@gmail.com>.

That sounds a lot like something that the cosine similarity would pick up
on for sure.
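
As a rough check of that claim, here is a minimal sketch of an uncentered
cosine similarity written by hand (it is not a Mahout class). The values for
item X are from the question; the second item Y and its values are
hypothetical, added so the vectors have more than one dimension.

```python
import math

def cosine_similarity(u, v):
    """Uncentered cosine of the angle between two preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Item X only, as in the question: the result depends on the sign alone.
x_only_12 = cosine_similarity([-1.0], [1.0])   # opposite opinions
x_only_23 = cosine_similarity([1.0], [10.0])   # same opinion, any strength

# Items X and Y, where the Y values are hypothetical.
user1 = [-1.0, 2.0]
user2 = [1.0, 2.0]
user3 = [10.0, 15.0]
sim_12 = cosine_similarity(user1, user2)
sim_23 = cosine_similarity(user2, user3)

print(x_only_12, x_only_23)
print(sim_12, sim_23)
```

Cosine compares direction rather than magnitude, so users 2 and 3 come out
more similar than users 1 and 2 in both cases. With a single shared item the
value can only be -1 or 1, which is one form of the small-count problem that
any rating-based similarity has.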

On Sun, May 6, 2012 at 6:48 PM, Will C <wi...@infomofo.com> wrote:

> So I've taken another try at using recommendations values.  However, unlike
> something that a user is explicitly rating on a scale of 0-5. I am using a
> user's activity.  Certain activities of a user toward an item are negative,
> and certain are positive.
>
> If I have users 1 and 2 and 3, and product X, and their preferences are as
> follows:
>
> 1, X, -1
> 2, X, 1
> 3, X, 10
>
> Clearly 2 and 3 are closer than 2 and 1, because they both like product X,
> just to varying degrees.  However, most distance algorithms I've tried are
> incorrectly showing 1 and 2 closer because their difference is less.
>
> Am I approaching this wrong?  Other than switching to boolean preferences,
> is there a better way to approach this?
>
> -Will