Posted to user@mahout.apache.org by Floris Devriendt <fl...@gmail.com> on 2014/07/14 16:21:23 UTC

Discrete Rating Scale

Hey all,

When using a discrete rating scale (e.g. likes / dislikes), what are the
things that I should consider when using Mahout for Collaborative Filtering?

If I'm not mistaken, I read a mail on this mailing list a week or two ago
stating that one should avoid using 0 (dislike) and 1 (like) as
scores, because Mahout would not be able to take the dislikes into account
properly.
If this is true, what scores should I give to my like/dislike scale? (e.g.
is -1/1 better than 0/1, or should I use 1/2 with 1 = dislike and 2 = like?)

Best regards,
Floris Devriendt

Re: Discrete Rating Scale

Posted by Floris Devriendt <fl...@gmail.com>.
Hey Mario,

Thanks for the fast reply. At the moment I'm not using the Hadoop version,
but everything from org.apache.mahout.cf.taste.impl.
I'm assuming your reasoning stays the same as with the Hadoop version (as
the similarities remain the same).

Similarities I'm going to use are Pearson, Tanimoto, LogLikelihood and an
extended version of the Tanimoto Coefficient (that takes into account like
/ dislike values).

If I'm not mistaken, the Tanimoto and LogLikelihood similarities disregard
the preference values by default, so "like" and "dislike" are both treated as
"True" which, as you say, means "interacted with".

Thanks again for the answers, they were helpful!

Best regards,
Floris Devriendt




2014-07-14 17:52 GMT+02:00 <ma...@gmail.com>:

> If you are using the
> distributed org.apache.mahout.cf.taste.hadoop.item.RecommenderJob you
> should never use "0". If you do, then when you multiply the co-occurrence
> matrix by the user's rating vector, the zero cancels the corresponding
> elements of the matrix, which is as if the user had never interacted with
> the item.
>
> For the same reason, "-1" should work, because it actually subtracts score
> from any book that is similar to the one with the negative rating.
>
> For CosineSimilarity, 0 has to be avoided for obvious reasons (no cosine
> is defined at the origin of the axes), and 1 and 2 are possibly the values
> I'd go for.
>
> Tanimoto and LogLikelihood are True/False, but False means "not
> interacted". Having "dislike = False" would be extremely misleading.
>
> For all the other algorithms, I'd say one should make similar
> considerations.
>
> Cheers
> Mario

Re: Discrete Rating Scale

Posted by ma...@gmail.com.
If you are using the
distributed org.apache.mahout.cf.taste.hadoop.item.RecommenderJob you
should never use "0". If you do, then when you multiply the co-occurrence
matrix by the user's rating vector, the zero cancels the corresponding
elements of the matrix, which is as if the user had never interacted with
the item.

For the same reason, "-1" should work, because it actually subtracts score
from any book that is similar to the one with the negative rating.
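
To make the arithmetic concrete, here is a toy sketch in plain Python. It is
not Mahout's RecommenderJob code; the matrix and the ratings are made up for
illustration. It shows that a rating of 0 contributes exactly what a missing
rating does, while -1 pulls the scores of co-occurring items down:

```python
# Toy co-occurrence matrix for three items A, B, C (made-up counts).
COOCCURRENCE = [
    [0, 3, 1],  # A co-occurs with B three times and with C once
    [3, 0, 2],  # B
    [1, 2, 0],  # C
]


def score(ratings):
    """Multiply the co-occurrence matrix by a user's rating vector."""
    return [sum(c * r for c, r in zip(row, ratings)) for row in COOCCURRENCE]


# The user liked C (rating 1). Item A is unrated, disliked as 0, or disliked as -1.
unrated = score([0, 0, 1])        # A absent: its slot in the vector is 0
dislike_zero = score([0, 0, 1])   # A disliked with rating 0: the same vector
dislike_neg = score([-1, 0, 1])   # A disliked with rating -1

print(unrated, dislike_zero, dislike_neg)
```

With the 0/1 encoding the dislike of A is indistinguishable from no rating at
all; with -1 it lowers the scores of B and C, the items that co-occur with A.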

For CosineSimilarity, 0 has to be avoided for obvious reasons (no cosine
is defined at the origin of the axes), and 1 and 2 are possibly the values
I'd go for.
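
A small sketch of why the origin is a problem, again in plain Python with
made-up vectors rather than Mahout's own similarity classes. The helper
returns None where the cosine is undefined; that convention is an assumption
of this example only:

```python
import math


def cosine(u, v):
    """Cosine of the angle between two rating vectors, or None if undefined."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return None  # a vector at the origin has no direction
    return dot / (norm_u * norm_v)


# 0/1 scale: a user who disliked every item sits at the origin.
all_dislikes_01 = cosine([0, 0, 0], [1, 0, 1])

# 1/2 scale: the same user becomes [1, 1, 1] and the cosine is defined.
all_dislikes_12 = cosine([1, 1, 1], [2, 1, 2])

print(all_dislikes_01, all_dislikes_12)
```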

Tanimoto and LogLikelihood are True/False, but False means "not
interacted". Having "dislike = False" would be extremely misleading.
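
As an illustration of how a boolean similarity sees the data, here is a
minimal Tanimoto sketch in plain Python. The users and items are hypothetical
and this is not Mahout's TanimotoCoefficientSimilarity implementation. Only
the set of rated items enters the computation, so two users with opposite
opinions can look identical:

```python
def tanimoto(items_a, items_b):
    """Tanimoto (Jaccard) coefficient of two sets of item IDs."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical users: alice liked both items, bob disliked both.
alice = {"item1": "like", "item2": "like"}
bob = {"item1": "dislike", "item2": "dislike"}

# Treating every rating as "True": only the keys are compared.
similarity_all = tanimoto(alice, bob)


def likes(prefs):
    """Keep only the items a user liked."""
    return {item for item, value in prefs.items() if value == "like"}


# Keeping only likes: the two users no longer overlap.
similarity_likes = tanimoto(likes(alice), likes(bob))

print(similarity_all, similarity_likes)
```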

For all the other algorithms, I'd say one should make similar
considerations.

Cheers
Mario



Re: Discrete Rating Scale

Posted by Ted Dunning <te...@gmail.com>.
You can already do a form of multi-modal collaborative filtering.  The way
you would do this is to create three item codes for every item, one for
positive rating, one for negative rating and one for buying or whatever
conversion you have.  You might even create a fourth code for any rating,
positive or negative.  Then translate your original data so that you lose
the rating and change the item to be the positive or negative version.
 Likewise, convert your sales lines to use the sale version of the item id.
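
A rough sketch of that translation step, in plain Python. The suffixes
`_pos`, `_neg`, `_rated` and `_buy`, the function name and the sample rows
are all made up for illustration; real input to ItemSimilarityJob would
probably need these codes mapped to numeric item IDs:

```python
def to_item_codes(ratings, purchases):
    """Turn (user, item, rating) and (user, item) rows into boolean interactions."""
    rows = []
    for user, item, rating in ratings:
        suffix = "pos" if rating > 0 else "neg"
        rows.append((user, f"{item}_{suffix}"))  # the rating value itself is dropped
        rows.append((user, f"{item}_rated"))     # optional fourth code: any rating
    for user, item in purchases:
        rows.append((user, f"{item}_buy"))       # conversion version of the item
    return rows


ratings = [("u1", "i7", 1), ("u2", "i7", -1)]
purchases = [("u1", "i9")]
rows = to_item_codes(ratings, purchases)

for row in rows:
    print(row)
```

Each output row is a plain user/item interaction with no preference value, so
like and dislike become separate items instead of two values on one scale.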

When you run this through the ItemSimilarityJob, you will get indicators
for different combinations of items and actions.  You probably only care
about the conversion version of the items.  In the indicators for the item
+ conversion combos, you should see items with positive and negative
ratings and also item conversions.  You can push these indicators into an
amalgamated field or split them into separate fields when you push the
data into a search engine.

The general method for recommendation using a search engine is described in
this itty bitty book that Ellen and I wrote:

http://www.mapr.com/practical-machine-learning

This isn't as good as full-on multi-modal recommendations because one kind
of action can crowd out other kinds of action.  Pat is fixing this so that
we won't need the work-around.




On Mon, Jul 14, 2014 at 11:07 AM, Floris Devriendt <
florisdevriendt@gmail.com> wrote:

> Hey Ted Dunning,
>
> What is already possible for multi-modal recommendation in the
> non-Hadoop implementation of the Mahout recommenders?
> And if it's still under development, do you perhaps have a different
> suggestion (within the possibilities of Mahout)?
>
> Best regards,
> Floris Devriendt

Re: Discrete Rating Scale

Posted by Floris Devriendt <fl...@gmail.com>.
Hey Ted Dunning,

What is already possible for multi-modal recommendation in the
non-Hadoop implementation of the Mahout recommenders?
And if it's still under development, do you perhaps have a different
suggestion (within the possibilities of Mahout)?

Best regards,
Floris Devriendt





2014-07-14 18:45 GMT+02:00 Ted Dunning <te...@gmail.com>:

> I would separate the two interactions.  Type 1 is like.  Type 2 is dislike.
>  They will have different correlations to different predicted interactions.
>
> This is an ideal use case for multi-modal recommendation.  Pat is working
> on bringing that into the DSL as we speak.

Re: Discrete Rating Scale

Posted by Ted Dunning <te...@gmail.com>.
I would separate the two interactions.  Type 1 is like.  Type 2 is dislike.
 They will have different correlations to different predicted interactions.

This is an ideal use case for multi-modal recommendation.  Pat is working
on bringing that into the DSL as we speak.


