Posted to user@mahout.apache.org by Simon Reavely <si...@gmail.com> on 2010/07/27 22:55:31 UTC

Best way to do a recommendation engine based on CTR (Click Through Rate)

Hi,

I am wondering what the best way is to implement recommendations based on
click through rates. What I have:
- user id
- resource (i.e. item)
- click through count for user on that resource

I'm reading the Mahout in Action MEAP right now (very good so far). Mahout seems
to be very preference-based (votes/ratings), but I know (from reading Mahout in
Action) that it also supports preference-less recommendations. However,
since I have a click-through count, preference-less recommendation seems to
throw away this click-through data.

I wondered if I can somehow convert click through count to a preference or
if I should take another approach.

Some ideas I had:
- Just use the click through count as the preference (knowing that different
users will have widely different counts).
- Normalize the click count across users to say a 0-100 scale
- ok, that's it...only two ideas so far!

Any suggestions/patterns?
Any warnings/anti-patterns?

It seems like this should be a really common use-case for recommendations.

Cheers,
Simon

-- 
Simon Reavely
simon.reavely@gmail.com

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Ted Dunning <te...@gmail.com>.
Very common.  Partly because clicks outnumber explicit ratings by 100:1 on a
typical web site.

On Tue, Jul 27, 2010 at 1:55 PM, Simon Reavely <si...@gmail.com> wrote:

> It seems like this should be a really common use-case for recommendations.
>

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Ted Dunning <te...@gmail.com>.
Do your users click more than once on an item?

If you do intend to use total clicks, at least normalize them by the number of
appearances.

Better still, if you have some indicator that the click wasn't adventitious,
use that.  Once upon a time, for instance, we tried clicks on a video.  It
became clear later that successfully watching 30 seconds of video was a much
better indicator, since it avoided problems with misleading titles and
possibly even broken videos.  Using a JavaScript timer to emit a beacon hit
after the user has sat on a page for a bit is one way to get this kind of
info on a normal website.

I would generally be against counting clicks and would rather just collect a
1 if clicks(user, item) > 0.
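
For instance, a rough sketch of that conversion, assuming the raw data sits
in a CSV of userID,itemID,clickCount lines (the file names and the format
here are just placeholders), writing out one userID,itemID line per pair
with at least one click:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.PrintWriter;

  public class BinarizeClicks {
    public static void main(String[] args) throws Exception {
      // Input:  userID,itemID,clickCount   (placeholder format)
      // Output: userID,itemID              (only pairs with clicks > 0)
      BufferedReader in =
          new BufferedReader(new FileReader("clicks_with_counts.csv"));
      PrintWriter out = new PrintWriter("boolean_clicks.csv");
      String line;
      while ((line = in.readLine()) != null) {
        String[] fields = line.split(",");
        if (Long.parseLong(fields[2].trim()) > 0) {
          out.println(fields[0] + "," + fields[1]);
        }
      }
      in.close();
      out.close();
    }
  }

The resulting file is the boolean form that the ignore-the-counts approach
above needs.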


On Tue, Jul 27, 2010 at 1:55 PM, Simon Reavely <si...@gmail.com> wrote:

> Hi,
>
> I am wondering what the best way is to implement recommendations based on
> click through rates. What I have:
> - user id
> - resource (i.e. item)
> - click through count for user on that resource
>
> I'm reading Mahout in Action MEAP right now (very good so far). Mahout
> seems
> to be very preference based (votings/ratings) but I know (reading Mahout in
> Action) that it also supports preference-less recommendations. However,
> since I have a click through count preference-less recommendation seems to
> be throwing away this click through data.
>
> I wondered if I can somehow convert click through count to a preference or
> if I should take another approach.
>
> Some ideas I had:
> - Just use the click through count as the preference (knowing that
> different
> users will have widely different counts).
> - Normalize the click count across users to say a 0-100 scale
> - ok, that's it...only two ideas so far!
>
> Any suggestions/patterns?
> Any warnings/anti-patterns?
>
> It seems like this should be a really common use-case for recommendations.
>
> Cheers,
> Simon
>
> --
> Simon Reavely
> simon.reavely@gmail.com
>

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Ted Dunning <te...@gmail.com>.
I can only quantify this very, very roughly.

In my experience it takes 5-10 people on a single item to make
recommendations work with the LLR metrics.  From this and the long-tail
characteristics, I think you can take it back to a real sparsity number, but
I think that this is probably easy enough to work with.
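
As a very rough back-of-envelope version of that (not a hard rule): if you
have m items and want most of them to have roughly 5-10 distinct users, you
need on the order of

  5*m to 10*m  non-zero (user, item) pairs

in total.  With n users that is roughly 5*m/n to 10*m/n clicked items per
user on average; if your typical user is far below that, the data is
probably too sparse for item-item LLR to say much.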

Sean can probably give you a broader perspective since he has worked with a
variety of different systems at different scales.

2010/7/29 Matthias Böhmer <ma...@m-boehmer.de>

> > A quick test would be to consider two clicks to be required as a measure of
> > interest.  My guess is that your data will suddenly become too sparse to
> > use.
>
> Can you quantify this? Let's say a recommender system has n users and m
> items. Is it possible to conclude how much data (ratings) you need
> to run recommendations? Are there any rough estimates for judging
> the sparsity of the data?
>
>

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Matthias Böhmer <ma...@m-boehmer.de>.
> A quick test would be to consider two clicks to be required as a measure of
> interest.  My guess is that your data will suddenly become too sparse to
> use.

Can you quantify this? Let's say a recommender system has n users and m
items. Is it possible to conclude how much data (ratings) you need
to run recommendations? Are there any rough estimates for judging
the sparsity of the data?



2010/7/28 Ted Dunning <te...@gmail.com>:
> On Wed, Jul 28, 2010 at 12:53 PM, Simon Reavely <si...@gmail.com> wrote:
>
>> However, since I have no idea how significant extra clicks are I think this
>> is a good one to start with.
>>
>
> To 0-th order, they aren't. :-)
>
>
>>
>> With more work I'll try to evaluate the quality of that measure (number of
>> clicks) and look for others that could be better indications of preference.
>>
>
> A quick test would be to consider two clicks to be required as a measure of
> interest.  My guess is that your data will suddenly become too sparse to
> use.
>




Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Jul 28, 2010 at 12:53 PM, Simon Reavely <si...@gmail.com> wrote:

> However, since I have no idea how significant extra clicks are I think this
> is a good one to start with.
>

To 0-th order, they aren't. :-)


>
> With more work I'll try to evaluate the quality of that measure (number of
> clicks) and look for others that could be better indications of preference.
>

A quick test would be to consider two clicks to be required as a measure of
interest.  My guess is that your data will suddenly become too sparse to
use.

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Simon Reavely <si...@gmail.com>.
Thanks to everyone for their comments; very helpful indeed.

One question/observation on LogLikelihood similarity: if I'm right, this
metric does not take preference values into account, so I would just be
looking at what users have clicked on, not how many times they have clicked.
However, since I have no idea how significant extra clicks are, I think this
is a good one to start with.

With more work I'll try to evaluate the quality of that measure (number of
clicks) and look for others that could be better indications of preference.

Cheers,
Simon Reavely


On Wed, Jul 28, 2010 at 12:53 PM, Ted Dunning <te...@gmail.com> wrote:

> The positions of users and items in the user x item occurrence matrix are
> interchangeable.  For every problem that matches items to users,
> there is a dual solution that matches users to items.
>
> In any case, log-likelihood methods in recommendation usually are applied
> to
> the item cooccurrence matrix to get an item-based recommendation algorithm.
>
> On Wed, Jul 28, 2010 at 9:49 AM, Tanton Gibbs <tanton.gibbs@gmail.com> wrote:
>
> > Very cool, I didn't realize it handled item similarity alongside user
> > similarity.
> >
>



-- 
Simon Reavely
simon.reavely@gmail.com

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Ted Dunning <te...@gmail.com>.
The positions of users and items in the user x item occurrence matrix are
interchangeable.  For every problem that matches items to users, there is a
dual solution that matches users to items.

In any case, log-likelihood methods in recommendation usually are applied to
the item cooccurrence matrix to get an item-based recommendation algorithm.
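
A minimal sketch of that wiring with the Taste API (the file name and user
ID are placeholders; the data file is assumed to hold boolean userID,itemID
click lines):

  import java.io.File;
  import java.util.List;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
  import org.apache.mahout.cf.taste.recommender.RecommendedItem;
  import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

  public class ClickRecommender {
    public static void main(String[] args) throws Exception {
      // userID,itemID lines with no preference value = boolean data
      DataModel model = new FileDataModel(new File("boolean_clicks.csv"));

      // Log-likelihood similarity ignores preference values entirely;
      // it only looks at which users clicked which items.
      ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
      ItemBasedRecommender recommender =
          new GenericItemBasedRecommender(model, similarity);

      // Top 5 recommendations for user 1
      List<RecommendedItem> items = recommender.recommend(1L, 5);
      for (RecommendedItem item : items) {
        System.out.println(item.getItemID() + " : " + item.getValue());
      }
    }
  }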

On Wed, Jul 28, 2010 at 9:49 AM, Tanton Gibbs <ta...@gmail.com> wrote:

> Very cool, I didn't realize it handled item similarity alongside user
> similarity.
>

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Tanton Gibbs <ta...@gmail.com>.
On Wed, Jul 28, 2010 at 7:03 AM, Ted Dunning <te...@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 1:02 AM, Tanton Gibbs <ta...@gmail.com> wrote:
>
>> Another thing to consider in this same vein is that 1 or 2 clicks on a
>> resource may indicate a very strong preference (if the topic is
>> generally unpopular) or it may indicate a very weak preference (if the
>> topic is highly popular).  You should consider how other users are
>> interacting with this and other similar resources to help determine
>> satisfaction.
>>
>
> The recommendation system should handle this.  The log-likelihood similarity
> thing-thing is the one I would recommend starting with.

Very cool, I didn't realize it handled item similarity alongside user
similarity.

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Jul 28, 2010 at 1:02 AM, Tanton Gibbs <ta...@gmail.com> wrote:

> Another thing to consider in this same vein is that 1 or 2 clicks on a
> resource may indicate a very strong preference (if the topic is
> generally unpopular) or it may indicate a very weak preference (if the
> topic is highly popular).  You should consider how other users are
> interacting with this and other similar resources to help determine
> satisfaction.
>

The recommendation system should handle this.  The log-likelihood similarity
thing-thing is the one I would recommend starting with.


>
> I'll also echo Ted's comment that clicks are just a proxy for user
> satisfaction.  If you have a better way to measure satisfaction (such
> as time spent with the resource, further interaction, etc...) then you
> will end up with better recommendations.
>

I should emphasize again that this is a much stronger effect than most
people realize when I tell them that this is a good thing to do.  It can
easily make the difference between a complete hash and extremely good
recommendations.  Picking the right action to analyze can easily make more
difference than any possible algorithm choice.

In addition, picking a good indicator of interest can easily decrease the
number of clicks to analyze by up to an order of magnitude.  This is nice.

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Tanton Gibbs <ta...@gmail.com>.
> However, one important lesson is that mapping associations to numbers
> in the 'wrong' way can significantly harm the result. For example, 1
> click is much, much more significant than 0, and 2 clicks are more
> significant than 1. But are 10 clicks 5 times stronger than 2?
> Probably not. Maybe it's a favorite, but after "several" clicks the
> additional clicks mean little more.

Another thing to consider in this same vein is that 1 or 2 clicks on a
resource may indicate a very strong preference (if the topic is
generally unpopular) or it may indicate a very weak preference (if the
topic is highly popular).  You should consider how other users are
interacting with this and other similar resources to help determine
satisfaction.

I'll also echo Ted's comment that clicks are just a proxy for user
satisfaction.  If you have a better way to measure satisfaction (such
as time spent with the resource, further interaction, etc...) then you
will end up with better recommendations.

Tanton

Re: Best way to do a recommendation engine based on CTR (Click Through Rate)

Posted by Sean Owen <sr...@gmail.com>.
Yes, preferences are merely an indicator of the strength of an
association. They aren't necessarily from explicit ratings; you could
base this figure on click-through counts.

You do not need to scale the values; the particular scale does not
matter to any algorithm.

However, one important lesson is that mapping associations to numbers
in the 'wrong' way can significantly harm the result. For example, 1
click is much, much more significant than 0, and 2 clicks are more
significant than 1. But are 10 clicks 5 times stronger than 2?
Probably not. Maybe it's a favorite, but after "several" clicks the
additional clicks mean little more.

So, for instance, you might begin your experiments by using the log of the
click count as the pref value.
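
A tiny sketch of that mapping, using log(count + 1) so that a single click
still maps to a positive value (the exact transform is just one choice to
experiment with):

  public final class ClickPrefs {
    // Damp raw click counts: log1p(1) ~= 0.69, log1p(2) ~= 1.10,
    // log1p(10) ~= 2.40, so 10 clicks come out a bit over 2x as
    // strong as 2 clicks rather than 5x.
    static float clickCountToPref(long clicks) {
      return clicks <= 0 ? 0.0f : (float) Math.log1p(clicks);
    }
  }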

And then there are other issues, like normalizing, throwing out spurious
clicks and spam, etc.

Sean

On Tue, Jul 27, 2010 at 11:55 PM, Simon Reavely <si...@gmail.com> wrote:
> Hi,
>
> I am wondering what the best way is to implement recommendations based on
> click through rates. What I have:
> - user id
> - resource (i.e. item)
> - click through count for user on that resource
>
> I'm reading Mahout in Action MEAP right now (very good so far). Mahout seems
> to be very preference based (votings/ratings) but I know (reading Mahout in
> Action) that it also supports preference-less recommendations. However,
> since I have a click through count preference-less recommendation seems to
> be throwing away this click through data.
>
> I wondered if I can somehow convert click through count to a preference or
> if I should take another approach.
>
> Some ideas I had:
> - Just use the click through count as the preference (knowing that different
> users will have widely different counts).
> - Normalize the click count across users to say a 0-100 scale
> - ok, that's it...only two ideas so far!
>
> Any suggestions/patterns?
> Any warnings/anti-patterns?
>
> It seems like this should be a really common use-case for recommendations.
>
> Cheers,
> Simon
>
> --
> Simon Reavely
> simon.reavely@gmail.com
>