Posted to user@mahout.apache.org by Valentin Pletzer <pl...@gmail.com> on 2011/12/26 23:01:59 UTC

sampling bestseller buyers for recommendations

Hi,

I am trying to build item-to-item recommendations, and the setup
works quite well. But one thing I stumbled across is that some items are
so popular that they are recommended for nearly every other item. The
Amazon paper says that they sample the customers who buy best-sellers.
Do I have to do this preprocessing step myself, or does Mahout help with
that?

Thanks
Valentin

Re: sampling bestseller buyers for recommendations

Posted by Ted Dunning <te...@gmail.com>.
What Sean suggests is important. You may need to add downsampling to the
mix as well, but that is usually necessary only for speed, not for the
quality of the recommendations.

On Mon, Dec 26, 2011 at 2:07 PM, Sean Owen <sr...@gmail.com> wrote:

> What item similarity metric are you using? Log-likelihood tends to
> account for an item's baseline popularity and normalize it away. So a
> best-seller isn't similar to an item just because it's a best-seller
> and shows up a lot, but only if it shows up with that item an
> unusually large number of times, even allowing for the fact that it's
> a best-seller. Try that if you're not already using it.

Re: sampling bestseller buyers for recommendations

Posted by Valentin Pletzer <pl...@gmail.com>.
Thanks to you and Sean for the quick advice :) I very much appreciate it.

On Mon, Dec 26, 2011 at 11:25 PM, Ted Dunning <te...@gmail.com> wrote:

> Log-likelihood is very much like PMI (but better).
>
> This is a general recommendation problem, but it should not be a problem
> once you use the log-likelihood ratio (LLR). It is easy to show that any
> item that co-occurs with everything will have a score of zero under LLR.
>
> It is also possible that these common items are prevalent in distinct
> sub-populations, in which case there may actually be some strong signal
> there. If so, downsampling common items and downsampling prolific
> consumers is very much a good idea.
>
> Downsampling is better than reweighting in most cases because it has
> much the same effect but also makes things run much faster. You might
> as well get both benefits at once.

Re: sampling bestseller buyers for recommendations

Posted by Ted Dunning <te...@gmail.com>.
Log-likelihood is very much like PMI (but better).

This is a general recommendation problem, but it should not be a problem
once you use the log-likelihood ratio (LLR). It is easy to show that any
item that co-occurs with everything will have a score of zero under LLR.
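
The zero-score claim can be checked on the 2×2 table of user counts that
LLR is computed from (rows: has A / lacks A; columns: has B / lacks B).
A short derivation, assuming the standard G² form of the log-likelihood
ratio and a hypothetical setup in which item A is downloaded by all N
users and item B by n_B of them. R_i and C_j are the row and column
totals, and cells with a zero count contribute nothing:

```latex
\begin{aligned}
k_{11} &= n_B, \quad k_{12} = N - n_B, \quad k_{21} = k_{22} = 0,\\
R_1 &= k_{11} + k_{12} = N, \quad R_2 = 0, \quad C_1 = n_B, \quad C_2 = N - n_B,\\
G^2 &= 2 \sum_{i,j} k_{ij} \ln \frac{k_{ij}\, N}{R_i\, C_j} \qquad (0 \ln 0 := 0)\\
    &= 2 \left[ n_B \ln \frac{n_B N}{N n_B} + (N - n_B) \ln \frac{(N - n_B) N}{N (N - n_B)} \right]\\
    &= 2 \left[ n_B \ln 1 + (N - n_B) \ln 1 \right] = 0.
\end{aligned}
```

Every observed count equals its expected count R_i C_j / N, so the score
is exactly zero no matter how popular B is.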

It is also possible that these common items are prevalent in distinct
sub-populations, in which case there may actually be some strong signal
there. If so, downsampling common items and downsampling prolific
consumers is very much a good idea.

Downsampling is better than reweighting in most cases because it has
much the same effect but also makes things run much faster. You might
as well get both benefits at once.
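
A minimal Python sketch of the two kinds of downsampling described here:
keeping only a random sample of users for each very popular item (similar
in spirit to the Amazon paper's sampling of best-seller buyers) and
keeping only a random sample of items for each prolific user. The
function name, parameters, cap values and data are hypothetical, not
Mahout options; Mahout's own jobs may expose similar settings, so check
its documentation before rolling your own.

```python
import random
from collections import defaultdict


def downsample(history, max_users_per_item, max_items_per_user, seed=0):
    """Return a downsampled copy of `history` (user -> set of item IDs).

    Hypothetical helper: the caps are illustrative, not Mahout settings.
    Users whose items are all dropped do not appear in the result.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible

    # Step 1: for each popular item, keep a random sample of its users
    # (in the spirit of sampling best-seller buyers).
    users_by_item = defaultdict(list)
    for user, items in history.items():
        for item in sorted(items):  # sort for a deterministic order
            users_by_item[item].append(user)

    kept = defaultdict(set)
    for item, users in users_by_item.items():
        if len(users) > max_users_per_item:
            users = rng.sample(users, max_users_per_item)
        for user in users:
            kept[user].add(item)

    # Step 2: for each prolific user, keep a random sample of their items.
    # This can only lower item counts, so the cap from step 1 still holds.
    result = {}
    for user, items in kept.items():
        items = sorted(items)
        if len(items) > max_items_per_user:
            items = rng.sample(items, max_items_per_user)
        result[user] = set(items)
    return result


# Hypothetical data: every user downloads "freebie"; one user downloads 50 items.
history = {f"u{i}": {"freebie", f"item{i % 5}"} for i in range(100)}
history["heavy"] = {f"item{j}" for j in range(50)} | {"freebie"}
sampled = downsample(history, max_users_per_item=20, max_items_per_user=10)
```

The similarity computation then runs on `sampled` instead of `history`,
which is where the speed-up comes from: the most expensive co-occurrence
pairs involve popular items and heavy users, and those are exactly what
the caps trim.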

On Mon, Dec 26, 2011 at 2:20 PM, Valentin Pletzer <pl...@gmail.com> wrote:

> I am already using log-likelihood. But since the items are free
> downloads, some items tend to co-occur very often with nearly every
> other item. So maybe my problem isn't a Mahout problem but a more
> general recommendation problem?
>
> I am thinking about some dampening factor for very popular items, or
> something similar to PMI
> (http://en.wikipedia.org/wiki/Pointwise_mutual_information)

Re: sampling bestseller buyers for recommendations

Posted by Valentin Pletzer <pl...@gmail.com>.
I am already using log-likelihood. But since the items are free
downloads, some items tend to co-occur very often with nearly every
other item. So maybe my problem isn't a Mahout problem but a more
general recommendation problem?

I am thinking about some dampening factor for very popular items, or
something similar to PMI
(http://en.wikipedia.org/wiki/Pointwise_mutual_information)
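
For comparison, a minimal Python sketch of pointwise mutual information
computed from the same kind of 2×2 user counts (k11 = users with both
items, k12 = A only, k21 = B only, k22 = neither; the naming and the
counts below are hypothetical). PMI also scores an item that everyone
downloads as 0, but it ignores how many users back the estimate, so a
single coincidental co-download can outscore a pair shared by 100 users.
That small-count bias is a well-known weakness of PMI, and plausibly part
of why log-likelihood is described as "like PMI (but better)".

```python
import math


def pmi(k11, k12, k21, k22):
    """Pointwise mutual information of items A and B from user counts:
    k11 = both, k12 = A only, k21 = B only, k22 = neither."""
    if k11 == 0:
        return float("-inf")  # the items never co-occur: log(0)
    n = k11 + k12 + k21 + k22
    p_ab = k11 / n
    p_a = (k11 + k12) / n
    p_b = (k11 + k21) / n
    return math.log(p_ab / (p_a * p_b))


# Hypothetical counts over 10,000 users:
print(pmi(500, 9500, 0, 0))  # A downloaded by everyone: 0.0
print(pmi(1, 0, 0, 9999))    # one user, one coincidence: about 9.21
print(pmi(100, 0, 0, 9900))  # 100 users share A and B: about 4.61
```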

On Mon, Dec 26, 2011 at 11:07 PM, Sean Owen <sr...@gmail.com> wrote:

> What item similarity metric are you using? Log-likelihood tends to
> account for an item's baseline popularity and normalize it away. So a
> best-seller isn't similar to an item just because it's a best-seller
> and shows up a lot, but only if it shows up with that item an
> unusually large number of times, even allowing for the fact that it's
> a best-seller. Try that if you're not already using it.

Re: sampling bestseller buyers for recommendations

Posted by Sean Owen <sr...@gmail.com>.
What item similarity metric are you using? Log-likelihood tends to
account for an item's baseline popularity and normalize it away. So a
best-seller isn't similar to an item just because it's a best-seller
and shows up a lot, but only if it shows up with that item an
unusually large number of times, even allowing for the fact that it's
a best-seller. Try that if you're not already using it.
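
To make this concrete, a minimal Python sketch of the log-likelihood
ratio (G²) for a pair of items, computed from a 2×2 table of user
counts. It is modeled on the entropy-based formulation that, as far as I
know, Mahout's LogLikelihood.logLikelihoodRatio uses behind its
log-likelihood similarity, but it is an illustration rather than
Mahout's code; the function names and example counts are hypothetical.

```python
import math


def x_log_x(x):
    # Convention used in the G^2 statistic: 0 * log(0) == 0
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized entropy of raw counts: N*log(N) - sum(k*log(k))
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 score for items A and B, from counts of users who have:
    k11 = both A and B, k12 = A but not B,
    k21 = B but not A,  k22 = neither."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative results caused by floating-point round-off
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


# Hypothetical counts over 10,000 users:
print(log_likelihood_ratio(100, 0, 0, 9900))  # 100 users share A and B: large score
print(log_likelihood_ratio(500, 9500, 0, 0))  # A is downloaded by everyone: 0.0
```

The score compares the observed co-occurrence counts with what each
item's own popularity would predict, so a best-seller scores highly
against another item only when they co-occur more often than its
popularity alone explains.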

On Mon, Dec 26, 2011 at 4:01 PM, Valentin Pletzer <pl...@gmail.com> wrote:
> Hi,
>
> I am trying to build item-to-item recommendations, and the setup
> works quite well. But one thing I stumbled across is that some items
> are so popular that they are recommended for nearly every other item.
> The Amazon paper says that they sample the customers who buy
> best-sellers. Do I have to do this preprocessing step myself, or does
> Mahout help with that?
>
> Thanks
> Valentin