Posted to user@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2008/10/01 13:18:08 UTC

Re: Recommending when working with binary data sets

On Tue, Sep 30, 2008 at 7:55 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> I think their algo then translates to:
>
> for item i1 in {1, 2, 3 ...100} {
>  for item i2 in {1, 10, 20} {
>    computeSimilarity(i1, i2)
>  }
> }

Yes, though you would not consider recommending items 1, 10, or 20,
since the user has already expressed a preference for them.
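A minimal sketch of that loop in Python, with the already-preferred items excluded as candidates. The compute_similarity function here is a hypothetical placeholder, not any particular metric:

```python
# Sketch of the pairwise loop: compare each candidate item against the
# user's preferred items. compute_similarity is a placeholder stand-in
# for any real item-item metric.

def compute_similarity(i1, i2):
    # Illustrative only: items are "similar" when their IDs are close.
    return 1.0 / (1.0 + abs(i1 - i2))

preferred = {1, 10, 20}                  # items the user already has
candidates = [i for i in range(1, 101) if i not in preferred]

similarities = {}
for i1 in candidates:                    # only items we might recommend
    for i2 in preferred:                 # the user's existing items
        similarities[(i1, i2)] = compute_similarity(i1, i2)
```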

> sim (1, 1) = 1.0

(... and you therefore wouldn't have to compute this)

> sim (10, 1) = same as sim (1,10)

(... sim(x,x) isn't recorded in the similarity matrix, and sim(x,y) is
only recorded for x < y to avoid this redundancy, because we assume
it's symmetric. You know all this, just commenting.)
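Concretely, a symmetric similarity store could keep only the upper triangle of the matrix. A sketch, with hypothetical put/get helpers:

```python
# Store sim(x, y) only for x < y; sim(x, x) is implicitly 1.0 and is
# never recorded. put/get are illustrative helpers, not a real API.

sims = {}

def put(x, y, value):
    if x == y:
        return                           # sim(x, x) == 1.0, not stored
    key = (x, y) if x < y else (y, x)    # canonicalize to x < y
    sims[key] = value

def get(x, y):
    if x == y:
        return 1.0
    return sims[(x, y) if x < y else (y, x)]

put(10, 1, 0.8)                          # stored under (1, 10)
```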


> So, then we have a list of most similarity items for each item:
> item 1: 20, 10
> item 10: 20, 1
> item 20: 10, 1

Yes, those are the most similar items, but of course they are: they
are the very items the user has already expressed a preference for!

> Is this interpretation correct?  If it is, I don't quite yet see the benefit of computing similarity between items for a single person.  Of course they will be similar, unless we want to create groups of items for a single person with a diverse set of interests for some reason.

Not quite. What you are doing is computing an estimated preference for
everything the user hasn't already expressed a preference for. And
then of course you recommend the items with the highest estimated
preferences.

You do this by computing a weighted average. For every other item in
the universe (2, 3, 4, ...), compute its similarity to each of the
user's preferred items (1, 10, 20), then average the user's
preferences for those items, weighted by those similarities.
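That estimation step might be sketched like this; sim() is again a hypothetical placeholder metric, and the preference values are made up for illustration:

```python
# Estimate a preference for each unseen item as a similarity-weighted
# average of the user's known preferences. sim() is a placeholder
# item-item metric; prefs maps the user's items to preference values.

def sim(i1, i2):
    # Illustrative similarity in (0, 1]; any real metric could be used.
    return 1.0 / (1.0 + abs(i1 - i2))

prefs = {1: 5.0, 10: 3.0, 20: 4.0}       # user's known preferences

def estimate(item):
    total, weight = 0.0, 0.0
    for known, value in prefs.items():
        s = sim(item, known)
        total += s * value               # similarity times preference
        weight += s
    return total / weight                # weighted average

# Rank every unseen item by its estimated preference, best first.
unseen = [i for i in range(1, 101) if i not in prefs]
ranked = sorted(unseen, key=estimate, reverse=True)
```

Note the division by the total weight: that is what makes it a weighted average rather than a raw sum, keeping estimates on the same scale as the preferences themselves.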

User-based recommenders work the same way, but kind of turned on their
side -- you compute similar *users*, then...


> Wouldn't one need to build item-item similarity matrix for all Joe's items (1, 10, 20) and all new items (101-200)?

Yes

> If so, what's the point of figuring item-item similarity for all items Joe already consumed?

There isn't. :)

Well...

Let me digress a bit to perhaps explain why item-based recommenders
have some interesting power.

Often, your items are such that you have some a priori, external idea
of how similar they are -- beyond just a notion based on user
preferences. For example you might say that two mystery books are
similar and a mystery and a sci-fi book are not. This could form the
basis of an item-item similarity metric. This is good because:

1) it's additional, external info being injected into the system,
which could improve recommendations, and
2) it is often quite cheap to compute this notion, and so to define
the complete item-item similarity matrix cheaply. It's not cheap to
compute the whole thing based on, say, Pearson correlations between
items' preference vectors!
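For instance, a genre-based item similarity can be defined from metadata alone, with no preference data at all. A hypothetical sketch (the titles, genre table, and binary rule are all illustrative, not anyone's actual metric):

```python
# A priori item-item similarity from genre metadata alone -- no user
# preference data needed. The genre table below is illustrative.

genres = {
    "The Big Sleep": "mystery",
    "Gaudy Night": "mystery",
    "Dune": "sci-fi",
}

def genre_similarity(item1, item2):
    # Same genre: similar; different genre: not. A real metric could
    # use a graded genre-distance table instead of this binary rule.
    return 1.0 if genres[item1] == genres[item2] else 0.0
```

Since the metadata changes rarely, this whole matrix can be precomputed once, which is exactly the cheapness point above.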

And interestingly, this reasoning doesn't apply so much to user-based
recommenders. You usually don't have an a priori notion of user-user
similarity, and such notions are likely to change as you learn more
about *users*, whereas we expect *item* similarity to stay relatively
stable the more we learn.

That's why item-based recommenders are not just the reverse of
user-based recommenders, but a bit interesting in their own right.
It's not 100% symmetric.

Not sure if that was useful....