Posted to user@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2010/01/26 22:44:33 UTC

What is content based recommendation, to you

I want to knock down some support for content based recommendation.
And I want to solicit ideas about what this even means to its intended
audience -- users.

I define it broadly as a recommender in which:
- items have attributes (e.g. books have genres, titles, authors)
rather than being completely opaque entities
- users have affinities for attributes
- users are recommended items with attributes they like

I would narrow and specify this, in the context of Mahout, to have a
collaborative filtering angle:
- items have attributes, still
- users have preferences for items (classic CF)
- (therefore, users implicitly have affinities for attributes)
- item similarity can be defined in terms of item attributes, in some way
- users are recommended items that are similar to other items they
like (item-based recommendation)
- (therefore, users are recommended items with attributes they like)

This is my spin on content based recommendation in Mahout. I define it
as a special case of item-based recommendation. Thoughts?
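
A minimal sketch of the item-based reading above, in plain Python rather
than Mahout's own classes: item-item similarity is the Jaccard overlap of
attribute sets, and a user's estimated preference for an unseen item is the
similarity-weighted average of their known preferences. The function names,
attribute strings and toy data are all made up for illustration, and Jaccard
is only one possible choice of attribute similarity.

```python
# Sketch only: not Mahout/Taste API. All names and data are hypothetical.

def jaccard(a, b):
    """Similarity of two attribute sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend(user_prefs, item_attrs, top_n=3):
    """Item-based scoring: estimate a preference for each unseen item as the
    similarity-weighted average of the user's known preferences."""
    scores = {}
    for candidate, cand_attrs in item_attrs.items():
        if candidate in user_prefs:
            continue  # never recommend what the user already rated
        num = den = 0.0
        for liked, pref in user_prefs.items():
            sim = jaccard(cand_attrs, item_attrs[liked])
            num += sim * pref
            den += sim
        if den > 0:  # skip items with no attribute overlap at all
            scores[candidate] = num / den
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Toy catalogue: items are no longer opaque, they carry attributes.
item_attrs = {
    "dune": {"genre:scifi", "author:herbert"},
    "emma": {"genre:romance", "author:austen"},
    "foundation": {"genre:scifi", "author:asimov"},
    "persuasion": {"genre:romance", "author:austen"},
    "time_travelers_wife": {"genre:scifi", "genre:romance", "author:niffenegger"},
}
user_prefs = {"dune": 5.0, "emma": 1.0}  # classic CF input: user -> item ratings

recs = recommend(user_prefs, item_attrs)
print(recs)
```

With this toy data "foundation" ranks first because it shares a genre with
the highly rated "dune", and "persuasion" ranks last because it only
resembles the poorly rated "emma".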

So, the idea is to provide some non-trivial framework for supporting
item attributes, and defining similarity in terms of attributes.
Thoughts on what that should look like?

Sean

Re: What is content based recommendation, to you

Posted by Hector Yee <he...@gmail.com>.

I think Sean's post is about using item attributes, but nothing prevents you
from using user attributes too.
On Jun 7, 2012 11:43 PM, "yswapna@gmail.com" <ys...@gmail.com> wrote:

Re: What is content based recommendation, to you

Posted by "yswapna@gmail.com" <ys...@gmail.com>.

Has anything been done in this regard? That is, building a content-based
recommender with respect to other users as well?

--
View this message in context: http://lucene.472066.n3.nabble.com/What-is-content-based-recommendation-to-you-tp639024p3988368.html
Sent from the Mahout Developer List mailing list archive at Nabble.com.

Re: What is content based recommendation, to you

Posted by Sean Owen <sr...@gmail.com>.

Yes, nice, I follow.

From a CF perspective, item-level cooccurrence results in the same
thing as user-level cooccurrence of attributes. If I know what items
users like and how much, and know how much items resemble each other
based on attributes, I am basically predicting what the user will like
based on item attributes -- user X will like item A because it shares
attributes with item B, and X likes B, and so we can infer X likes B's
attributes... etc.

As you can see, I'm eager to fit this into the canonical CF framework
without cheating the meaning of "content-based" recommender. It's not
(just) laziness, but it would certainly be tidy to fit this idea into
the existing framework meaningfully rather than bolt on another
paradigm. I guess I feel one role of a framework like Mahout is to
tease out and capture the similarity and order in these diverse ideas,
rather than just implement each one by one.

On Wed, Jan 27, 2010 at 1:22 AM, Ted Dunning <te...@gmail.com> wrote:
> Most decomposition algorithms have trouble when presented with more than one
> kind of cooccurrence such as this presents.  My guess is that you would get
> most of the available mileage by ignoring item level cooccurrence and
> focusing on user level attribute cooccurrence.  This makes decomposition
> easy and presumably gives you the best of all worlds since item cooccurrence
> is a special case of user cooccurrence.  Decomposition approaches are nice
> as well since they would use artist when it helps and ignore it when it
> doesn't (to use the music case again).

Re: What is content based recommendation, to you

Posted by Ted Dunning <te...@gmail.com>.

On Tue, Jan 26, 2010 at 5:04 PM, Sean Owen <sr...@gmail.com> wrote:

> You're saying content-based recommendation, in practice, is often a
> matter of substituting one dominant item attribute in place of items
> -- recommending on artist, rather than artist track. OK, check, one
> can do that in the current framework by using artists as items. So I
> think that's supported for free.
>

I think so as well.

>
> And maybe my other notion of a way to bring content-based
> recommendation into the framework -- some organized framework for
> constructing and tuning a notion of item similarity based on
> attributes -- also has merit and belongs in the category of
> "content-based" techniques.
>

I didn't mention that there is quite a bit of scope here for decomposition
based algorithms.  There is no reason at all for all the attributes of an
item to not contribute to the "meaning" of that item.

The problem there really comes from the fact that attributes cohere in two
ways.  One way is by cooccurring on a single item.  That is definitely
semantically important and has implications for recommendation performance
because it helps us understand items themselves in a better and less sparse
way.  Another way is by cooccurring within the set of preferences for a
single user.  That is also important since it indicates that something about
those attributes is important relative to user preferences.
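
The two kinds of cooccurrence described above can be counted separately. The
sketch below is plain Python with hypothetical names and toy data, not
anything from Mahout: one function counts attribute pairs that sit on the
same item, the other counts attribute pairs that show up together anywhere
in one user's preferred items.

```python
# Sketch only: hypothetical helper names and toy data.
from collections import Counter
from itertools import combinations

def item_level_cooccurrence(item_attrs):
    """Count attribute pairs that appear together on a single item."""
    counts = Counter()
    for attrs in item_attrs.values():
        for pair in combinations(sorted(attrs), 2):
            counts[pair] += 1
    return counts

def user_level_cooccurrence(user_items, item_attrs):
    """Count attribute pairs that appear together anywhere in one user's
    set of preferred items (each pair counted once per user)."""
    counts = Counter()
    for items in user_items.values():
        attrs = set()
        for item in items:
            attrs |= item_attrs[item]
        for pair in combinations(sorted(attrs), 2):
            counts[pair] += 1
    return counts

item_attrs = {
    "t1": {"artist:A", "genre:jazz"},
    "t2": {"artist:B", "genre:jazz"},
    "t3": {"artist:C", "genre:rock"},
}
user_items = {"u1": ["t1", "t2"], "u2": ["t1", "t3"]}

item_level = item_level_cooccurrence(item_attrs)
user_level = user_level_cooccurrence(user_items, item_attrs)

# artist:A and artist:B never share an item, but user u1 likes both.
print(item_level[("artist:A", "artist:B")], user_level[("artist:A", "artist:B")])
```

In this toy data every pair seen at the item level is also seen at the user
level, because every item appears in at least one user's history. The
user-level counts additionally pick up pairs such as two artists liked by
the same person.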

Most decomposition algorithms have trouble when presented with more than one
kind of cooccurrence such as this presents.  My guess is that you would get
most of the available mileage by ignoring item level cooccurrence and
focusing on user level attribute cooccurrence.  This makes decomposition
easy and presumably gives you the best of all worlds since item cooccurrence
is a special case of user cooccurrence.  Decomposition approaches are nice
as well since they would use artist when it helps and ignore it when it
doesn't (to use the music case again).

Anyway, the great and glorious advantage of decompositional techniques here
is that they will embed items in a semantic space based on all available
information.  That provides a very natural way to integrate all attributes
for recommendation.

-- 
Ted Dunning, CTO
DeepDyve

Re: What is content based recommendation, to you

Posted by Sean Owen <sr...@gmail.com>.

Nice, good wisdom here.


I agree about the appeal and problems of thinking of item-attribute
pairs as your items.

You're saying content-based recommendation, in practice, is often a
matter of substituting one dominant item attribute in place of items
-- recommending on artist, rather than artist track. OK, check, one
can do that in the current framework by using artists as items. So I
think that's supported for free.

And maybe my other notion of a way to bring content-based
recommendation into the framework -- some organized framework for
constructing and tuning a notion of item similarity based on
attributes -- also has merit and belongs in the category of
"content-based" techniques.


I ask because there's been some request to talk more about
content-based recommendation and so I want to build this out more.


On Tue, Jan 26, 2010 at 11:36 PM, Ted Dunning <te...@gmail.com> wrote:
> I define it a bit differently by redefining recommendations as machine
> learning.
>
> Users have preferences for objects with attributes.
>
> We would like to learn from all user/object/attribute preference data to
> predict so-far unobserved preferences of a user for other objects.
>
> Normal recommendation is a subset of this where there is exactly one id
> attribute for every object.
>
> We can extend most recommendation algorithms to this new paradigm relatively
> transparently by considering each expressed item preference to be a bundle
> of attribute preferences.  Our recommendation algorithm needs to produce a
> list of recommended attributes which we integrate into a list of recommended
> items.  The list of recommended attributes might be segregated into a list
> of values for each kind of attribute or it might be in a single list.  The
> segregated approach could just replicate a recommendation engine per
> attribute type.  The combined approach might just label all attributes and
> throw them into a soup of preference data.
>
> The additional code needed consists mostly of writing the code that
> integrates the attribute recommendations into a list of item
> recommendations.  This can be as simple as weighting the recommended
> attributes by rank and doing rankScore * idf retrieval to find the items.
> Some algorithms like LDA have the ability to explicitly integrate the
> different kinds of attributes.  Others really don't.
>
> One problem with this is that you are exploding the number of preferences
> which can present scaling and noise problems.  You also inherently
> intermingle attributes with very different distributional characteristics
> together.  For instance, there might only be a dozen or so colors of shoes
> and thus the number of people who have expressed a preference for some kind
> of red shoe is going to be vastly larger than the number of people who have
> expressed a preference for a specific color of a specific size of a specific
> model of a shoe.  It is common for recommendation systems to fail for very
> common things or for very rare things and integrating both pathological
> situations in a single recommendation framework may be a problem.
>
> My own experience with this is that it is common for one kind of attribute
> to dominate the recommendation process in the sense of providing the most
> oomph and accuracy.  This can be because the data is sparse and some
> attributes provide useful smoothing or it can be that some attributes are too
> general and other attributes provide more precision.  At Musicmatch, for
> instance, the artist attribute provided a disproportionate share of music
> recommendation value above track or album or even song (track != song
> because it is common for the same song to be on many albums giving many
> tracks).  I think that this must only be true to first order and that if you
> dig in, you would find minority classes where different attributes provide
> different amounts of data, but it is rare in startups to get past the first
> order solution.
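
The integration step from the quoted message (turn a ranked list of
recommended attributes back into recommended items) might look roughly like
the sketch below. It is plain Python, not Mahout code. The ranked attribute
list is assumed to come from some attribute-level recommender that is not
shown, 1/(rank+1) is an arbitrary choice of rank score, and the shoe data
is invented.

```python
# Sketch only: hypothetical names, toy data, arbitrary rank-score formula.
import math

def idf(item_attrs):
    """Inverse document frequency of each attribute over the item catalogue."""
    n = len(item_attrs)
    df = {}
    for attrs in item_attrs.values():
        for a in attrs:
            df[a] = df.get(a, 0) + 1
    return {a: math.log(n / count) for a, count in df.items()}

def items_from_attributes(ranked_attrs, item_attrs, exclude=()):
    """Score each item as the sum of rankScore * idf over the recommended
    attributes it carries, and return items sorted by descending score."""
    weights = idf(item_attrs)
    rank_score = {a: 1.0 / (rank + 1) for rank, a in enumerate(ranked_attrs)}
    scores = {}
    for item, attrs in item_attrs.items():
        if item in exclude:
            continue
        s = sum(rank_score[a] * weights[a] for a in attrs if a in rank_score)
        if s > 0:
            scores[item] = s
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

item_attrs = {
    "s1": {"color:red", "model:runner"},
    "s2": {"color:red", "model:boot"},
    "s3": {"color:blue", "model:runner"},
    "s4": {"color:red", "model:sandal"},
}
# Assumed output of an attribute-level recommender, best first.
ranked_attrs = ["color:red", "model:runner"]

result = items_from_attributes(ranked_attrs, item_attrs, exclude={"s1"})
print(result)
```

Here the idf factor pushes the very common attribute ("color:red", on three
of four items) below the rarer "model:runner" even though red was ranked
first, which is one blunt way of coping with the distributional mismatch
between common and rare attributes.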

Re: What is content based recommendation, to you

Posted by Sean Owen <sr...@gmail.com>.

Yes that's true. What we have now is a mostly pure CF framework, but,
as you say, with some creative application it does something like
content-based recommendation.

I'd like to emphasize this angle in the writeup, while also admitting
that there is more one could do with content-based approaches than
this.

On Sat, Jan 30, 2010 at 5:50 PM, Ted Dunning <te...@gmail.com> wrote:
> I think that with a slight bit of creative rewriting of results you can
> probably do some pretty fancy content based work with the current software.
>
> Take the music example again.  Take items as artists *or* albums *or*
> tracks.  Explode the track listening history of a user into a mixed list of
> artists, albums and tracks.  Recommend to users as usual to get a mixed list
> of different kinds of items.  You might stop there and just display a
> heterogeneous list of things, but you could also slide through the list and
> replace artists with a popularity ranked list of their tracks, albums with
> something similar and then reduce duplicates, boosting items that get
> multiple credit.  If you claim that the duplicate reduction is part of the
> presentation layer, then Taste as it stands can probably do fairly involved
> content based recommendations.

Re: What is content based recommendation, to you

Posted by Ted Dunning <te...@gmail.com>.

I think that with a slight bit of creative rewriting of results you can
probably do some pretty fancy content based work with the current software.

Take the music example again.  Take items as artists *or* albums *or*
tracks.  Explode the track listening history of a user into a mixed list of
artists, albums and tracks.  Recommend to users as usual to get a mixed list
of different kinds of items.  You might stop there and just display a
heterogeneous list of things, but you could also slide through the list and
replace artists with a popularity ranked list of their tracks, albums with
something similar and then reduce duplicates, boosting items that get
multiple credit.  If you claim that the duplicate reduction is part of the
presentation layer, then Taste as it stands can probably do fairly involved
content based recommendations.
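
As a rough sketch of the two steps above, in plain Python and with made-up
names and data (nothing here is Taste API): the first function explodes a
track history into a mixed list of track, album and artist items, which
would then be fed to the recommender as usual. The second takes the mixed
recommendations, replaces artists and albums with their most popular
tracks, and merges duplicates by adding up their scores.

```python
# Sketch only: hypothetical names and toy data; the recommender itself
# (whatever turns the exploded history into mixed_recs) is not shown.

def explode_history(track_plays, track_meta):
    """Turn played track ids into a mixed, de-duplicated list of item ids."""
    mixed = []
    for track in track_plays:
        meta = track_meta[track]
        mixed.extend([
            f"track:{track}",
            f"album:{meta['album']}",
            f"artist:{meta['artist']}",
        ])
    return list(dict.fromkeys(mixed))  # drop repeats, keep first-seen order

def flatten_recommendations(mixed_recs, tracks_by_artist, tracks_by_album,
                            per_group=2):
    """Replace artists/albums with their top tracks (lists are assumed to be
    popularity-ordered) and boost tracks credited by several sources."""
    scores = {}
    for item, score in mixed_recs:
        kind, _, key = item.partition(":")
        if kind == "track":
            expanded = [key]
        elif kind == "artist":
            expanded = tracks_by_artist.get(key, [])[:per_group]
        elif kind == "album":
            expanded = tracks_by_album.get(key, [])[:per_group]
        else:
            continue  # unknown item kind, ignore
        for track in expanded:
            scores[track] = scores.get(track, 0.0) + score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

track_meta = {"t1": {"artist": "miles", "album": "kind_of_blue"}}
history = explode_history(["t1", "t1"], track_meta)

mixed_recs = [("artist:coltrane", 0.9), ("track:t7", 0.6),
              ("album:giant_steps", 0.5)]
tracks_by_artist = {"coltrane": ["t7", "t8", "t9"]}
tracks_by_album = {"giant_steps": ["t8", "t9"]}

flat = flatten_recommendations(mixed_recs, tracks_by_artist, tracks_by_album)
print(history)
print(flat)
```

Summing scores is only one way to give "multiple credit"; the boost could
just as well be a max or a count, and the whole second function can live in
the presentation layer.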

On Sat, Jan 30, 2010 at 8:59 AM, Sean Owen <sr...@gmail.com> wrote:

> 1) if your items are really dominated by one attribute (e.g.
> recommending songs based on artist) then by thinking of that attribute
> as the 'item' and applying regular CF, you're doing content-based
> recommendation
> 2) if you want to base item-item similarity on attributes and pair
> that with item-based CF, you're doing content-based recommendation
>

-- 
Ted Dunning, CTO
DeepDyve

Re: What is content based recommendation, to you

Posted by Sean Owen <sr...@gmail.com>.

To summarize --

For purposes of the book, I'm going to have to talk about
content-based, but can't really say there's support for it since there
isn't. That I think is just fine; better to write this up in 2 pages
than be silent.

Separately, there is absolutely no reason to exclude content-based
techniques, no. It's just a question of where the pieces are on the
to-do list. Some forms of content-based recommendation already work in
the framework:

1) if your items are really dominated by one attribute (e.g.
recommending songs based on artist) then by thinking of that attribute
as the 'item' and applying regular CF, you're doing content-based
recommendation
2) if you want to base item-item similarity on attributes and pair
that with item-based CF, you're doing content-based recommendation

and then there are things that probably should be there but aren't yet:

3) something that extracts user->attribute and attribute->item
associations based on user->item associations, and does something like
CF based on it

and then everything else is, to me, a question mark for later.

Thanks for the good discussion. It's clarified and enhanced my
thinking and will let me write a good couple pages in the current
draft.

Sean

On Thu, Jan 28, 2010 at 2:29 AM, Jake Mannix <ja...@gmail.com> wrote:
> On Wed, Jan 27, 2010 at 6:18 PM, Ted Dunning <te...@gmail.com> wrote:
>>
>>
>> a) whether items that are retrieved/recommended are opaque or have
>> attributes (is the process content-based?)
>>
>
> Well this is the part we all are agreeing on - we're all talking about the
> "new"
> (to our framework) technique of including content attributes.  Good here.
>
>
>> b) whether the basis for retrieving/recommending items is an explicit query
>> (of whatever form) or is an implicit query formed by the user's historical
>> actions (is this search or recommendation?)
>>
>
> I'm *not* suggesting we consider explicitly chosen attributes that the user
> enters on their own.  That is completely a search, not a recommendation,
> and isn't what the recommendation part of Mahout is about.  I'm talking
> about queries generated in some way from the content of the thing which
> wants to have recommendations given to it (the webpage which wants ads,
> the job posting which wants applicants, the user who has a profile who
> wants XYZ, etc...).  Definitely implicit, in terms of what users *do*, but
> possibly fairly explicit in terms of what they *are*.
>
>
>> c) whether the retrieval/recommendation of items uses the behavior of all
>> users to sharpen the results (is this a social algorithm or not?)
>>
>
> Right, this is the part which makes it a CF-based approach, and I would
> like us to not get caught up in this being the focal point of a
> recommendation
> system.
>
>
>> d) what do we call the system (recommendation, collaborative filtering,
>> search or whatever)
>>
>
> I think we're all cool with calling it a recommendation if it's 1) not
> specifically
> user-driven (your point b. above), yet results of some type are delivered.
>
>  -jake
>

Re: What is content based recommendation, to you

Posted by Jake Mannix <ja...@gmail.com>.

On Wed, Jan 27, 2010 at 6:18 PM, Ted Dunning <te...@gmail.com> wrote:
>
>
> a) whether items that are retrieved/recommended are opaque or have
> attributes (is the process content-based?)
>

Well this is the part we all are agreeing on - we're all talking about the
"new" (to our framework) technique of including content attributes.  Good here.

> b) whether the basis for retrieving/recommending items is an explicit query
> (of whatever form) or is an implicit query formed by the user's historical
> actions (is this search or recommendation?)
>

I'm *not* suggesting we consider explicitly chosen attributes that the user
enters on their own.  That is completely a search, not a recommendation,
and isn't what the recommendation part of Mahout is about.  I'm talking
about queries generated in some way from the content of the thing which
wants to have recommendations given to it (the webpage which wants ads,
the job posting which wants applicants, the user who has a profile who
wants XYZ, etc...).  Definitely implicit, in terms of what users *do*, but
possibly fairly explicit in terms of what they *are*.

> c) whether the retrieval/recommendation of items uses the behavior of all
> users to sharpen the results (is this a social algorithm or not?)
>

Right, this is the part which makes it a CF-based approach, and I would
like us to not get caught up in this being the focal point of a
recommendation system.

> d) what do we call the system (recommendation, collaborative filtering,
> search or whatever)
>

I think we're all cool with calling it a recommendation if it's 1) not
specifically user-driven (your point b. above), yet results of some type
are delivered.

  -jake

Re: What is content based recommendation, to you

Posted by Ted Dunning <te...@gmail.com>.

I think that the problem with this conversation and its not quite direct
matching is that we have several nearly independent characteristics.  As I
see it, these include:

a) whether items that are retrieved/recommended are opaque or have
attributes (is the process content-based?)

b) whether the basis for retrieving/recommending items is an explicit query
(of whatever form) or is an implicit query formed by the user's historical
actions (is this search or recommendation?)

c) whether the retrieval/recommendation of items uses the behavior of all
users to sharpen the results (is this a social algorithm or not?)

d) what do we call the system (recommendation, collaborative filtering,
search or whatever)

These qualities are relatively independent and factoring them seems useful
to me.  Whether the user input is words typed, videos clicked or ratings
made seems much less important to me.

On Wed, Jan 27, 2010 at 3:21 PM, Jake Mannix <ja...@gmail.com> wrote:

> I guess another way that I think of it is that CF is actually a very
> special case of recommendation: you have generic items and users, and
> without knowing anything about the content of the items (or users), you
> can use ratings to predict unknown preferences.
>

-- 
Ted Dunning, CTO
DeepDyve

Re: What is content based recommendation, to you

Posted by Jake Mannix <ja...@gmail.com>.

I guess another way that I think of it is that CF is actually a very special
case of recommendation: you have generic items and users, and without knowing
anything about the content of the items (or users), you can use ratings to
predict unknown preferences.

The general case is that you have users, items, and you DO know something
about the attributes of the items and users.  Then you could try to do
"untrained recommendation", i.e. search, but it is better to use explicit
ratings to do feature selection and feature weighting.

I'm not sure how that fits best with Taste, but that's the hierarchy of
recommenders I see...

  -jake

On Jan 27, 2010 2:51 PM, "Jake Mannix" <ja...@gmail.com> wrote:

On Wed, Jan 27, 2010 at 2:27 PM, Sean Owen <sr...@gmail.com> wrote: > > On
Wed, Jan 27, 2010 at 2:1...
But how is "presence of term X in both item1 and user1" as a boolean
preference value any different than "user1 has a preference for
attribute(X)"?  Similarly, tf-idf weightings provide a floating point
"rating" for correlations between different item types.

The reason why I think this kind of recommender is not so strange is
that you can group together attributes into fields / column-families,
and while presence/absence (or tf-idf, or whatever) can act as
raw ratings, you can then add in arbitrary model weights *between*
fields which are *learned* by feedback (use logistic regression,
for example) from the user-item ratings table.  Does that make sense?

  > > So I suppose I am resisting implementing this as a recommender system
> since it's well in ha...
It *exists* as a search setup, but at least in e.g. Lucene, it's not designed
to do this, really, and there are lots of hacks you have to do (the
normalization is wrong, the dot product isn't really cosine, you have to work
to make it into tanimoto/etc).  And search setups aren't really designed to
do batch recommendations of this kind either.  Trust me, you can do this with
search, and sometimes it's a good idea, but it's kind of a kludge, and it's
not at all straightforward (but the goal is a totally valid one!).

  > > > > >  * on webpage (type W), you have certain set of features, and
users come to > > that > ...
But what you're suggesting here is one particular choice of solution - it's
presupposing that that one is the best.  Why not say: similarity(W,A) =
alpha_0 * (W_title * A_title) + alpha_1 * (W_header * A_title) + alpha_2 *
(W_subHeader * A_body) + alpha_3 * (W_tags * A_landingURL) + ...
and then train your alpha_i to optimize clickthrough?

  > > Well there's no reason that a recommender framework shouldn't support
> search-like approach...

Why should we limit ourselves to just a CF framework?  Why not a
recommendation framework which can easily do both?

  -jake

Re: What is content based recommendation, to you

Posted by Jake Mannix <ja...@gmail.com>.

On Wed, Jan 27, 2010 at 2:27 PM, Sean Owen <sr...@gmail.com> wrote:

> On Wed, Jan 27, 2010 at 2:15 AM, Jake Mannix <ja...@gmail.com> wrote:
> > There is no need (although there may be much *utility*) in ever thinking
> > about interactions between items (item-item similarity) or users.
> > Content-based recommendations can act purely as a generalized search
> > engine, where the trick is just coming up with the search terms / query
> > features to use for each user.
>
> Yes, you're right, if I understand your meaning correctly.
>
> I think that content-based recommendation of this form is not really a
> conventional recommender system. It smells much more like a search
> problem. I like attributes X Y and Z, so recommend me items with
> attributes X Y and Z: call 'attributes' as 'search terms' and 'items'
> as 'search results' and yup, it's search. No real ratings here.
>

But how is "presence of term X in both item1 and user1" as a boolean
preference value any different than "user1 has a preference for
attribute(X)"?  Similarly, tf-idf weightings provide a floating point
"rating" for correlations between different item types.

The reason why I think this kind of recommender is not so strange is
that you can group together attributes into fields / column-families,
and while presence/absence (or tf-idf, or whatever) can act as
raw ratings, you can then add in arbitrary model weights *between*
fields which are *learned* by feedback (use logistic regression,
for example) from the user-item ratings table.  Does that make sense?

> So I suppose I am resisting implementing this as a recommender system
> since it's well in hand from search frameworks, but I'm not sure how
> valid that is.
>

It *exists* as a search setup, but at least in e.g. Lucene, it's not designed
to do this, really, and there are lots of hacks you have to do (the
normalization is wrong, the dot product isn't really cosine, you have to work
to make it into tanimoto/etc).  And search setups aren't really designed to
do batch recommendations of this kind either.  Trust me, you can do this with
search, and sometimes it's a good idea, but it's kind of a kludge, and it's
not at all straightforward (but the goal is a totally valid one!).
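
To make the point about the dot product concrete, here is a minimal sketch of the three quantities for sparse term-weight vectors. It is plain Python, not Lucene code; the function names and the toy `page` and `ad` vectors are assumptions made up for illustration.

```python
import math

def dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def cosine(a, b):
    """Dot product normalized by both vector lengths."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def tanimoto(a, b):
    """Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    return ab / denom if denom else 0.0

# Hypothetical term weights for one webpage and one ad.
page = {"hiking": 2.0, "boots": 1.0}
ad = {"hiking": 1.0, "tent": 1.0}

print(dot(page, ad), cosine(page, ad), tanimoto(page, ad))
```

The raw dot product grows with vector length, which is why it cannot stand in for cosine or Tanimoto without the extra normalization.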

>
> >  * on webpage (type W), you have certain set of features, and users come
> > to that webpage, sometimes with no prior history, so if you want to
> > recommend (serve) ads (type A) to the user, recommending based purely on
> > some kind of content-based correlation between items of type W and A can
> > work.
>
> Alrighty so users are webpages (W) and items are ads (A) and you're
> recommending ads to webpages. And you intend to use the text of W and
> A to recommend? Yup, that's valid, but smells like search, and
> something a search framework would do well on. I would say: figure out
> which Ws 'prefer' which As based on clicks, and maybe base ad
> recommendations on textual similarity between As. That's a(n
> item-based) recommender.
>

But what you're suggesting here is one particular choice of solution - it's
presupposing that that one is the best.  Why not say: similarity(W,A) =
alpha_0 * (W_title * A_title) + alpha_1 * (W_header * A_title) + alpha_2 *
(W_subHeader * A_body) + alpha_3 * (W_tags * A_landingURL) + ...
and then train your alpha_i to optimize clickthrough?
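
One way the alpha_i training could look, as a rough sketch only: treat each field-pair match score (W_title * A_title, W_tags * A_landingURL, and so on) as one input feature and fit the weights by logistic regression on click / no-click labels. The gradient-descent trainer, the learning rate, and the six toy examples are all assumptions invented for illustration, not data or code from this thread.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_alphas(examples, n_features, lr=0.5, epochs=500):
    """Fit logistic-regression weights by batch gradient descent.

    examples: list of (features, clicked), where features[i] is the match
    score of the i-th field pair and clicked is 0 or 1.
    """
    alphas = [0.0] * n_features
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad = [0.0] * n_features
        grad_bias = 0.0
        for x, y in examples:
            p = sigmoid(bias + sum(a * xi for a, xi in zip(alphas, x)))
            err = p - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad_bias += err
        alphas = [a - lr * g / n for a, g in zip(alphas, grad)]
        bias -= lr * grad_bias / n
    return alphas, bias

def similarity(features, alphas):
    """similarity(W, A) = sum_i alpha_i * (field-pair match score i)."""
    return sum(a * f for a, f in zip(alphas, features))

# Made-up training data: [title*title score, tags*landingURL score], clicked?
examples = [
    ([0.9, 0.1], 1), ([0.8, 0.7], 1), ([0.7, 0.3], 1),
    ([0.1, 0.9], 0), ([0.2, 0.2], 0), ([0.1, 0.6], 0),
]
alphas, bias = train_alphas(examples, n_features=2)
print(alphas)
```

In this toy data clicks follow the title match, so the learned weight for the first field pair ends up larger than the second; with real click logs the weights would reflect whichever field pairs actually predict clickthrough.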

> Well there's no reason that a recommender framework shouldn't support
> search-like approaches. I have convinced myself that what I have on my
> hands is really a collaborative filtering framework. I think it's
> somewhere on the roadmap, therefore, to expand into these other
> techniques.
>

Why should we limit ourselves to just a CF framework?  Why not a
recommendation framework which can easily do both?

  -jake

Re: What is content based recommendation, to you

Posted by Sean Owen <sr...@gmail.com>.

On Wed, Jan 27, 2010 at 2:15 AM, Jake Mannix <ja...@gmail.com> wrote:
> There is no need (although there may be much *utility*) in ever thinking
> about interactions between items (item-item similarity) or users.
> Content-based recommendations can act purely as a generalized search
> engine, where the trick is just coming up with the search terms / query
> features to use for each user.

Yes, you're right, if I understand your meaning correctly.

I think that content-based recommendation of this form is not really a
conventional recommender system. It smells much more like a search
problem. I like attributes X Y and Z, so recommend me items with
attributes X Y and Z: call 'attributes' as 'search terms' and 'items'
as 'search results' and yup, it's search. No real ratings here.

So I suppose I am resisting implementing this as a recommender system
since it's well in hand from search frameworks, but I'm not sure how
valid that is.

>  * on webpage (type W), you have certain set of features, and users come
> to that webpage, sometimes with no prior history, so if you want to
> recommend (serve) ads (type A) to the user, recommending based purely on
> some kind of content-based correlation between items of type W and A can
> work.

Alrighty so users are webpages (W) and items are ads (A) and you're
recommending ads to webpages. And you intend to use the text of W and
A to recommend? Yup, that's valid, but smells like search, and
something a search framework would do well on. I would say: figure out
which Ws 'prefer' which As based on clicks, and maybe base ad
recommendations on textual similarity between As. That's a(n
item-based) recommender.
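
A minimal sketch of that item-based idea, under stated assumptions: webpages play the role of users, ads are items, click counts are preferences, and ad-to-ad similarity is Jaccard overlap of ad terms. The scoring rule (similarity-weighted sum) and all names and data are hypothetical; this is not the Taste API.

```python
def jaccard(a, b):
    """Set overlap between two attribute sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(prefs, item_attrs, top_n=3):
    """Score unseen items by similarity to the items already preferred.

    prefs: {item: preference} for one "user" (here, one webpage).
    item_attrs: {item: set of attributes} for every item (here, ad terms).
    """
    scores = {}
    for candidate, attrs in item_attrs.items():
        if candidate in prefs:
            continue  # do not recommend what is already known
        score = sum(jaccard(attrs, item_attrs[liked]) * pref
                    for liked, pref in prefs.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical ads and one page's click history.
ad_terms = {
    "ad1": {"hiking", "boots"},
    "ad2": {"hiking", "tent"},
    "ad3": {"car", "loan"},
}
page_clicks = {"ad1": 1.0}

result = recommend(page_clicks, ad_terms)
print(result)
```

Here only `ad2` shares a term with the clicked ad, so it is the only recommendation; `ad3` has zero similarity and is dropped.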

> In both of these cases, you can do a full-fledged recommendation engine
> with no users whatsoever, with content and item information across
> multiple domains.

I think you hit the key advantage of not relying on preferences here.
I guess I'm contending that without preferences, it's properly in the
domain of search instead of recommendation (where I typically mean
'collaborative filtering' by this term)

> The other advantage of thinking of content-based recommender systems this
> way is that now you have an entirely new axis to think about: CF goes one
> way, and content-based "searching" goes another, and there is an entire
> spectrum of "fusion" models which mix the two.

Well there's no reason that a recommender framework shouldn't support
search-like approaches. I have convinced myself that what I have on my
hands is really a collaborative filtering framework. I think it's
somewhere on the roadmap, therefore, to expand into these other
techniques.

Re: What is content based recommendation, to you

Posted by Ted Dunning <te...@gmail.com>.

This is a fine way to go (and I have often (mis)used search engines as
recommendation engines).

Another angle is to consider the item level recommendations for a single
item to simply be additional attributes.  You can also look at user level
cooccurrence analysis of attributes (including SVD) as simply a way to
smooth out the attributes a bit so that sparsity doesn't take such a big
bite out of serendipity.

This makes cooccurrence analysis look a whale of a lot like anchor text
propagation which speaks to your final point.
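
As a rough illustration of cooccurrence as smoothing (using raw counts rather than SVD or any significance test, which is a simplifying assumption), attributes that often appear together in user histories can be added to a sparse attribute set. The names, the `min_count` threshold, and the toy histories are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence(user_attrs):
    """Count how often two attributes appear in the same user's history."""
    counts = defaultdict(Counter)
    for attrs in user_attrs.values():
        for a, b in combinations(sorted(attrs), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def smooth(attrs, counts, min_count=2):
    """Expand attrs with attributes that co-occur at least min_count times."""
    extra = {b
             for a in attrs
             for b, c in counts.get(a, {}).items()
             if c >= min_count}
    return set(attrs) | extra

# Hypothetical per-user attribute histories.
user_attrs = {
    "u1": {"jazz", "blues"},
    "u2": {"jazz", "blues", "rock"},
    "u3": {"rock", "metal"},
}
counts = cooccurrence(user_attrs)
print(smooth({"jazz"}, counts))
```

An item tagged only "jazz" picks up "blues" as an additional attribute, so it can now match users whose history mentions blues but not jazz.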

On Tue, Jan 26, 2010 at 6:15 PM, Jake Mannix <ja...@gmail.com> wrote:

> On Tue, Jan 26, 2010 at 3:36 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > I define it a bit differently by redefining recommendations as machine
> > learning.
> >
> > On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen <sr...@gmail.com> wrote:
> >
> > > I would narrow and specify this, in the context of Mahout, to have a
> > > collaborative filtering angle:
> >
>
> Since Ted (Mr. Machine Learning) wants to describe content-based
> recommendations as machine learning, and Sean (Mr. Taste/CF) goes and
> describes it in terms of collaborative filtering, I suppose I'll put on my
> "search guy" hat, and describe it the way I see it:
>

-- 
Ted Dunning, CTO
DeepDyve

Re: What is content based recommendation, to you

Posted by Jake Mannix <ja...@gmail.com>.

On Tue, Jan 26, 2010 at 3:36 PM, Ted Dunning <te...@gmail.com> wrote:

> I define it a bit differently by redefining recommendations as machine
> learning.
>
> On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen <sr...@gmail.com> wrote:
>
> > I would narrow and specify this, in the context of Mahout, to have a
> > collaborative filtering angle:
>

Since Ted (Mr. Machine Learning) wants to describe content-based
recommendations as machine learning, and Sean (Mr. Taste/CF) goes and
describes it in terms of collaborative filtering, I suppose I'll put on my
"search guy" hat, and describe it the way I see it:

Items have attributes (e.g. text features), and users express preferences
for some attributes (e.g. by explicitly entering text keywords), and the
recommender (a.k.a. search engine) takes those preferences and returns a
ranked list of the best items which have some of those attributes.

Generalizing a bit beyond that example, users may not make explicit mention
of certain attributes, but we may infer them from some other source (a user
on a social network may have a profile, a member of a dating website may
have answered a questionnaire expressing some preferences, etc.) and use
these to generate a "query" against the recommender.

There is no need (although there may be much *utility*) in ever thinking
about interactions between items (item-item similarity) or users.
Content-based recommendations can act purely as a generalized search engine,
where the trick is just coming up with the search terms / query features to
use for each user.

An advantage of thinking of it this way is that you don't need to think
about "users" at all: you can have recommendations of items of type A
against items of type B:

  * on a webpage (type W), you have a certain set of features, and users
    come to that webpage, sometimes with no prior history, so if you want to
    recommend (serve) ads (type A) to the user, recommending based purely on
    some kind of content-based correlation between items of type W and A can
    work.

  * on a job board, recruiters can post job listings (type J), and you want
    to recommend possible resumes (type R) to the job (*not* to the
    recruiter, because the recruiter has distinctly different "preferences"
    for each job - the *job* is the thing which wants recommendations).

In both of these cases, you can do a full-fledged recommendation engine with
no users whatsoever, with content and item information across multiple
domains.
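
The job-board case can be sketched as a tiny search-style recommender, assuming the job's attributes are turned into a weighted query and resumes are scored by idf-weighted attribute overlap. The idf formula, the function names, and the sample resumes are all illustrative assumptions, not how any particular search engine scores.

```python
import math

def idf(item_attrs):
    """Inverse document frequency per attribute: rare attributes weigh more."""
    n = len(item_attrs)
    df = {}
    for attrs in item_attrs.values():
        for a in attrs:
            df[a] = df.get(a, 0) + 1
    return {a: math.log(n / d) + 1.0 for a, d in df.items()}

def search(query, item_attrs, top_n=5):
    """Rank items by the summed (query weight * idf) of matching attributes.

    query: {attribute: weight}, e.g. built from a job listing.
    item_attrs: {item: set of attributes}, e.g. resumes.
    """
    weights = idf(item_attrs)
    scored = []
    for item, attrs in item_attrs.items():
        score = sum(w * weights[a] for a, w in query.items() if a in attrs)
        if score > 0:
            scored.append((item, score))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical resumes (type R) and one job listing (type J) as the query.
resumes = {
    "r1": {"java", "hadoop", "search"},
    "r2": {"java", "sales"},
    "r3": {"design"},
}
job = {"hadoop": 1.0, "java": 1.0}

ranked = search(job, resumes)
print(ranked)
```

No user appears anywhere: the job is the "query" and resumes are the "results", which is the sense in which this is recommendation between two item types.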

The other advantage of thinking of content-based recommender systems this
way is that now you have an entirely new axis to think about: CF goes one
way, and content-based "searching" goes another, and there is an entire
spectrum of "fusion" models which mix the two.

(Of course, this leaves out one further piece of information which is
similar to CF, but deserves its own treatment: explicit link information,
available in the form of web-graph links or social network links.
Recommenders based on this information can look a lot like CF, but they use
*explicit* user-user or item-item correlations instead of correlations
inferred implicitly from co-occurrence / usage.)

  -jake

Re: What is content based recommendation, to you

Posted by Sean Owen <sr...@gmail.com>.

Nice, good wisdom here.


I agree about the appeal and problems of thinking of item-attribute
pairs as your items.

You're saying content-based recommendation, in practice, is often a
matter of substituting one dominant item attribute in place of items
-- recommending on artist, rather than artist track. OK, check, one
can do that in the current framework by using artists as items. So I
think that's supported for free.

And maybe my other notion of a way to bring content-based
recommendation into the framework -- some organized framework for
constructing and tuning a notion of item similarity based on
attributes -- also has merit and belongs in the category of
"content-based" techniques.


I ask because there's been some request to talk more about
content-based recommendation and so I want to build this out more.


On Tue, Jan 26, 2010 at 11:36 PM, Ted Dunning <te...@gmail.com> wrote:
> I define it a bit differently by redefining recommendations as machine
> learning.
>
> Users have preferences for objects with attributes.
>
> We would like to learn from all user/object/attribute preference data to
> predict so-far unobserved preferences of a user for other objects.
>
> Normal recommendations is a subset of this where there is exactly one id
> attribute for every object.
>
> We can extend most recommendation algorithms to this new paradigm relatively
> transparently by considering each expressed item preference to be a bundle
> of attribute preferences.  Our recommendation algorithm needs to produce a
> list of recommended attributes which we integrate into a list of recommended
> items.  The list of recommended attributes might be segregated into a list
> of values for each kind of attribute or it might be in a single list.  The
> segregated approach could just replicate a recommendation engine per
> attribute type.  The combined approach might just label all attributes and
> throw them into a soup of preference data.
>
> The additional code needed consists mostly of writing the code that
> integrates the attribute recommendations into a list of item
> recommendations.  This can be as simple as weighting the recommended
> attributes by rank and doing rankScore * idf retrieval to find the items.
> Some algorithms like LDA have the ability to explicitly integrate the
> different kinds of attributes.  Others really don't.
>
> One problem with this is that you are exploding the number of preferences
> which can present scaling and noise problems.  You also inherently
> intermingle attributes with very different distributional characteristics
> together.  For instance, there might only be a dozen or so colors of shoes
> and thus the number of people who have expressed a preference for some kind
> of red shoe is going to be vastly larger than the number of people who have
> expressed a preference for a specific color of a specific size of a specific
> model of a shoe.  It is common for recommendation systems to fail for very
> common things or for very rare things and integrating both pathological
> situations in a single recommendation framework may be a problem.
>
> My own experience with this is that it is common for one kind of attribute
> to dominate the recommendation process in the sense of providing the most
> oomph and accuracy.  This can be because the data is sparse and some
> attributes provide useful smoothing or it can be that some attributes are too
> general and other attributes provide more precision.  At Musicmatch, for
> instance, the artist attribute provided a disproportionate share of music
> recommendation value above track or album or even song (track != song
> because it is common for the same song to be on many albums giving many
> tracks).  I think that this must only be true to first order and that if you
> dig in, you would find minority classes where different attributes provide
> different amounts of data, but it is rare in startups to get past the first
> order solution.

Re: What is content based recommendation, to you

Posted by Jake Mannix <ja...@gmail.com>.

On Tue, Jan 26, 2010 at 3:36 PM, Ted Dunning <te...@gmail.com> wrote:

> I define it a bit differently by redefining recommendations as machine
> learning.
>
> On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen <sr...@gmail.com> wrote:
>
> > I would narrow and specify this, in the context of Mahout, to have a
> > collaborative filtering angle:
>

Since Ted (Mr. Machine Learning) wants to describe content-based
recommendations
as machine learning, and Sean (Mr. Taste/CF) goes and describes it it terms
of
collaborative filtering, I suppose I'll put on my "search guy" hat, and
describe it the
way I see it:

Items have attributes (e.g. text features), and users express preference for
some
attributes (e.g.  explicit entering of text keywords), and the recommender
(a.k.a.
search engine) returns a ranked list of items which take those preferences
and
find the best items which have some of those preferences.

Generalizing a bit beyond that example, users may not make explicit mention
of
certain attributes, but we may infer them from some other source (a user on
a
social network may have a profile, a member of a dating website may have
answered a questionnaire expressing some preferences, etc.) and use these
to generate a "query" against the recommender.

There is no need (although there may be much *utility*) in ever thinking
about
interactions between items (item-item similarity) or users.  Content-based
recommendations can act purely as a generalized search engine, where the
trick is just coming up with the search terms / query features to use for
each user.

An advantage of thinking of it this way means that you don't need to think
about "users" at all: you can have recommendations of items of type A
against items of type B:

  * on webpage (type W), you have certain set of features, and users come to
that
webpage, sometimes with no prior history, so if you want to recommend
(serve)
ads (type A) to the user, recommending based purely on some kind of
content-based
correlation between items of type W and A can work.

  * on a job board, recruiters can post job listings (type J), and you want
to recommend
possible resumes (type R) to the job (*not* to the recruiter, because the
recruiter has
distinctly different "preferences" for each job - the *job* is the thing
which wants
recommendations).

In both of these cases, you can do a full-fledged recommendation engine with
no
users whatsoever, with content and item information across multiple domains.

The other advantage of thinking of content-based recommender systems this
way is that now you have an entirely new axis to think about: CF goes one
way, and content-based "searching" goes another, and there is an entire
spectrum of "fusion" models which mix the two.
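
One very simple point on that spectrum is a linear blend of the two scores.
The sketch below is only an illustration: the blend function, the alpha
parameter, and the assumption that both score sets are already on a
comparable scale are mine, and real fusion models can be far more involved.

```python
def blend(cf_scores, content_scores, alpha=0.5):
    """Linear blend: alpha=1 is pure CF, alpha=0 is pure content-based."""
    items = set(cf_scores) | set(content_scores)
    return {
        item: alpha * cf_scores.get(item, 0.0)
        + (1 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical scores, assumed to be normalized to the same range.
cf = {"a": 1.0, "b": 0.5}
content = {"b": 1.0, "c": 0.5}
mixed = blend(cf, content, alpha=0.5)
```

Sliding alpha between 0 and 1 moves along the axis from pure content-based
"searching" to pure CF.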

(Of course, this leaves out one further piece of information which is
similar to CF, but deserves its own treatment: explicit link information,
available in the form of web-graph links or social network links.
Recommenders based on this information can look a lot like CF, but they use
*explicit* user-user or item-item correlations instead of correlations
inferred implicitly from co-occurrence / usage.)

  -jake

Re: What is content based recommendation, to you

Posted by Ted Dunning <te...@gmail.com>.

I define it a bit differently by redefining recommendations as machine
learning.

Users have preferences for objects with attributes.

We would like to learn from all user/object/attribute preference data to
predict so-far unobserved preferences of a user for other objects.

Normal recommendation is a special case of this where there is exactly one
id attribute for every object.

We can extend most recommendation algorithms to this new paradigm relatively
transparently by considering each expressed item preference to be a bundle
of attribute preferences.  Our recommendation algorithm needs to produce a
list of recommended attributes which we integrate into a list of recommended
items.  The list of recommended attributes might be segregated into a list
of values for each kind of attribute or it might be in a single list.  The
segregated approach could just replicate a recommendation engine per
attribute type.  The combined approach might just label all attributes and
throw them into a soup of preference data.

The additional code needed consists mostly of writing the code that
integrates the attribute recommendations into a list of item
recommendations.  This can be as simple as weighting the recommended
attributes by rank and doing rankScore * idf retrieval to find the items.
Some algorithms like LDA have the ability to explicitly integrate the
different kinds of attributes.  Others really don't.
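
A minimal sketch of the combined ("soup") approach, in Python. Several
things here are assumptions for illustration only: the catalog, the
"kind:value" labeling scheme, the use of 1/rank as the rank score, and the
stand-in attribute recommender, which just counts attribute preferences
where a real system would run an actual recommendation algorithm over the
attribute data.

```python
import math
from collections import Counter

# Hypothetical catalog: item id -> labeled attributes ("kind:value").
ITEMS = {
    "t1": {"artist:miles", "album:kind_of_blue"},
    "t2": {"artist:miles", "album:bitches_brew"},
    "t3": {"artist:coltrane", "album:giant_steps"},
    "t4": {"artist:coltrane", "album:kind_of_blue"},
}

def to_attribute_prefs(liked_items, items):
    """Expand each item preference into a bundle of attribute preferences."""
    return Counter(attr for i in liked_items for attr in items[i])

def recommend_attributes(attr_prefs):
    """Stand-in attribute recommender: rank attributes by preference count."""
    ordered = sorted(attr_prefs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [attr for attr, _ in ordered]

def integrate(ranked_attrs, items, exclude=()):
    """Turn ranked attributes into ranked items via rankScore * idf."""
    n = len(items)
    df = Counter(a for attrs in items.values() for a in attrs)
    scores = Counter()
    for rank, attr in enumerate(ranked_attrs, start=1):
        rank_score = 1.0 / rank           # assumed rank weighting
        idf = math.log(n / df[attr])      # rarer attributes count for more
        for item_id, attrs in items.items():
            if attr in attrs and item_id not in exclude:
                scores[item_id] += rank_score * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

liked = ["t1", "t2"]
ranked_attrs = recommend_attributes(to_attribute_prefs(liked, ITEMS))
recs = integrate(ranked_attrs, ITEMS, exclude=liked)
```

The segregated approach would run recommend_attributes once per attribute
kind (artist, album, ...) and merge the per-kind lists inside integrate.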

One problem with this is that you are exploding the number of preferences,
which can present scaling and noise problems.  You also inherently
intermingle attributes with very different distributional characteristics.
For instance, there might only be a dozen or so colors of shoes
and thus the number of people who have expressed a preference for some kind
of red shoe is going to be vastly larger than the number of people who have
expressed a preference for a specific color of a specific size of a specific
model of a shoe.  It is common for recommendation systems to fail for very
common things or for very rare things and integrating both pathological
situations in a single recommendation framework may be a problem.

My own experience with this is that it is common for one kind of attribute
to dominate the recommendation process in the sense of providing the most
oomph and accuracy.  This can be because the data is sparse and some
attributes provide useful smoothing or it can be that some attributes are too
general and other attributes provide more precision.  At Musicmatch, for
instance, the artist attribute provided a disproportionate share of music
recommendation value above track or album or even song (track != song
because it is common for the same song to be on many albums giving many
tracks).  I think that this must only be true to first order and that if you
dig in, you would find minority classes where different attributes provide
different amounts of data, but it is rare in startups to get past the first
order solution.

On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen <sr...@gmail.com> wrote:

> I want to knock down some support for content based recommendation.
> And I want to solicit ideas about what this even means to its intended
> audience -- users.
>
> I define it broadly as a recommender in which:
> - items have attributes (e.g. books have genres, titles, authors)
> rather than being completely opaque entities
> - users have affinities for attributes
> - users are recommended items with attributes they like
>
> I would narrow and specify this, in the context of Mahout, to have a
> collaborative filtering angle:
> - items have attributes, still
> - users have preferences for items (classic CF)
> - (therefore, users implicitly have affinities for attributes)
> - item similarity can be defined in terms of item attributes, in some way
> - users are recommended items that are similar to other items they
> like (item-based recommendation)
> - (therefore, users are recommended items with attributes they like)
>
> This is my spin on content based recommendation in Mahout. I define it
> as a special case of item-based recommendation. Thoughts?
>
> So, the idea is to provide some non-trivial framework for supporting
> item attributes, and defining similarity in terms of attributes.
> Thoughts on what that should look like?
>
> Sean
>

-- 
Ted Dunning, CTO
DeepDyve
