Posted to user@mahout.apache.org by "Alexandre Rodrigues (FEUP)" <al...@fe.up.pt> on 2011/02/07 16:56:15 UTC

Hybrid RecSys — ways to do it

Hello Mahouters out there!

I'm diving into the amazing world of Mahout and Hadoop and I have some
questions about it. My project consists of developing a recommender system
for TV shows, and my objective is to study how I can ensemble/mix some
approaches, like content-based and collaborative filtering (with weights, for
example). Is there _the way_ to do it using Mahout, or is it an unexplored
subject at the moment?

Thanks in advance!
--
Alexandre Rodrigues

Re: Hybrid RecSys — ways to do it

Posted by Steven Bourke <sb...@gmail.com>.

You will need to write your own implementation, but it's not a lot of work.

On Mon, Feb 7, 2011 at 3:56 PM, Alexandre Rodrigues (FEUP) <alexandre.rodrigues@fe.up.pt> wrote:

> Hello Mahouters out there!
>
> I'm diving into the amazing world of Mahout and Hadoop and I have some
> questions about it. My project consists of developing a recommender system
> for TV shows, and my objective is to study how I can ensemble/mix some
> approaches, like content-based and collaborative filtering (with weights,
> for example). Is there _the way_ to do it using Mahout, or is it an
> unexplored subject at the moment?
>
> Thanks in advance!
> --
> Alexandre Rodrigues

Re: Hybrid RecSys — ways to do it

Posted by Ted Dunning <te...@gmail.com>.

Since you are predicting the probability of a binary event, I would suggest
a logistic regression or close equivalent.

But almost anything will work well for this because of the relatively low
dimension.  If you do choose a linear regression, clip the output to the
[0,1] range.
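
A minimal sketch of the logistic-regression idea above, written in plain Python rather than with Mahout's own SGD classes. The training rows, the two score columns (a CF score and a content score) and every function name are made up for illustration; clip01 only shows the clipping that a linear-regression output would need.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_sgd(rows, labels, epochs=200, rate=0.5, seed=0):
    """Fit weights (bias first) by plain SGD on the log loss."""
    rng = random.Random(seed)
    weights = [0.0] * (len(rows[0]) + 1)
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = [1.0] + list(rows[i])
            error = labels[i] - sigmoid(sum(w * v for w, v in zip(weights, x)))
            weights = [w + rate * error * v for w, v in zip(weights, x)]
    return weights


def predict(weights, row):
    """Probability that the user engages with the item."""
    x = [1.0] + list(row)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))


def clip01(value):
    # Only needed when a linear regression is used instead of a logistic one.
    return min(1.0, max(0.0, value))


# Hypothetical data: (CF score, content score) -> did the user engage?
rows = [(0.9, 0.8), (0.8, 0.3), (0.7, 0.9), (0.2, 0.1), (0.1, 0.4), (0.3, 0.2)]
labels = [1, 1, 1, 0, 0, 0]
weights = train_logistic_sgd(rows, labels)
```

The logistic output already stays within the [0,1] range, so the clipping step matters only for the linear-regression variant.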

On Mon, Jun 27, 2011 at 12:57 AM, Marko Ciric <ci...@gmail.com> wrote:

> Thanks Ted. Do you think weights (that depend on mentioned features) can be
> learned with simple linear regression once the outputs of Mahout
> recommenders are known?

Re: Hybrid RecSys — ways to do it

Posted by Marko Ciric <ci...@gmail.com>.

Thanks Ted. Do you think weights (that depend on mentioned features) can be
learned with simple linear regression once the outputs of Mahout
recommenders are known?

On 10 June 2011 08:02, Ted Dunning <te...@gmail.com> wrote:

> When stacking recommenders, you need to have features that represent:
>
> a) the output of the recommenders in question
>
> b) features that you think will help.  Number of data points is a
> classic.  Log transformations are often a good idea with count
> features.
>
> c) interactions of (a) and (b) are generally critical
>
> If you have a per-user quality or satisfaction indicator and a
> per-user current model indicator then you might be able to use these
> as a feature for an interesting "if it ain't broke, don't fix it"
> stacking model.



-- 
--
Marko Ćirić
ciric.marko@gmail.com

Re: Hybrid RecSys — ways to do it

Posted by Ted Dunning <te...@gmail.com>.

When stacking recommenders, you need to have features that represent:

a) the output of the recommenders in question

b) features that you think will help.  Number of data points is a
classic.  Log transformations are often a good idea with count
features.

c) interactions of (a) and (b) are generally critical

If you have a per-user quality or satisfaction indicator and a
per-user current model indicator then you might be able to use these
as a feature for an interesting "if it ain't broke, don't fix it"
stacking model.
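
The three feature groups above could be assembled roughly as follows. This is only a sketch: the recommender names ("cf", "brand", "top40"), the item_count argument and the function name are assumptions made for the example, not anything Mahout provides.

```python
import math


def stacking_features(scores, item_count):
    """Build one feature row for a (user, item) candidate.

    scores     -- dict of recommender name -> score, the (a) features
    item_count -- number of data points seen for the item, a (b) feature
    """
    log_count = math.log1p(item_count)  # log transform of a count feature
    features = {"bias": 1.0, "log_count": log_count}
    for name, score in sorted(scores.items()):
        features[name] = score                              # (a) recommender output
        features[name + "*log_count"] = score * log_count   # (c) interaction
    return features


row = stacking_features({"cf": 0.8, "brand": 0.4, "top40": 0.1}, item_count=99)
```

The interaction terms are what let the combining model trust a recommender's score more when that recommender has seen more data for the item.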

On Thu, Jun 9, 2011 at 3:51 PM, Marko Ciric <ci...@gmail.com> wrote:
> I'm sorry, I am currently working on recommender stacking and I have a
> couple of questions here:
> 1. How a number of occurrences of the recommended item is an indicator of
> how much information the recommender has to work with?
> 2. To classify items with SGD, an extraction of item's features is required
> first if I'm correct. What features to use when the recommended items (that
> need to be classified) are a result of different recommenders that use
> different similarity calculation (only a "brand" recommender is using an
> item feature here and CF and top-40 recommenders are not)?
>
> Thanks,
> Marko

Re: Hybrid RecSys — ways to do it

Posted by Marko Ciric <ci...@gmail.com>.

I'm sorry, I am currently working on recommender stacking and I have a
couple of questions here:
1. How is the number of occurrences of the recommended item an indicator of
how much information the recommender has to work with?
2. To classify items with SGD, the items' features have to be extracted
first, if I'm correct. Which features should be used when the recommended
items (that need to be classified) are the result of different recommenders
that use different similarity calculations (only the "brand" recommender
uses an item feature here; the CF and top-40 recommenders do not)?

Thanks,
Marko


On 8 February 2011 03:27, Ted Dunning <te...@gmail.com> wrote:

> See also here http://arxiv.org/abs/1006.2156
>
> Another approach is to build a conventional recommender for items and
> attach an indicator of how much information that recommender has to work
> with (number of occurrences of the recommended item might be good enough).
> Then do the same for some prominent characteristic of the items.  This
> might give you a "brand" recommender for retail products or an "artist"
> recommender for music.  For this more generic recommender, you might be
> able to directly use the counts from the user's history.  Finally, build
> "top-40" models for overall item, brand, artist or what have you
> characteristics.
>
> Now train a simple model to combine these results to find items that the
> user is likely to engage with.  SGD is an easy choice here.  At
> recommendation time, you would run all of the constituent recommenders and
> use the SGD model to rescore the union of their results.
>
> If done well, the brand and top-40 models will give you decent cold start
> behavior while the real collaborative filtering models will give you good
> performance after the cold-start.  The SGD should be able to meld these
> values well if it has a good indicator of how reliable each sub-model is.

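The recommendation-time step described in Ted's quoted message (run every constituent recommender, then rescore the union of their results) might look roughly like this. It is a sketch only: the three toy recommenders, their scores and the fixed linear weights standing in for a trained SGD model are all invented for the example.

```python
def rescore_union(recommenders, user, model, top_n=3):
    """Run every recommender, merge the candidates, rank them with one model."""
    candidates = {}
    for name, recommend in recommenders.items():
        for item, score in recommend(user):
            candidates.setdefault(item, {})[name] = score
    ranked = sorted(candidates, key=lambda item: model(candidates[item]),
                    reverse=True)
    return ranked[:top_n]


# Stand-in for the trained combining model: fixed, made-up weights.
WEIGHTS = {"cf": 0.6, "brand": 0.3, "top40": 0.1}


def linear_model(scores):
    # A recommender that did not return the item contributes a score of 0.
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


# Hypothetical constituent recommenders returning (item, score) pairs.
recommenders = {
    "cf": lambda user: [("a", 0.9), ("b", 0.4)],
    "brand": lambda user: [("b", 0.8), ("c", 0.6)],
    "top40": lambda user: [("c", 0.9), ("d", 0.7)],
}

top = rescore_union(recommenders, "user-1", linear_model)
```

Items that only the brand or top-40 recommender knows about still enter the ranking, which is where the cold-start benefit would come from.
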


-- 
--
Marko Ćirić
ciric.marko@gmail.com

Re: Hybrid RecSys — ways to do it

Posted by Ted Dunning <te...@gmail.com>.

I was under the impression I committed that.

Let me check ...

Yes.  See https://issues.apache.org/jira/browse/MAHOUT-591

On Tue, Feb 8, 2011 at 2:24 PM, Chris Schilling <ch...@cellixis.com> wrote:

> In the mean time, is the fix to the SGD ModelDissector available (either by
> patch or in the trunk)?
>

Re: Hybrid RecSys — ways to do it

Posted by Chris Schilling <ch...@cellixis.com>.

Hey Ted,

Thanks for this reply.  I was thinking about some way of implementing the FWLS (feature weighted linear stacking) algorithm on top of the CF stuff in Mahout.  I would like to try to use your SGD algorithm to learn the weights.  I'll get to that eventually...
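
For reference, the FWLS paper linked in this thread (http://arxiv.org/abs/1006.2156) blends model predictions g_i(x) with weights that are themselves linear in meta-features f_j(x), so the blend is a sum of v_ij * f_j(x) * g_i(x) and stays linear in the parameters v_ij. A small sketch under that reading, with a plain SGD least-squares fit; the toy data, learning rate and function names are assumptions for illustration and not Mahout code.

```python
def fwls_features(meta, preds):
    """Cross every meta-feature f_j(x) with every model prediction g_i(x)."""
    return [f * g for g in preds for f in meta]


def fwls_predict(v, meta, preds):
    return sum(w * x for w, x in zip(v, fwls_features(meta, preds)))


def fit_sgd(examples, targets, epochs=1000, rate=0.1):
    """Least-squares fit of v by SGD; each example is a (meta, preds) pair."""
    v = [0.0] * len(fwls_features(*examples[0]))
    for _ in range(epochs):
        for (meta, preds), y in zip(examples, targets):
            x = fwls_features(meta, preds)
            err = y - sum(w * xi for w, xi in zip(v, x))
            v = [w + rate * err * xi for w, xi in zip(v, x)]
    return v


# Toy setup: meta = (1, m). When m == 0 the first model is right,
# when m == 1 the second model is right.
examples = [
    ((1.0, 0.0), (1.0, 0.0)),
    ((1.0, 0.0), (0.0, 1.0)),
    ((1.0, 1.0), (1.0, 0.0)),
    ((1.0, 1.0), (0.0, 1.0)),
]
targets = [1.0, 0.0, 0.0, 1.0]
v = fit_sgd(examples, targets)
```

Because the model is linear in v, any linear learner (including an SGD-based one) can fit it once the crossed features have been built.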

In the meantime, is the fix to the SGD ModelDissector available (either by patch or in the trunk)?

Thanks again,
Chris

On Feb 7, 2011, at 6:27 PM, Ted Dunning wrote:

> See also here http://arxiv.org/abs/1006.2156
>
> Another approach is to build a conventional recommender for items and attach
> an indicator of how much information that recommender has to work with
> (number of occurrences of the recommended item might be good enough).  Then
> do the same for some prominent characteristic of the items.  This might give
> you a "brand" recommender for retail products or an "artist" recommender for
> music.   For this more generic recommender, you might be able to directly
> use the counts from the user's history.  Finally, build "top-40" models for
> overall item, brand, artist or what have you characteristics.
>
> Now train a simple model to combine these results to find items that the
> user is likely to engage with.  SGD is an easy choice here.  At
> recommendation time, you would run all of the constituent recommenders and
> use the SGD model to rescore the union of their results.
>
> If done well, the brand and top-40 models will give you decent cold start
> behavior while the real collaborative filtering models will give you good
> performance after the cold-start.  The SGD should be able to meld these
> values well if it has a good indicator of how reliable each sub-model is.
>
> On Mon, Feb 7, 2011 at 4:11 PM, Steven Bourke <sb...@gmail.com> wrote:
>
>> Check http://www.springerlink.com/content/n881136032u8k111/ out. Do a
>> search on Google Scholar and you might find the PDF.
>>
>> What type of data / recommendations are you trying to make? Standard
>> collaborative filtering techniques aren't a bad thing.
>>
>> On Tue, Feb 8, 2011 at 12:05 AM, Chris Schilling <ch...@cellixis.com> wrote:
>>
>>> I am interested in this problem as well (combining content similarity
>>> with CF).
>>>
>>> I want to build a system which makes use of the CF part of Mahout:  I am
>>> recommending products to users.  Along with user ratings/preferences for
>>> products, I also have a content-based similarity metric calculated for
>>> each item-item pair.
>>>
>>> I do not have a lot of experience in producing "hybrid" recommendations.
>>> Do you generally think the most appropriate thing to do is to boost
>>> recommendations from CF?  Or do you like the 2nd method of using a custom
>>> item similarity to combine CF similarity with content similarity?  It
>>> seems straightforward enough to try both, just trying to get a feel for
>>> how to approach this.
>>>
>>> Can you recommend any papers describing combination of content and CF?
>>>
>>> Thanks for your help!
>>> Chris S.
>>>
>>> On Feb 7, 2011, at 9:50 AM, Sebastian Schelter wrote:
>>>
>>>> Hi Alexandre,
>>>>
>>>> I don't think there is "one golden way" but I can give you some hints
>>>> where to start regarding item-based recommenders. I think there are
>>>> three points where you could customize the behavior to enable "hybrid"
>>>> recommendations:
>>>>
>>>> * you can use a custom Rescorer to either filter the resulting
>>>>   recommended items (e.g. restrict the result to a certain type/category
>>>>   of items) or to boost some of them (e.g. by looking at their content)
>>>>
>>>> * you can use a custom ItemSimilarity which could compute a blended
>>>>   score by combining the usual similarity score with an additional
>>>>   content-based similarity score
>>>>
>>>> * as collaborative filtering usually suffers from the "cold-start
>>>>   problem" (you cannot make any assumptions about new users or items
>>>>   until you've seen some interactions), you could work around this by
>>>>   implementing a custom
>>>>   CandidateItemsStrategy/MostSimilarItemsCandidateItemsStrategy that
>>>>   uses content properties to find items to recommend if the user or the
>>>>   item is new
>>>>
>>>> --sebastian
>>>>
>>>> On 07.02.2011 16:56, Alexandre Rodrigues (FEUP) wrote:
>>>>> Hello Mahouters out there!
>>>>>
>>>>> I'm diving into the amazing world of Mahout and Hadoop and I have some
>>>>> questions about it. My project consists of developing a recommender
>>>>> system for TV shows, and my objective is to study how I can
>>>>> ensemble/mix some approaches, like content-based and collaborative
>>>>> filtering (with weights, for example). Is there _the way_ to do it
>>>>> using Mahout, or is it an unexplored subject at the moment?
>>>>>
>>>>> Thanks in advance!
>>>>> --
>>>>> Alexandre Rodrigues

Re: Hybrid RecSys — ways to do it

Posted by Lance Norskog <go...@gmail.com>.

Combining various recommender algorithms is called "stacking". All of
the Netflix contest winners and runners-up used 25-100 different
recommendation algorithms with finely tuned weights.

On Mon, Feb 7, 2011 at 6:27 PM, Ted Dunning <te...@gmail.com> wrote:
> See also here http://arxiv.org/abs/1006.2156
>
> Another approach is to build a conventional recommender for items and attach
> an indicator of how much information that recommender has to work with
> (number of occurrences of the recommended item might be good enough).  Then
> do the same for some prominent characteristic of the items.  This might give
> you a "brand" recommender for retail products or an "artist" recommender for
> music.   For this more generic recommender, you might be able to directly
> use the counts from the user's history.  Finally, build "top-40" models for
> overall items, brands, artists, or whatever other characteristics you have.
>
> Now train a simple model to combine these results to find items that the
> user is likely to engage with.  SGD is an easy choice here.  At
> recommendation time, you would run all of the constituent recommenders and
> use the SGD model to rescore the union of their results.
>
> If done well, the brand and top-40 models will give you decent cold start
> behavior while the real collaborative filtering models will give you good
> performance after the cold-start.  The SGD should be able to meld these
> values well if it has a good indicator of how reliable each sub-model is.
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Hybrid RecSys — ways to do it

Posted by Ted Dunning <te...@gmail.com>.

See also here http://arxiv.org/abs/1006.2156

Another approach is to build a conventional recommender for items and attach
an indicator of how much information that recommender has to work with
(number of occurrences of the recommended item might be good enough).  Then
do the same for some prominent characteristic of the items.  This might give
you a "brand" recommender for retail products or an "artist" recommender for
music.   For this more generic recommender, you might be able to directly
use the counts from the user's history.  Finally, build "top-40" models for
overall items, brands, artists, or whatever other characteristics you have.

Now train a simple model to combine these results to find items that the
user is likely to engage with.  SGD is an easy choice here.  At
recommendation time, you would run all of the constituent recommenders and
use the SGD model to rescore the union of their results.

If done well, the brand and top-40 models will give you decent cold start
behavior while the real collaborative filtering models will give you good
performance after the cold-start.  The SGD should be able to meld these
values well if it has a good indicator of how reliable each sub-model is.
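
As a rough illustration of the combining step described above, here is a minimal sketch of a logistic regression trained with plain SGD. It is self-contained and does not use Mahout's own SGD classifiers. The class name SgdCombiner, the feature layout (CF score, log of item count, brand score, top-40 score) and the toy training rows are all assumptions made up for the example, not something taken from this thread.

```java
/** Demo entry point; the training rows below are invented for illustration only. */
final class CombinerDemo {
    public static void main(String[] args) {
        // Hypothetical feature layout per candidate item:
        // {cfScore, log(1 + itemOccurrences), brandScore, top40Score}
        double[][] features = {
            {0.9, 3.0, 0.2, 0.1},
            {0.1, 0.0, 0.8, 0.7},
            {0.2, 2.5, 0.1, 0.0},
            {0.0, 0.0, 0.1, 0.1},
        };
        boolean[] engaged = {true, true, false, false};

        SgdCombiner model = new SgdCombiner(4, 0.1);
        for (int epoch = 0; epoch < 200; epoch++) {
            for (int i = 0; i < features.length; i++) {
                model.train(features[i], engaged[i]);
            }
        }
        // At recommendation time: score the union of all sub-recommender results.
        for (double[] row : features) {
            System.out.println(model.predict(row));
        }
    }
}

/** Logistic regression over sub-recommender outputs, trained by stochastic gradient descent. */
final class SgdCombiner {
    private final double[] weights; // index 0 is the bias term
    private final double learningRate;

    SgdCombiner(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures + 1];
        this.learningRate = learningRate;
    }

    /** Estimated probability in (0, 1) that the user engages with the item. */
    double predict(double[] features) {
        if (features.length != weights.length - 1) {
            throw new IllegalArgumentException("unexpected number of features");
        }
        double z = weights[0];
        for (int i = 0; i < features.length; i++) {
            z += weights[i + 1] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One SGD step on a single example, following the gradient of the log loss. */
    void train(double[] features, boolean engaged) {
        double error = (engaged ? 1.0 : 0.0) - predict(features);
        weights[0] += learningRate * error;
        for (int i = 0; i < features.length; i++) {
            weights[i + 1] += learningRate * error * features[i];
        }
    }
}
```

Including the item-count feature next to the CF score is what lets the model learn how much to trust the CF sub-model; the other sub-models could get similar reliability indicators.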

On Mon, Feb 7, 2011 at 4:11 PM, Steven Bourke <sb...@gmail.com> wrote:

> Check http://www.springerlink.com/content/n881136032u8k111/ out. Do a
> search on Google Scholar and you might find the PDF.
>
> What type of data / recommendations are you trying to make? Standard
> collaborative filtering techniques aren't a bad thing.
>

Re: Hybrid RecSys — ways to do it

Posted by Steven Bourke <sb...@gmail.com>.

Check http://www.springerlink.com/content/n881136032u8k111/ out. Do a search
on Google Scholar and you might find the PDF.

What type of data / recommendations are you trying to make? Standard
collaborative filtering techniques aren't a bad thing.

On Tue, Feb 8, 2011 at 12:05 AM, Chris Schilling <ch...@cellixis.com> wrote:

> I am interested in this problem as well (combining content similarity with
> CF).
>
> I want to build a system which makes use of the CF part of Mahout:  I am
> recommending products to users.  Along with user ratings/preferences for
> products, I also have a content-based similarity metric calculated for each
> item-item pair.
>
> I do not have a lot of experience in producing "hybrid" recommendations.
> Do you generally think the most appropriate thing to do is to boost
> recommendations from CF?  Or do you like the 2nd method of using a custom
> item similarity to combine CF similarity with content similarity?  It seems
> straightforward enough to try both; I am just trying to get a feel for how to
> approach this.
>
> Can you recommend any papers describing combination of content and CF?
>
> Thanks for your help!
> Chris S.

Re: Hybrid RecSys — ways to do it

Posted by Chris Schilling <ch...@cellixis.com>.

I am interested in this problem as well (combining content similarity with CF).

I want to build a system which makes use of the CF part of Mahout: I am recommending products to users.  Along with user ratings/preferences for products, I also have a content-based similarity metric calculated for each item-item pair.

I do not have a lot of experience in producing "hybrid" recommendations.  Do you generally think the most appropriate thing to do is to boost recommendations from CF?  Or do you like the 2nd method of using a custom item similarity to combine CF similarity with content similarity?  It seems straightforward enough to try both; I am just trying to get a feel for how to approach this.

Can you recommend any papers describing combination of content and CF?

Thanks for your help!
Chris S.

On Feb 7, 2011, at 9:50 AM, Sebastian Schelter wrote:

> Hi Alexandre,
> 
> I don't think there is "one golden way", but I can give you some hints on where to start regarding item-based recommenders. I think there are three points where you could customize the behavior to enable "hybrid" recommendations:
> 
> * you can use a custom Rescorer to either filter the resulting recommended items (e.g. restrict the result to a certain type/category of items) or to boost some of them (e.g. by looking at their content)
> 
> * you can use a custom ItemSimilarity which could compute a blended score by combining the usual similarity score with an additional content-based similarity score
> 
> * as collaborative filtering usually suffers from the "cold-start problem" (you cannot make any assumptions about new users or items until you've seen some interactions), you could work around this by implementing a custom CandidateItemsStrategy/MostSimilarItemsCandidateItemsStrategy that uses content properties to find items to recommend if the user or the item is new
> 
> 
> --sebastian

Re: Hybrid RecSys — ways to do it

Posted by Sebastian Schelter <ss...@apache.org>.

Hi Alexandre,

I don't think there is "one golden way", but I can give you some hints
on where to start regarding item-based recommenders. I think there are
three points where you could customize the behavior to enable "hybrid"
recommendations:

* you can use a custom Rescorer to either filter the resulting
recommended items (e.g. restrict the result to a certain type/category
of items) or to boost some of them (e.g. by looking at their content)

* you can use a custom ItemSimilarity which could compute a blended
score by combining the usual similarity score with an additional
content-based similarity score

* as collaborative filtering usually suffers from the "cold-start
problem" (you cannot make any assumptions about new users or items until
you've seen some interactions), you could work around this by
implementing a custom
CandidateItemsStrategy/MostSimilarItemsCandidateItemsStrategy that uses
content properties to find items to recommend if the user or the item is
new
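
To make the first two points a bit more concrete, here is a minimal, self-contained sketch of the ideas. It deliberately does not import Mahout: ItemSim, BlendedSimilarity and ContentBoostRescorer are hypothetical stand-ins whose methods only mirror the general shape of an item similarity and a rescorer. The fixed blend weight, the boost factor, and the convention that NaN means "no similarity could be computed" are assumptions for the example.

```java
import java.util.Set;

/** Demo entry point with made-up similarities and item IDs. */
final class HybridDemo {
    public static void main(String[] args) {
        // Assumption: item 3 is new, so the collaborative side has nothing to say about it.
        ItemSim cf = (a, b) -> (a == 3L || b == 3L) ? Double.NaN : 0.5;
        ItemSim content = (a, b) -> 1.0;
        ItemSim blended = new BlendedSimilarity(cf, content, 0.5);
        System.out.println(blended.similarity(1L, 2L)); // both signals available
        System.out.println(blended.similarity(1L, 3L)); // content only

        ContentBoostRescorer rescorer =
            new ContentBoostRescorer(Set.of(7L), Set.of(2L), 1.5);
        System.out.println(rescorer.isFiltered(7L));
        System.out.println(rescorer.rescore(2L, 2.0));
    }
}

/** Stand-in for an item-item similarity; NOT the Mahout interface itself. */
interface ItemSim {
    double similarity(long itemA, long itemB);
}

/** Weighted blend of a collaborative similarity and a content-based one. */
final class BlendedSimilarity implements ItemSim {
    private final ItemSim collaborative;
    private final ItemSim content;
    private final double cfWeight; // share given to the collaborative score, in [0, 1]

    BlendedSimilarity(ItemSim collaborative, ItemSim content, double cfWeight) {
        if (cfWeight < 0.0 || cfWeight > 1.0) {
            throw new IllegalArgumentException("cfWeight must be in [0, 1]");
        }
        this.collaborative = collaborative;
        this.content = content;
        this.cfWeight = cfWeight;
    }

    @Override
    public double similarity(long itemA, long itemB) {
        double cf = collaborative.similarity(itemA, itemB);
        double cb = content.similarity(itemA, itemB);
        // If one side is undefined, fall back to the other (helps with new items).
        if (Double.isNaN(cf)) {
            return cb;
        }
        if (Double.isNaN(cb)) {
            return cf;
        }
        return cfWeight * cf + (1.0 - cfWeight) * cb;
    }
}

/** Rescorer-style hook: drops some items entirely and boosts others. */
final class ContentBoostRescorer {
    private final Set<Long> filteredItems;
    private final Set<Long> boostedItems;
    private final double boostFactor;

    ContentBoostRescorer(Set<Long> filteredItems, Set<Long> boostedItems, double boostFactor) {
        this.filteredItems = filteredItems;
        this.boostedItems = boostedItems;
        this.boostFactor = boostFactor;
    }

    boolean isFiltered(long itemId) {
        return filteredItems.contains(itemId);
    }

    double rescore(long itemId, double originalScore) {
        return boostedItems.contains(itemId) ? originalScore * boostFactor : originalScore;
    }
}
```

In a real setup the same logic would live inside classes implementing the corresponding Mahout interfaces, and the blend weight would probably be tuned on held-out data rather than fixed by hand.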


--sebastian

On 07.02.2011 16:56, Alexandre Rodrigues (FEUP) wrote:
> Hello Mahouters out there!
>
> I'm diving into the amazing world of Mahout and Hadoop and I have some
> questions about it. My project consists in developing a recommender system
> for TV shows, and my objective is to study how can I ensemble/mix some
> approaches, like content-based and collaborative filtering (with weights for
> example). Is there _the way_ to do it using Mahout, or it's an unexplored
> subject at the moment?
>
> Thanks in advance!
> --
> Alexandre Rodrigues
>