Posted to user@mahout.apache.org by Marko Ciric <ci...@gmail.com> on 2011/06/22 00:34:24 UTC

Which is more effective?

Hi guys,

When building a content-based recommender with Apache Mahout, there seem to
be two approaches:

   - Implementing a custom Taste ItemSimilarity that is calculated from
   item features.
   - Classifying a data set with Mahout by representing items as vectors.

Has anybody had experience comparing the performance/accuracy of those?

Thanks

--
Marko Ćirić
ciric.marko@gmail.com

Re: Which is more effective?

Posted by Marko Ciric <ci...@gmail.com>.

Thanks guys.
Sean: I agree that a clustering algorithm is a reasonable choice for
calculating a distance between items and can be used more directly.

On 22 June 2011 01:24, Chris Schilling <ch...@cellixis.com> wrote:

> Thanks Ted,
>
> I'll read through that...

--
Marko Ćirić
ciric.marko@gmail.com

Re: Which is more effective?

Posted by Chris Schilling <ch...@cellixis.com>.

Thanks Ted,

I'll read through that...


On Jun 21, 2011, at 4:17 PM, Ted Dunning wrote:

> Chapter 17 of Mahout in Action (MiA) has a decent description of this method.

Re: Which is more effective?

Posted by Ted Dunning <te...@gmail.com>.

Chapter 17 of Mahout in Action (MiA) has a decent description of this method.

On Wed, Jun 22, 2011 at 1:17 AM, Ted Dunning <te...@gmail.com> wrote:

> You are right, that sounds crazy.
>
> What I did was to model click as the target variable and try to predict it
> with user features, item features and user x item interaction features.

Re: Which is more effective?

Posted by Ted Dunning <te...@gmail.com>.

You are right, that sounds crazy.

What I did was to model click as the target variable and try to predict it
with user features, item features and user x item interaction features.
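
A minimal sketch of that setup, for illustration only. It is plain Python
rather than Mahout's SGD classifiers, and the feature names, the `encode`,
`predict` and `train` helpers, and the toy data are all assumptions made up
for this example. Each training example is one (user, item) pair with a 0/1
click label, so there is a single binary target instead of one label per item.

```python
import math
import random
from collections import defaultdict


def encode(user_feats, item_feats):
    """Names of the active user, item and user x item interaction features."""
    names = ["u:" + u for u in user_feats]
    names += ["i:" + i for i in item_feats]
    # One interaction per (user feature, item feature) pair. This cross
    # product is what gets expensive as the feature sets grow.
    names += ["ux:" + u + "*" + i for u in user_feats for i in item_feats]
    return names


def predict(weights, names):
    """Estimated click probability from a logistic model."""
    z = sum(weights[n] for n in names)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, rate=0.1, seed=0):
    """Plain SGD on log loss; examples are (user_feats, item_feats, clicked)."""
    rng = random.Random(seed)
    weights = defaultdict(float)
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for user_feats, item_feats, clicked in data:
            names = encode(user_feats, item_feats)
            err = clicked - predict(weights, names)
            for n in names:
                weights[n] += rate * err
    return weights


# Hypothetical toy data, built so that neither the user feature nor the
# item feature alone predicts the click; only the combination does.
examples = [
    (["age:young"], ["genre:scifi"], 1),
    (["age:young"], ["genre:opera"], 0),
    (["age:older"], ["genre:scifi"], 0),
    (["age:older"], ["genre:opera"], 1),
]
weights = train(examples)
p_match = predict(weights, encode(["age:young"], ["genre:scifi"]))
p_miss = predict(weights, encode(["age:young"], ["genre:opera"]))
print(p_match > p_miss)
```

To recommend for a user, one would score every candidate item with `predict`
and keep the highest-scoring ones.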

On Wed, Jun 22, 2011 at 1:10 AM, Chris Schilling <ch...@cellixis.com> wrote:

> Hey Ted,
>
> I was wondering if you could briefly describe how one would make
> content-based recommendations using the SGD classifiers.
>
> Say I have item1: feature1a, feature1b, feature1c
> and             item2: feature2b, feature2c
>
> So, are you training a classifier for n labels, where n is the number of
> items?  That seems crazy because you only have one feature vector per item.

Re: Which is more effective?

Posted by Chris Schilling <ch...@cellixis.com>.

Hey Ted,

I was wondering if you could briefly describe how one would make
content-based recommendations using the SGD classifiers.

Say I have item1: feature1a, feature1b, feature1c
and             item2: feature2b, feature2c

So, are you training a classifier for n labels, where n is the number of
items?  That seems crazy because you only have one feature vector per item.


On Jun 21, 2011, at 3:49 PM, Ted Dunning wrote:

> I have used the SGD classifiers for content-based recommendation.  It works
> out reasonably, but the interaction variables can get kind of expensive.
>
> Doing it again, I think I would use latent factor log-linear models to do
> the interaction features.  See
> http://cseweb.ucsd.edu/~akmenon/LFL-ICDM10.pdf
>
> We have a half-done implementation in Mahout.  There was a student at UCSD
> looking into completing it, but we don't have real results yet.

Re: Which is more effective?

Posted by Ted Dunning <te...@gmail.com>.

I have used the SGD classifiers for content-based recommendation.  It works
out reasonably, but the interaction variables can get kind of expensive.

Doing it again, I think I would use latent factor log-linear models to do
the interaction features.  See
http://cseweb.ucsd.edu/~akmenon/LFL-ICDM10.pdf
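
A rough sketch of the general idea, not the model from the linked paper and
not Mahout code: instead of one weight per explicit user x item feature pair,
each user and each item gets a short latent vector, and the click probability
is the logistic function of their dot product. The function names, the
hyperparameters and the toy click data are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_factors(clicks, num_factors=2, epochs=500, rate=0.1, reg=0.01, seed=0):
    """clicks is a list of (user_id, item_id, clicked) with clicked in {0, 1}."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in clicks})
    items = sorted({i for _, i, _ in clicks})
    # Small random starting factors; all-zero factors would never move.
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for i in items}
    data = list(clicks)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, y in data:
            score = sum(a * b for a, b in zip(p[u], q[i]))
            err = y - sigmoid(score)
            for k in range(num_factors):
                pu, qi = p[u][k], q[i][k]
                # Gradient step on log loss with L2 regularization.
                p[u][k] += rate * (err * qi - reg * pu)
                q[i][k] += rate * (err * pu - reg * qi)
    return p, q


def click_probability(p, q, user, item):
    return sigmoid(sum(a * b for a, b in zip(p[user], q[item])))


# Hypothetical toy data: users u1 and u2 click items a and b,
# users u3 and u4 click items c and d.
groups = {"u1": "ab", "u2": "ab", "u3": "cd", "u4": "cd"}
clicks = [(u, i, 1 if i in liked else 0)
          for u, liked in groups.items() for i in "abcd"]
p, q = train_factors(clicks)
print(click_probability(p, q, "u1", "a") > click_probability(p, q, "u1", "c"))
```

The number of parameters grows with the number of users plus the number of
items times `num_factors`, rather than with the number of feature pairs.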

We have a half-done implementation in Mahout.  There was a student at UCSD
looking into completing it, but we don't have real results yet.

On Wed, Jun 22, 2011 at 12:34 AM, Marko Ciric <ci...@gmail.com> wrote:

> Hi guys,
>
> When building a content-based recommender with Apache Mahout, there seem to
> be two approaches:
>
>   - Implementing a custom Taste ItemSimilarity that is calculated from
>   item features.
>   - Classifying a data set with Mahout by representing items as vectors.
>
> Has anybody had experience comparing the performance/accuracy of those?
>
> Thanks
>
> --
> Marko Ćirić
> ciric.marko@gmail.com
>

Re: Which is more effective?

Posted by Ted Dunning <te...@gmail.com>.

Actually, I should mention that I have done user-feature recommendations and
then (mis)used text retrieval to pull back items that have features as
text.  This works reasonably well and is pretty easy to do.  You will have
to watch out for very common features.
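
A small sketch of the retrieval half of this, under some assumptions: the
features recommended for a user are taken as given (the `query` list below is
made up), items are treated as tiny "documents" whose words are their
features, and inverse document frequency is one possible way to keep very
common features from dominating. A real setup would more likely use a text
search engine; the helper names here are hypothetical.

```python
import math
from collections import Counter


def idf_weights(item_features):
    """Inverse document frequency of each feature over the item catalogue."""
    n = len(item_features)
    df = Counter(f for feats in item_features.values() for f in set(feats))
    return {f: math.log(n / count) for f, count in df.items()}


def retrieve(query_features, item_features, top_n=3):
    """Rank items by the summed IDF of the features they share with the query."""
    idf = idf_weights(item_features)
    wanted = set(query_features)
    scores = {}
    for item, feats in item_features.items():
        score = sum(idf[f] for f in set(feats) & wanted)
        if score > 0:
            scores[item] = score
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical catalogue. "lang:en" is on every item, so its IDF is zero
# and it contributes nothing to the ranking.
items = {
    "item1": ["lang:en", "topic:hadoop", "topic:ml"],
    "item2": ["lang:en", "topic:cooking"],
    "item3": ["lang:en", "topic:ml"],
}
query = ["lang:en", "topic:ml"]  # assumed output of the user-feature step
print(retrieve(query, items))
```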

On Wed, Jun 22, 2011 at 12:50 AM, Sean Owen <sr...@gmail.com> wrote:

> For #1 -- there is still an unanswered issue in there, and that's how you
> extract features from items. I assume you already have some scheme for that.
> (The framework can't quite help you there.)
>
> But then the question remains how you compute similarity from features. So
> #1 isn't a concrete possibility by itself.
>
> I am not sure if a classifier can be used very directly to figure a notion
> of item-item similarity. I am sure it can be used in some sense, but it
> doesn't seem like the most direct tool. A simpler notion, of similarity or
> distance, is what you want, and that's a piece of clustering algorithms
> really.

Re: Which is more effective?

Posted by Sean Owen <sr...@gmail.com>.

For #1 -- there is still an unanswered issue in there, and that's how you
extract features from items. I assume you already have some scheme for that.
(The framework can't quite help you there.)

But then the question remains how you compute similarity from features. So
#1 isn't a concrete possibility by itself.

I am not sure if a classifier can be used very directly to figure a notion
of item-item similarity. I am sure it can be used in some sense, but it
doesn't seem like the most direct tool. A simpler notion, of similarity or
distance, is what you want, and that's a piece of clustering algorithms
really.
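
For illustration, two common ways to turn item features into a similarity
score, sketched in plain Python. This is not the Taste ItemSimilarity
interface itself, only the kind of computation a custom implementation might
wrap; the function names and sample features are assumptions.

```python
import math


def jaccard(features_a, features_b):
    """Overlap of two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(weights_a, weights_b):
    """Cosine similarity of two sparse feature -> weight mappings."""
    dot = sum(w * weights_b.get(f, 0.0) for f, w in weights_a.items())
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical items described by unweighted and weighted features.
print(jaccard(["a", "b", "c"], ["b", "c"]))
print(cosine({"a": 1.0, "b": 2.0}, {"b": 1.0, "c": 3.0}))
```

Jaccard suits plain present/absent features; cosine suits weighted feature
vectors.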

On Tue, Jun 21, 2011 at 11:34 PM, Marko Ciric <ci...@gmail.com> wrote:

> Hi guys,
>
> When building a content-based recommender with Apache Mahout, there seem to
> be two approaches:
>
>   - Implementing a custom Taste ItemSimilarity that is calculated from
>   item features.
>   - Classifying a data set with Mahout by representing items as vectors.
>
> Has anybody had experience comparing the performance/accuracy of those?
>
> Thanks
>
> --
> Marko Ćirić
> ciric.marko@gmail.com
>