
Posted to user@mahout.apache.org by Nick Pentreath <ni...@gmail.com> on 2013/09/05 10:07:48 UTC

Tweaking ALS models to filter out "highly related" items when an item has been purchased

Hi all

Say I have a set of ecommerce data (views, purchases etc). I've built my
model using implicit feedback ALS. Now, I want to add a little bit of
"smart filtering".

Filtering based on not recommending something that has been purchased is
straightforward, but I'd like to also filter so as not to recommend "highly
similar" items to someone who has purchased an item.

In other words, if someone has just purchased a laptop, then I'd like to
not recommend other laptops. Ideally while still recommending "related"
items such as laptop bags, mice, etc. (this is just an example).

Now, I could filter based on metadata tags like "category", but assuming I
don't always have that data, then simplistically I have the option of
filtering out products based on those that have high cosine similarity to
the purchased products. However, this risks filtering out "good" similar
products (like the laptop bags) as well as the "bad" similar products.
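
For reference, a minimal sketch of that simplistic cosine-similarity filter, in Python. It assumes the ALS item factors are available as plain lists keyed by item id. The function names, the toy vectors and the 0.9 threshold are all made up for illustration and are not part of Mahout's API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length factor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_too_similar(candidates, purchased, item_factors, threshold=0.9):
    """Drop candidates whose factor vector is too close to any purchased item.

    candidates:   list of (item_id, score) pairs
    purchased:    list of item ids the user has bought
    item_factors: dict of item_id -> factor vector (hypothetical layout)
    threshold:    hypothetical cut-off that would need tuning
    """
    kept = []
    for item_id, score in candidates:
        too_close = any(
            cosine(item_factors[item_id], item_factors[p]) >= threshold
            for p in purchased
        )
        if not too_close:
            kept.append((item_id, score))
    return kept


# Toy factors, invented for illustration only.
item_factors = {
    "laptop_a": [1.0, 0.1],
    "laptop_b": [0.9, 0.2],
    "laptop_bag": [0.6, 0.8],
}
candidates = [("laptop_b", 0.95), ("laptop_bag", 0.80)]
result = filter_too_similar(candidates, ["laptop_a"], item_factors)
print(result)
```

With these toy vectors the bag survives the filter, but with real factors an accessory can sit just as close to the purchased item as a competing laptop, which is the risk described above.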

I'm experimenting with building a second variant of the model that
effectively downweights "views" to near zero, hence leaving something sort
of like a "purchased together" model variant. Then recommendations can be
made using this model when a user purchases an item (or perhaps a re-scorer
that is a weighted variant of model A and model B but that tends to weight
model B - the purchased together model - higher)
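
A rough sketch of what I mean by the weighted re-scorer, in Python. It assumes both models return scores on a comparable scale (which would need normalising in practice) and that an item missing from one model scores 0.0 there. `blend_scores` and the 0.8 weight are hypothetical, just to show the shape of it.

```python
def blend_scores(scores_a, scores_b, weight_b=0.8):
    """Blend per-item scores from two models and return a ranked list.

    scores_a: dict item -> score from model A (views + purchases)
    scores_b: dict item -> score from model B (the "purchased together" variant)
    weight_b: hypothetical weight on model B, between 0 and 1
    """
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must be between 0 and 1")
    items = set(scores_a) | set(scores_b)
    blended = {
        item: (1.0 - weight_b) * scores_a.get(item, 0.0)
        + weight_b * scores_b.get(item, 0.0)
        for item in items
    }
    # Highest blended score first.
    return sorted(blended.items(), key=lambda pair: pair[1], reverse=True)


# Toy scores, invented for illustration only.
scores_a = {"laptop_b": 0.9, "laptop_bag": 0.5}
scores_b = {"laptop_b": 0.1, "laptop_bag": 0.7, "mouse": 0.6}
ranked = blend_scores(scores_a, scores_b)
print([item for item, _ in ranked])
```

The weight itself is the kind of thing that could be tuned through offline evaluation rather than fixed by hand.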

Are there other mechanisms to tweak the ALS model such that it tends
towards recommending "related products" (but not "highly similar of the
exact same narrow product type")?

Any other ideas about how best to go about this?

Many thanks
Nick

Re: Tweaking ALS models to filter out "highly related" items when an item has been purchased

Posted by Nick Pentreath <ni...@gmail.com>.

Thanks for the comments - all useful. As always, it seems a bit of experimentation is in order, comparing view-vs-purchase filtering, heuristic post-hoc reordering, and potentially some metadata-based approach.

One of our challenges is that we are indeed trying to generalise as much as possible, since we have a "recommender as a service" type of offering. So catering to edge cases is indeed not the way to go. But a heuristic-style approach that can be somewhat learned from data and recommender performance, via split testing and offline testing, might be the way to go.


Re: Tweaking ALS models to filter out "highly related" items when an item has been purchased

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

FWIW our marketing people call it "cross-sell" and "upsell", i.e.
selling items from different categories vs. offering more behaviorally
similar items from the currently browsed category, optimized for a specific
target (revenue, sales event, etc.). In either case, preexisting labelling
(or labelling inferred from side data via clustering) helps to discern
between "upsell" and "cross-sell" scores.


Re: Tweaking ALS models to filter out "highly related" items when an item has been purchased

Posted by Dominik Hübner <co...@dhuebner.com>.

> As far as implementation is concerned, I think that it is very important to
> not distort the basic recommendation algorithm with business rules like
> this.  It is much better to post-process the results to impose your will
> directly.  One exception to this is that I think it is reasonable to use
> ordered cooccurrence and also repeated cooccurrence here for some hints
> here.  This lets you determine likely accessories (purchased after the main
> item, mostly) and also find razor-blades (highly repetitive purchases).
> You still have the problem of flooding with similar items.

+1 for keeping business rules out of your recommendations. I think integrating too many edge cases will never generalize for all users and debugging becomes nothing but a pain. 

> My approach in the past was to define heuristic definitions for "too
> similar" and do a pass over the sorted recommendation results giving each
> item that passes the too-similar criterion a penalty score.  When done with
> this, I re-sort the results and the duplicative content falls to the bottom
> of the recommendations.
> 

I was recently working on some recommendations for a fashion brand. Filtering out items that were too similar was indeed crucial. I observed a common pattern of users viewing products that varied only in their color or other "minor" features. I think it ultimately depends on the environment in which you are displaying your recommendations. If you are actually trying to show related products, those really similar items (like color variations) might not be the worst thing. Some sort of product mash-up should probably be more diverse, just as Ted mentioned with flooding the first few pages. But... there they are again, those edge cases I mentioned. Pre-sale recommendations might be less diverse than after-purchase recommendations. It just depends on the domain you are working in, I guess.


Re: Tweaking ALS models to filter out "highly related" items when an item has been purchased

Posted by Ted Dunning <te...@gmail.com>.

I think that Dominik's comments are exactly on target.

As far as implementation is concerned, I think that it is very important to
not distort the basic recommendation algorithm with business rules like
this.  It is much better to post-process the results to impose your will
directly.  One exception to this is that I think it is reasonable to use
ordered cooccurrence and also repeated cooccurrence for some hints
here.  This lets you determine likely accessories (purchased after the main
item, mostly) and also find razor-blades (highly repetitive purchases).
You still have the problem of flooding with similar items.
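
As a rough illustration of those two hints, here is a small Python sketch. It assumes each user's purchases are available as a list in time order. The function names and the toy histories are invented for the example, and raw counts like these would still need some significance test before being trusted.

```python
from collections import Counter


def ordered_cooccurrence(histories):
    """Count, per (earlier, later) pair, how many users bought them in that order.

    histories: dict user -> list of item ids in purchase order (assumed layout)
    """
    pairs = Counter()
    for items in histories.values():
        user_pairs = set()  # count each ordered pair at most once per user
        for i, earlier in enumerate(items):
            for later in items[i + 1:]:
                if later != earlier:
                    user_pairs.add((earlier, later))
        pairs.update(user_pairs)
    return pairs


def repeat_purchase_counts(histories):
    """Count how many users bought each item more than once."""
    repeats = Counter()
    for items in histories.values():
        per_user = Counter(items)
        repeats.update(item for item, n in per_user.items() if n > 1)
    return repeats


# Toy purchase histories, invented for illustration only.
histories = {
    "u1": ["laptop", "laptop_bag", "blades", "blades"],
    "u2": ["laptop", "mouse", "laptop_bag"],
    "u3": ["blades", "blades", "blades"],
}
pairs = ordered_cooccurrence(histories)
repeats = repeat_purchase_counts(histories)
print(pairs[("laptop", "laptop_bag")], pairs[("laptop_bag", "laptop")])
print(repeats["blades"], repeats["laptop"])
```

A pair that shows up mostly in one direction (laptop first, then bag) hints at an accessory, and an item with many repeat buyers hints at a consumable.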

The diversity that you are talking about is a critical quality in
recommendation results.  The basic intuition is that recommendation results
are not individual recommendations, but are included in a portfolio of
recommendations.  You need the diversity in this portfolio because if you
are wrong about an item, the likelihood of being wrong about very similar
items is high.  If you flood the first and second pages with these similar
items, then you don't have room for the alternative items that might well
be correct.

My approach in the past was to define heuristic definitions for "too
similar" and do a pass over the sorted recommendation results giving each
item that passes the too-similar criterion a penalty score.  When done with
this, I re-sort the results and the duplicative content falls to the bottom
of the recommendations.
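
A minimal Python sketch of that penalize-and-re-sort pass, under some assumptions of mine: an item counts as duplicative if it is too similar to anything ranked above it, scores are positive, and the penalty is a simple multiplier. The `too_similar` heuristic, the 0.5 penalty and the toy items are placeholders, not a recipe.

```python
def penalize_and_resort(ranked, too_similar, penalty=0.5):
    """Penalize items that duplicate a higher-ranked item, then re-sort.

    ranked:      list of (item, score) pairs, best first, scores assumed positive
    too_similar: function (item_a, item_b) -> bool, the "too similar" heuristic
    penalty:     hypothetical multiplier applied to duplicative items
    """
    seen = []
    rescored = []
    for item, score in ranked:
        if any(too_similar(item, earlier) for earlier in seen):
            score *= penalty
        seen.append(item)
        rescored.append((item, score))
    # Python's sort is stable, so ties keep their original relative order.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


def same_family(a, b):
    """Toy heuristic: items sharing a name prefix count as too similar."""
    return a.split("_")[0] == b.split("_")[0]


# Toy recommendations, invented for illustration only.
ranked = [
    ("dress_red", 0.9),
    ("dress_blue", 0.85),
    ("dress_green", 0.8),
    ("belt_brown", 0.6),
]
result = penalize_and_resort(ranked, same_family)
print([item for item, _ in result])
```

Here the first dress keeps its place, the belt moves up to second, and the color variants drop to the bottom rather than disappearing entirely.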


Re: Tweaking ALS models to filter out "highly related" items when an item has been purchased

Posted by Dominik Hübner <co...@dhuebner.com>.

Just a quick assumption, maybe I have not thought this through enough:

1. Users probably tend to compare products => similar VIEWS
2. Users might also tend to PURCHASE accessory products, like the laptop bag you mentioned

Maybe you could filter out products whose similarity is computed from the product views, but leave those that are similar based on purchases in your recommendation set?
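
Something like the following Python sketch, assuming you already have two item-item similarity tables, one computed from views and one from purchases, stored as dicts keyed by unordered item pairs. The function name, both thresholds and the toy numbers are made up for illustration.

```python
def filter_view_similar(candidates, purchased_item, view_sim, purchase_sim,
                        view_threshold=0.8, purchase_threshold=0.5):
    """Drop items that look like substitutes (similar by views only).

    view_sim / purchase_sim: dict frozenset({a, b}) -> similarity (assumed layout)
    Both thresholds are hypothetical and would need tuning per domain.
    """
    kept = []
    for item in candidates:
        key = frozenset((item, purchased_item))
        viewed_together = view_sim.get(key, 0.0) >= view_threshold
        bought_together = purchase_sim.get(key, 0.0) >= purchase_threshold
        if viewed_together and not bought_together:
            continue  # compared against the purchase, but rarely bought with it
        kept.append(item)
    return kept


# Toy similarities, invented for illustration only.
view_sim = {
    frozenset(("laptop_a", "laptop_b")): 0.9,
    frozenset(("laptop_a", "laptop_bag")): 0.85,
}
purchase_sim = {
    frozenset(("laptop_a", "laptop_b")): 0.05,
    frozenset(("laptop_a", "laptop_bag")): 0.7,
}
result = filter_view_similar(
    ["laptop_b", "laptop_bag", "mouse"], "laptop_a", view_sim, purchase_sim
)
print(result)
```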

Nevertheless, I guess this will depend strongly on the domain the data comes from.

