
Posted to user@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2011/07/26 22:49:25 UTC

Re: Advice request

+Mahout user mailing list

On Tue, Jul 26, 2011 at 12:38 PM, Srinivas Kasturi
<sr...@gmail.com> wrote:

> ... I came across your blog entry on surprise and coincidence, and wondered
> if you can help me navigate what seems to be a confusing world of
> recommendation algorithms. The problem statement is this:
>
> 1. I have information at a user level in the form of a tag cloud: Words
> they have used and liked, along with a count of the frequency of incidence.
>

Excellent.  This is a user x word matrix.

> 2. I would like to use this information to run through a set of around 20
> million product pages, and suggest to them the top 100 that they are most
> likely to enjoy.
>

There are several ways to do this.

One simple way is to use a binary recommender to recommend words to the user
and then submit the resulting (long-ish) query to a search engine.  You
might pick a related subset of the recommended words as the query in order
to get a shorter and more focused query.

> This, in some way, is the surprise and coincidence problem, isn't it?
>

Yes.  It is!

> I am hoping to use one of the Mahout algorithms (
> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms), but can't,
> for the life of me, figure out which one is the closest fit.
>

Firstly there are programs that look at something like your user x term data
to find terms that occur anomalously often.  Secondly, there are
recommendation systems that would let you recommend additional words to the
user.

I am sure that others will have other suggestions as well.

Re: Advice request

Posted by Marko Ciric <ci...@gmail.com>.

You could also introduce clustering and build clusters from pages that have
a lot of similar words. If your page data doesn't change too often, you
could select the most similar pages from within a cluster and recommend them
to a user.
On Aug 8, 2011 6:08 PM, "Marko Ciric" <ci...@gmail.com> wrote:
> You might want to use TanimotoCoefficientSimilarity if your data set isn't
> large.
> On Jul 27, 2011 10:51 AM, "Sean Owen" <sr...@gmail.com> wrote:
>> Sounds good. In that case, the surprise-n-coincidence counterpart you are
>> probably looking for is LogLikelihoodSimilarity, which implements
>> ItemSimilarity. Use it with a GenericBooleanPrefItemBasedRecommender and you
>> can recommend new words to use.
>>
>> On Wed, Jul 27, 2011 at 9:01 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>>> Actually, I think that recommending words to people and then doing the
>>> search may add some mileage.
>>>
>>> On Wed, Jul 27, 2011 at 12:38 AM, Sean Owen <sr...@gmail.com> wrote:
>>>
>>> > It's just a search problem as Ted says -- minus
>>> > even the recommendation phase.
>>> >
>>> > Is that all you want? Then try Lucene, probably.
>>> >
>>>

Re: Advice request

Posted by Marko Ciric <ci...@gmail.com>.

You might want to use TanimotoCoefficientSimilarity if your data set isn't
large.
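
For readers unfamiliar with the measure, here is a minimal plain-Python sketch of the Tanimoto (Jaccard) coefficient applied to the user x word data from this thread. It is not Mahout's implementation; the function name `tanimoto` and the `word_users` data are made up for illustration, and the counts in the tag cloud are ignored (each word is reduced to the set of users who used it).

```python
def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) coefficient of two sets: |A & B| / |A | B|."""
    a, b = set(users_a), set(users_b)
    union = len(a | b)
    # Two empty sets share nothing, so treat their similarity as 0.
    return len(a & b) / union if union else 0.0


# Hypothetical data: word -> set of user IDs who used that word.
word_users = {
    "hiking": {1, 2, 3, 4},
    "tent": {2, 3, 4, 5},
    "espresso": {6, 7},
}

print(tanimoto(word_users["hiking"], word_users["tent"]))      # 3 shared of 5 total -> 0.6
print(tanimoto(word_users["hiking"], word_users["espresso"]))  # no shared users -> 0.0
```

Every pair of words needs its own set intersection, which is presumably why the suggestion is limited to data sets that are not large.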
On Jul 27, 2011 10:51 AM, "Sean Owen" <sr...@gmail.com> wrote:
> Sounds good. In that case, the surprise-n-coincidence counterpart you are
> probably looking for is LogLikelihoodSimilarity, which implements
> ItemSimilarity. Use it with a GenericBooleanPrefItemBasedRecommender and you
> can recommend new words to use.
>
> On Wed, Jul 27, 2011 at 9:01 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> Actually, I think that recommending words to people and then doing the
>> search may add some mileage.
>>
>> On Wed, Jul 27, 2011 at 12:38 AM, Sean Owen <sr...@gmail.com> wrote:
>>
>> > It's just a search problem as Ted says -- minus
>> > even the recommendation phase.
>> >
>> > Is that all you want? Then try Lucene, probably.
>> >
>>

Re: Advice request

Posted by Sean Owen <sr...@gmail.com>.

Sounds good. In that case, the surprise-n-coincidence counterpart you are
probably looking for is LogLikelihoodSimilarity, which implements
ItemSimilarity. Use it with a GenericBooleanPrefItemBasedRecommender and you
can recommend new words to use.
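
To make the idea concrete, here is a rough plain-Python sketch of a boolean item-based recommender that scores word pairs with the log-likelihood ratio (the "surprise and coincidence" statistic). It is an illustration under stated assumptions, not Mahout's code: the names `llr`, `recommend_words`, and `user_words` are hypothetical, counts in the tag cloud are ignored, and the choice to count only positively associated pairs is an assumption of this sketch.

```python
from math import log


def x_log_x(x):
    return x * log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)) over the cell counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: users with both words, k12: first word only,
    k21: second word only,      k22: neither word.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise below zero


def recommend_words(user, user_words, how_many=5):
    """Rank words the user has not used by summed LLR against the words they have."""
    word_users = {}
    for u, words in user_words.items():
        for w in words:
            word_users.setdefault(w, set()).add(u)
    n_users = len(user_words)
    owned = user_words[user]
    scores = {}
    for cand, cand_users in word_users.items():
        if cand in owned:
            continue
        total = 0.0
        for w in owned:
            k11 = len(cand_users & word_users[w])
            k12 = len(word_users[w]) - k11
            k21 = len(cand_users) - k11
            k22 = n_users - k11 - k12 - k21
            # Assumption of this sketch: LLR is also large for words that
            # avoid each other, so only positive associations are counted.
            if k11 * k22 > k12 * k21:
                total += llr(k11, k12, k21, k22)
        if total > 0:
            scores[cand] = total
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:how_many]


# Hypothetical boolean user x word data.
user_words = {
    "u1": {"hiking", "tent", "boots"},
    "u2": {"hiking", "tent", "stove"},
    "u3": {"hiking", "tent"},
    "u4": {"espresso", "grinder"},
    "u5": {"espresso", "grinder", "hiking"},
    "u6": {"espresso"},
}

print(recommend_words("u3", user_words))
```

The recommended words could then be appended to the user's own words to form the search query discussed earlier in the thread.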

On Wed, Jul 27, 2011 at 9:01 AM, Ted Dunning <te...@gmail.com> wrote:

> Actually, I think that recommending words to people and then doing the
> search may add some mileage.
>
> On Wed, Jul 27, 2011 at 12:38 AM, Sean Owen <sr...@gmail.com> wrote:
>
> > It's just a search problem as Ted says -- minus
> > even the recommendation phase.
> >
> > Is that all you want? Then try Lucene, probably.
> >
>

Re: Advice request

Posted by Srinivas Kasturi <sr...@gmail.com>.

Ted, I agree, that's what I think will help discovery.

On Jul 27, 2011 9:01 AM, "Ted Dunning" <te...@gmail.com> wrote:
> Actually, I think that recommending words to people and then doing the
> search may add some mileage.
>
> On Wed, Jul 27, 2011 at 12:38 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> It's just a search problem as Ted says -- minus
>> even the recommendation phase.
>>
>> Is that all you want? Then try Lucene, probably.
>>

Re: Advice request

Posted by Ted Dunning <te...@gmail.com>.

Actually, I think that recommending words to people and then doing the
search may add some mileage.

On Wed, Jul 27, 2011 at 12:38 AM, Sean Owen <sr...@gmail.com> wrote:

> It's just a search problem as Ted says -- minus
> even the recommendation phase.
>
> Is that all you want? Then try Lucene, probably.
>

Re: Advice request

Posted by Sean Owen <sr...@gmail.com>.

At first glance, it doesn't seem like a recommender problem. You know
which words the user uses frequently, and you know which terms
describe products. It's just a search problem as Ted says -- minus
even the recommendation phase.

Is that all you want? Then try Lucene, probably.

Or is it something different?
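
As a toy illustration of the "just a search problem" view, the sketch below treats the user's tag cloud as a weighted query against an inverted index of product pages. It is plain Python, not Lucene, and much cruder than a real search engine's scoring; `build_index`, `top_pages`, the IDF-style weight, and the sample data are all assumptions made for the example.

```python
import heapq
from collections import defaultdict
from math import log


def build_index(pages):
    """Inverted index: word -> set of page IDs containing that word."""
    index = defaultdict(set)
    for page_id, words in pages.items():
        for w in set(words):
            index[w].add(page_id)
    return index


def top_pages(tag_cloud, index, n_pages, how_many=100):
    """Score pages by the user's word counts, down-weighting common words."""
    scores = defaultdict(float)
    for word, count in tag_cloud.items():
        postings = index.get(word, ())
        if not postings:
            continue  # word appears on no page
        idf = log(1 + n_pages / len(postings))  # rarer words weigh more
        for page_id in postings:
            scores[page_id] += count * idf
    # Only pages sharing at least one word with the user are ever scored.
    return heapq.nlargest(how_many, scores, key=scores.get)


# Hypothetical product pages and one user's tag cloud (word -> count).
pages = {
    "p1": ["tent", "hiking", "boots"],
    "p2": ["espresso", "grinder"],
    "p3": ["tent", "stove"],
}
tag_cloud = {"hiking": 5, "tent": 2}

index = build_index(pages)
print(top_pages(tag_cloud, index, len(pages)))  # best match first
```

Because the index is consulted per query word, the full page collection is never scanned, which is the property that matters at the scale of 20 million pages.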

On Tue, Jul 26, 2011 at 9:49 PM, Ted Dunning <te...@gmail.com> wrote:
> +Mahout user mailing list
>
> On Tue, Jul 26, 2011 at 12:38 PM, Srinivas Kasturi
> <sr...@gmail.com> wrote:
>
>> ... I came across your blog entry on surprise and coincidence, and wondered
>> if you can help me navigate what seems to be a confusing world of
>> recommendation algorithms. The problem statement is this:
>>
>> 1. I have information at a user level in the form of a tag cloud: Words
>> they have used and liked, along with a count of the frequency of incidence.
>>
>
> Excellent.  This is a user x word matrix.
>
>
>> 2. I would like to use this information to run through a set of around 20
>> million product pages, and suggest to them the top 100 that they are most
>> likely to enjoy.
>>
>
> There are several ways to do this.
>
> One simple way is to use a binary recommender to recommend words to the user
> and then submit the resulting (long-ish) query to a search engine.  You
> might pick a related subset of the recommended words as the query in order
> to get a shorter and more focused query.
>
>> This, in some way, is the surprise and coincidence problem, isn't it?
>>
>
> Yes.  It is!
>
>
>> I am hoping to use one of the Mahout algorithms (
>> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms), but can't,
>> for the life of me, figure out which one is the closest fit.
>>
>
> Firstly there are programs that look at something like your user x term data
> to find terms that occur anomalously often.  Secondly, there are
> recommendation systems that would let you recommend additional words to the
> user.
>
> I am sure that others will have other suggestions as well.
>