Posted to user@mahout.apache.org by Rich <cc...@gmail.com> on 2012/03/14 00:53:43 UTC

The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Hi,
I have been digging into Mahout on Hadoop for the past few days, and
I was wondering which recommendation algorithm is used in
RecommenderJob.java. For example:

bin/hadoop jar /opt/mahout/core/target/mahout-core-0.7-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
  --usersFile input/users.txt --booleanData

When I execute this command, is an item-based or a user-based
recommendation algorithm being used?
And does specifying "--similarityClassname" in the command
have anything to do with choosing between the item-based and
user-based algorithms?


Thanks in advance for your help,
Rich

how to package distance measure and analyzer

Posted by Pat Ferrel <pa...@occamsmachete.com>.

When using the Mahout drivers for clustering or vector generation and
supplying a distance measure class or analyzer class, how should I
package my code?

For instance, if I want to use a custom analyzer and pass the class name
to the seq2sparse driver, do I have to build it into a custom job and
execute that instead of the one supplied with Mahout? Or can I pass the
jar path explicitly in the job's generic parameters for the classpath?

What is the recommended way to handle this?

RE: The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by WangBin <co...@hotmail.com>.


The book Programming Collective Intelligence tells us that item-based filtering
is more efficient than user-based filtering, although user-based filtering is
easier to implement.

Bin


Re: The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by Revoti <re...@gmail.com>.

Hi Sebastian,

In user-based CF we have to calculate the similarities not only for new users
but also, repeatedly, for existing users. This is because user profiles are
dynamic: they change over time, and that changes their similarity values
with other users.

----Revoti

Re: The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by Sebastian Schelter <ss...@apache.org>.

Hi Rich,

It would be very easy to implement user-based collaborative filtering in
Mahout. At the core of the distributed recommender module is
RowSimilarityJob which computes a matrix of pairwise similarities. We
make it compute the similarities between item vectors, but it could
easily be fed with user vectors.
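
A rough illustration of the point above, not Mahout's actual code: the sketch below computes pairwise cosine similarities between the rows of a small in-memory matrix, and switches between item-based and user-based simply by choosing which vectors are passed in as rows. The function names (cosine, row_similarities, transpose) and the toy ratings are invented for this example; RowSimilarityJob itself is a distributed Hadoop job and is organized differently internally.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def row_similarities(rows):
    """Return {(i, j): similarity} for every pair of rows with i < j."""
    sims = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            sims[(i, j)] = cosine(rows[i], rows[j])
    return sims


def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Toy data (hypothetical): one row per user, one column per item.
user_vectors = [
    [5, 3, 0],
    [4, 0, 1],
    [0, 2, 5],
    [1, 0, 4],
]
item_vectors = transpose(user_vectors)  # one row per item

item_sims = row_similarities(item_vectors)  # item-based: pairs of items
user_sims = row_similarities(user_vectors)  # user-based: pairs of users
```

The same pairwise routine serves both variants; only the orientation of the input matrix changes.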

However, item-based collaborative filtering has a lot of advantages:

- there are usually many more users than items, so computing all user
similarities is more expensive
- you don't have to recompute the similarities for new users
- its prediction quality is said to be equal or superior to the
user-based variant
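
To make the first point concrete, here is a small back-of-the-envelope sketch. It counts the unordered pairs that an all-pairs similarity computation would have to consider in the worst case. The user and item counts are hypothetical, and this is only an upper bound: a practical job may skip pairs that share no interactions at all.

```python
def pair_count(n):
    """Number of unordered pairs among n rows: n * (n - 1) / 2."""
    return n * (n - 1) // 2


num_users = 1_000_000  # hypothetical data set size
num_items = 10_000     # hypothetical catalogue size

user_pairs = pair_count(num_users)  # 499,999,500,000 candidate user pairs
item_pairs = pair_count(num_items)  # 49,995,000 candidate item pairs
```

With these assumed sizes, the user-based variant has roughly 10,000 times as many candidate pairs as the item-based one.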

--sebastian


On 14.03.2012 05:20, Rich wrote:
> Thanks for your reply, Sean.
> 
> Do you know whether user-based recommendation could be implemented for
> Mahout running on Hadoop? Is it possible with the current framework? For
> example, as in the chapter 6 sample code in Mahout in Action, would it be
> possible to write custom Mapper/Reducer classes that provide user-based
> recommendation?
> 
> Thanks,
> Rich 
> 
>

Re: The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by Rich <cc...@gmail.com>.

Thanks for your reply, Sean.

Do you know whether user-based recommendation could be implemented for
Mahout running on Hadoop? Is it possible with the current framework? For
example, as in the chapter 6 sample code in Mahout in Action, would it be
possible to write custom Mapper/Reducer classes that provide user-based
recommendation?

Thanks,
Rich

Re: The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by Sean Owen <sr...@gmail.com>.

Yes, it's item-based only. --similarityClassname chooses the similarity
metric, but the job is still item-based.
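
As a hedged illustration of why the metric and the item/user orientation are independent choices, the sketch below always compares items and only swaps the similarity function. It works on boolean data (each item is just the set of users who interacted with it). The function names and the toy data are invented for this example and are not Mahout classes or option values.

```python
def cooccurrence(a, b):
    """Number of users who interacted with both items."""
    return len(a & b)


def tanimoto(a, b):
    """Shared users divided by users who touched either item; 0.0 if none."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def item_similarities(users_by_item, metric):
    """Compare every pair of items with the given metric; always item-to-item."""
    items = sorted(users_by_item)
    return {
        (x, y): metric(users_by_item[x], users_by_item[y])
        for i, x in enumerate(items)
        for y in items[i + 1:]
    }


# Toy boolean data (hypothetical): item ID -> set of user IDs.
users_by_item = {
    "A": {1, 2, 3},
    "B": {2, 3},
    "C": {4},
}

by_cooccurrence = item_similarities(users_by_item, cooccurrence)
by_tanimoto = item_similarities(users_by_item, tanimoto)
```

Both results are keyed by the same item pairs; changing the metric changes the scores, not what is being compared.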
