Posted to user@mahout.apache.org by xenlee - Zerg <sc...@gmail.com> on 2014/08/06 15:34:13 UTC

UserBasedRecommender question

Hi,
I am building an item-based recommender system for 10 million users who
rate categories from a set of 20 (news categories such as politics,
sport, etc.).
For each user, I would like to recommend at least one other category that
they don't know yet (i.e., have not rated).

I ran a GenericUserBasedRecommender and asked for recommendations for
each user, but it is extremely slow: roughly 1,000 users processed per minute.
My questions are:

Can I run this same GenericUserBasedRecommender on Hadoop, and would it
really be faster? I have seen and run an ItemBasedRecommender from the command
line on a cluster, but I would prefer to run a user-based one.

Is there a smarter way to deal with my problem? Maybe some clustering
solution instead of recommendation? I don't quite see how that would work.

Finally, am I right that algorithms without a command-line interface are not
meant to be used with Hadoop?


Thank you for your answers,

xenlee -
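To see why a per-user loop gets slow, here is a minimal, stdlib-only Java sketch of what a user-based recommender does for one user. It is not Mahout's GenericUserBasedRecommender, only an illustration of the same pieces (a user similarity, a neighborhood of the k most similar users, and a scoring step); the class, method names, and toy ratings are hypothetical. In this sketch each call compares the target user against every other user, so precomputing recommendations for all N users costs on the order of N² similarity computations. At the reported rate of about 1,000 users per minute, 10 million users would take about 10,000 minutes, roughly a week.

```java
import java.util.*;

public class UserBasedSketch {

    // Pearson correlation over the categories both users rated; 0 when undefined.
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        List<Integer> common = new ArrayList<>();
        for (Integer c : a.keySet()) if (b.containsKey(c)) common.add(c);
        int n = common.size();
        if (n < 2) return 0.0;
        double meanA = 0, meanB = 0;
        for (int c : common) { meanA += a.get(c); meanB += b.get(c); }
        meanA /= n;
        meanB /= n;
        double num = 0, varA = 0, varB = 0;
        for (int c : common) {
            double da = a.get(c) - meanA, db = b.get(c) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        return (varA == 0 || varB == 0) ? 0.0 : num / Math.sqrt(varA * varB);
    }

    // Best unrated category for `user`, from its k most similar (positively correlated) users.
    // Returns -1 if the neighbors rated nothing the user hasn't already rated.
    static int recommend(Map<Integer, Map<Integer, Double>> ratings, int user, int k) {
        Map<Integer, Double> mine = ratings.get(user);
        // Step 1: compare against every other user; this full scan makes one call O(N).
        List<Map.Entry<Integer, Double>> sims = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            if (e.getKey() == user) continue;
            double s = pearson(mine, e.getValue());
            if (s > 0) sims.add(Map.entry(e.getKey(), s));
        }
        // Step 2: keep the k most similar users.
        sims.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        // Step 3: similarity-weighted average rating of each category the user hasn't rated.
        Map<Integer, Double> weighted = new HashMap<>();
        Map<Integer, Double> weightSum = new HashMap<>();
        for (Map.Entry<Integer, Double> nb : sims.subList(0, Math.min(k, sims.size()))) {
            for (Map.Entry<Integer, Double> r : ratings.get(nb.getKey()).entrySet()) {
                if (mine.containsKey(r.getKey())) continue;
                weighted.merge(r.getKey(), nb.getValue() * r.getValue(), Double::sum);
                weightSum.merge(r.getKey(), nb.getValue(), Double::sum);
            }
        }
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Integer, Double> w : weighted.entrySet()) {
            double score = w.getValue() / weightSum.get(w.getKey());
            if (score > bestScore) { bestScore = score; best = w.getKey(); }
        }
        return best;
    }

    // Hypothetical toy data: user -> (category -> rating).
    static Map<Integer, Map<Integer, Double>> demoRatings() {
        Map<Integer, Map<Integer, Double>> r = new HashMap<>();
        r.put(1, Map.of(0, 5.0, 1, 4.0, 2, 1.0));
        r.put(2, Map.of(0, 5.0, 1, 4.0, 2, 1.0, 3, 5.0, 4, 1.0));
        r.put(3, Map.of(0, 1.0, 1, 2.0, 2, 5.0, 3, 1.0, 4, 5.0));
        return r;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> ratings = demoRatings();
        // Batch loop: one call per user, each scanning all users, so O(N^2) overall.
        for (int user : ratings.keySet()) {
            System.out.println("user " + user + " -> category " + recommend(ratings, user, 2));
        }
    }
}
```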

Re: UserBasedRecommender question

Posted by Pat Ferrel <pa...@gmail.com>.

BTW, I ran across this page on the Mahout wiki that explains what runs on a single machine, on MapReduce, and on Spark:
http://mahout.apache.org/users/basics/algorithms.html

Re: UserBasedRecommender question

Posted by Pat Ferrel <pa...@gmail.com> on Aug 6, 2014, at 1:31 PM.

Most people use Mahout as a library: they write Java code that uses the internals Mahout provides. The exceptions are the prepackaged algorithms and tools that have a command-line interface (CLI), but even these can be called from your own code.

Among the things accessible from the CLI, there is no general rule about which run on Hadoop. However, all of the recommenders that have a CLI are Hadoop-based: they take input and precalculate recommendations for all users.

Mahout is a “scalable” machine learning library, so most of what it provides is meant to run on Hadoop or, more recently, Spark.
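The batch pattern described above (take the input once, then precalculate recommendations for every user) can be sketched on a single machine. This is a hypothetical, stdlib-only illustration, not Mahout's Hadoop recommender job: it is item-based, treats any rating as a binary "has rated" signal, and uses raw co-occurrence counts as the category similarity. With only 20 categories, the category matrix is 20 by 20, so scoring one user touches at most 400 cells no matter how many users there are, and because users are independent of each other, the per-user step parallelizes (here with a parallel stream). The class, method names, and toy data are assumptions.

```java
import java.util.*;
import java.util.stream.IntStream;

public class BatchItemBasedSketch {

    // Count, for every ordered pair of categories, how many users rated both.
    static int[][] cooccurrence(List<Set<Integer>> users, int numCategories) {
        int[][] counts = new int[numCategories][numCategories];
        for (Set<Integer> rated : users) {
            for (int i : rated) {
                for (int j : rated) {
                    if (i != j) counts[i][j]++;
                }
            }
        }
        return counts;
    }

    // Score each unrated category by summing its co-occurrence with the user's rated ones.
    // Returns -1 if no unrated category co-occurs with anything the user rated.
    static int topUnrated(Set<Integer> rated, int[][] counts) {
        int best = -1;
        long bestScore = 0;
        for (int c = 0; c < counts.length; c++) {
            if (rated.contains(c)) continue;
            long score = 0;
            for (int r : rated) score += counts[r][c];
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    // Batch job: build the category matrix once, then precompute one recommendation
    // per user. Users are independent, so the per-user step runs in parallel.
    static int[] precomputeAll(List<Set<Integer>> users, int numCategories) {
        int[][] counts = cooccurrence(users, numCategories);
        return IntStream.range(0, users.size())
                .parallel()
                .map(u -> topUnrated(users.get(u), counts))
                .toArray();
    }

    // Hypothetical toy data: the set of categories each user has rated.
    static List<Set<Integer>> demoUsers() {
        return List.of(Set.of(0, 1), Set.of(0, 1, 2), Set.of(0, 1, 2), Set.of(1, 3), Set.of(0));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(precomputeAll(demoUsers(), 4))); // [2, 3, 3, 0, 1]
    }
}
```

The stored array plays the role of the precomputed output: serving a recommendation later is a lookup, not a computation.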

Re: UserBasedRecommender question

Posted by Francois Bossiere <fr...@gmail.com> on Aug 6, 2014, at 12:46 PM.

Thank you for your reply, I will think about it!

For my question about the command line, I just don't really understand
which algorithms can be used on a Hadoop cluster and which cannot. And for
those that can, how can I call them other than through the command line,
e.g. "mahout recommendItemBased --input ... -output ... -s PearsonCorrelationSimilarity"?

Re: UserBasedRecommender question

Posted by Ted Dunning <te...@gmail.com> on Aug 6, 2014, at 20:16.

If you only have 20 categories, I would suggest considering a technology
other than recommendation. Simply building 20 classifiers (one per category)
is likely to be as effective, or more so.

I don't understand your question about the command line.
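The one-classifier-per-category idea can be sketched as follows. The details are assumptions, since the thread does not specify them: each model is a plain logistic regression trained with batch gradient descent, its label is whether a user rated that category, and its features are which of the other categories the user rated (rating values are ignored). To recommend, score every category the user has not rated and pick the highest probability. The class, method names, hyperparameters, and toy data are hypothetical. With 20 categories, each model is just 21 numbers (one weight per category plus a bias), so scoring a user takes 20 short sums regardless of how many users there are.

```java
import java.util.*;

public class PerCategoryClassifiers {

    // P(user rated `target`), computed from the user's other rated categories.
    // w[0..n-1] are per-category weights; w[n] is the bias.
    static double predict(double[] w, Set<Integer> rated, int target) {
        double z = w[w.length - 1];
        for (int f : rated) if (f != target) z += w[f];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Train one logistic-regression model per category with batch gradient descent.
    static double[][] trainAll(List<Set<Integer>> users, int numCategories, int epochs, double lr) {
        double[][] models = new double[numCategories][numCategories + 1];
        for (int target = 0; target < numCategories; target++) {
            double[] w = models[target];
            for (int epoch = 0; epoch < epochs; epoch++) {
                double[] grad = new double[numCategories + 1];
                for (Set<Integer> rated : users) {
                    double y = rated.contains(target) ? 1.0 : 0.0;
                    double err = predict(w, rated, target) - y; // d(log-loss)/dz
                    for (int f : rated) if (f != target) grad[f] += err;
                    grad[numCategories] += err; // bias gradient
                }
                for (int k = 0; k <= numCategories; k++) {
                    w[k] -= lr * grad[k] / users.size();
                }
            }
        }
        return models;
    }

    // Recommend the unrated category whose classifier gives the highest probability.
    static int recommend(double[][] models, Set<Integer> rated) {
        int best = -1;
        double bestP = -1;
        for (int c = 0; c < models.length; c++) {
            if (rated.contains(c)) continue;
            double p = predict(models[c], rated, c);
            if (p > bestP) { bestP = p; best = c; }
        }
        return best;
    }

    // Hypothetical toy data: categories 0 and 2 are always rated together.
    static List<Set<Integer>> demoUsers() {
        return List.of(Set.of(0, 2), Set.of(0, 2), Set.of(0, 2), Set.of(1), Set.of(1), Set.of(1));
    }

    public static void main(String[] args) {
        double[][] models = trainAll(demoUsers(), 3, 1000, 0.5);
        System.out.println(recommend(models, Set.of(0))); // prints 2
    }
}
```

In the toy data, categories 0 and 2 always appear together, so a new user who has rated only category 0 is recommended category 2.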


