Posted to user@mahout.apache.org by Alessandro Binhara <bi...@gmail.com> on 2010/12/30 14:45:38 UTC

mahout on hadoop question ?

Hello everyone

I am studying RecommenderJob to run a recommendation system on Hadoop.
Currently my DataModel is loaded as a singleton and cached in memory, and
I have a servlet that responds to requests sent to Mahout.

When using RecommenderJob on Hadoop, will the job load the data model
from the HDFS files every time before computing the recommendations?

Is it possible to use some strategy to cache it in the cluster?

The recommendation output will be written to HDFS; how do I identify the
answer? Is there a job ID in Hadoop?

Thanks

Re: mahout on hadoop question ?

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Alessandro,

I'm not quite sure I understand what you are trying to accomplish.

Usually you would use RecommenderJob to precompute recommendations and
then feed them in any way (database, Solr server) into your live system.

You can also just use ItemSimilarityJob to precompute the item-item
similarities, copy the resulting files to your live system, load them
via FileItemSimilarity and have Taste compute the recommendations online
after that.
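A minimal sketch of that path, assuming Taste's file-based classes (the file names here are illustrative; the similarity file holds `itemID1,itemID2,similarity` lines as produced by ItemSimilarityJob):

```java
import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class PrecomputedSimilarityExample {
  public static void main(String[] args) throws Exception {
    // Preference data: lines of "userID,itemID,preference"
    DataModel model = new FileDataModel(new File("ratings.csv"));
    // Item-item similarities precomputed by ItemSimilarityJob,
    // copied out of HDFS to the live system
    ItemSimilarity similarity =
        new FileItemSimilarity(new File("item-similarities.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);
    // Taste computes the top-5 recommendations for user 1 online,
    // reusing the offline similarities
    System.out.println(recommender.recommend(1L, 5));
  }
}
```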

Another option is to forgo Hadoop entirely and have Taste compute
everything in real time. I suggest you try this first and see if it fits
your use case, as it's the simplest and most convenient path to take.

--sebastian


On 30.12.2010 14:45, Alessandro Binhara wrote:
> Hello everyone
> 
> I am studying RecommenderJob to run a recommendation system on Hadoop.
> Currently my DataModel is loaded as a singleton and cached in memory, and
> I have a servlet that responds to requests sent to Mahout.
> 
> When using RecommenderJob on Hadoop, will the job load the data model
> from the HDFS files every time before computing the recommendations?
> 
> Is it possible to use some strategy to cache it in the cluster?
> 
> The recommendation output will be written to HDFS; how do I identify the
> answer? Is there a job ID in Hadoop?
> 
> Thanks
> 


Re: mahout on hadoop question ?

Posted by Sean Owen <sr...@gmail.com>.
Just don't use Hadoop. Most of the recommender code here is not
Hadoop-based and is built for more real-time operation (though at the
cost of not scaling past some large size). Check out the Mahout wiki for
an introduction to building a recommender like this:
https://cwiki.apache.org/MAHOUT/recommender-documentation.html
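To make that concrete, here is a minimal in-memory Taste recommender of the kind the wiki describes (the file name, similarity metric, and neighborhood size are illustrative choices, not prescriptions):

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RealtimeTasteExample {
  public static void main(String[] args) throws Exception {
    // "userID,itemID,preference" lines; the whole model lives in memory,
    // which is what limits this approach past some large data size
    DataModel model = new FileDataModel(new File("ratings.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    // Computed on the fly, no Hadoop involved
    List<RecommendedItem> recs = recommender.recommend(1L, 5);
    System.out.println(recs);
  }
}
```

This is the setup a servlet would hold as a singleton and query per request.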

On Thu, Dec 30, 2010 at 11:12 AM, Alessandro Binhara <bi...@gmail.com> wrote:
> ok...
>
> On Thu, Dec 30, 2010 at 12:45 PM, Sean Owen <sr...@gmail.com> wrote:
>
>
>
>> Can you cache a DataModel in memory across workers in a cluster? No --
>> the workers are perhaps not on the same machine, or even in the same
>> datacenter. Each worker would have to load its own.
>>
>> Yes, I understand it...
>
>
>> But it sounds a bit like you are trying to have a servlet make
>> recommendations in real-time by calling out to Hadoop.
>
>
> That's it...
> I'm looking for how to create recommendations in real time.
>
> This will never work. Hadoop is a big batch-oriented framework.
>>
>> I understood that this operation on Hadoop is batch-oriented.
>
>
>
>> What you can do is pre-compute recommendations with Hadoop, as you are
>> doing, and write to HDFS. Then the servlet can load recs from HDFS,
>> yes. No problem there.
>>
>>
> We have a recommendation system running on Mahout here.
> We thought we could build a real-time recommendation system with Hadoop
> and Mahout.
> I see many problems:
> - how to update the Mahout data model dynamically?
> - Hadoop was not built for real-time processing. What could be used to
> create a distributed recommendation system?
>
> Thanks for the help!
>

Re: mahout on hadoop question ?

Posted by Alessandro Binhara <bi...@gmail.com>.
ok...

On Thu, Dec 30, 2010 at 12:45 PM, Sean Owen <sr...@gmail.com> wrote:



> Can you cache a DataModel in memory across workers in a cluster? No --
> the workers are perhaps not on the same machine, or even in the same
> datacenter. Each worker would have to load its own.
>
> Yes, I understand it...


> But it sounds a bit like you are trying to have a servlet make
> recommendations in real-time by calling out to Hadoop.


That's it...
I'm looking for how to create recommendations in real time.

This will never work. Hadoop is a big batch-oriented framework.
>
> I understood that this operation on Hadoop is batch-oriented.



> What you can do is pre-compute recommendations with Hadoop, as you are
> doing, and write to HDFS. Then the servlet can load recs from HDFS,
> yes. No problem there.
>
>
We have a recommendation system running on Mahout here.
We thought we could build a real-time recommendation system with Hadoop and
Mahout.
I see many problems:
- how to update the Mahout data model dynamically?
- Hadoop was not built for real-time processing. What could be used to
create a distributed recommendation system?

Thanks for the help!

Re: mahout on hadoop question ?

Posted by Sean Owen <sr...@gmail.com>.
Are you using the "pseudo-distributed" RecommenderJob? There are a few
RecommenderJobs!

Can you cache a DataModel in memory across workers in a cluster? No --
the workers are perhaps not on the same machine, or even in the same
datacenter. Each worker would have to load its own.

But it sounds a bit like you are trying to have a servlet make
recommendations in real-time by calling out to Hadoop. This will never
work. Hadoop is a big batch-oriented framework.

What you can do is pre-compute recommendations with Hadoop, as you are
doing, and write to HDFS. Then the servlet can load recs from HDFS,
yes. No problem there.
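To sketch that last step: in Mahout of this era, RecommenderJob's text output has lines of the form `userID<TAB>[itemID:score,itemID:score,...]`. A small parser like the following (a hypothetical helper, not part of Mahout; check your version's actual output format) could turn each line into something a servlet can serve:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecommendationLineParser {
  /**
   * Parses one output line of the form "3\t[103:4.5,102:3.2]"
   * into an ordered map of itemID -> score.
   */
  public static Map<Long, Float> parse(String line) {
    Map<Long, Float> recs = new LinkedHashMap<>();
    int tab = line.indexOf('\t');
    String list = line.substring(tab + 1).trim();
    list = list.substring(1, list.length() - 1);  // strip "[" and "]"
    if (list.isEmpty()) {
      return recs;
    }
    for (String pair : list.split(",")) {
      String[] parts = pair.split(":");
      recs.put(Long.parseLong(parts[0]), Float.parseFloat(parts[1]));
    }
    return recs;
  }

  /** Extracts the user ID before the tab. */
  public static long userId(String line) {
    return Long.parseLong(line.substring(0, line.indexOf('\t')));
  }

  public static void main(String[] args) {
    String line = "3\t[103:4.5,102:3.2]";
    System.out.println(userId(line));   // 3
    System.out.println(parse(line));    // {103=4.5, 102=3.2}
  }
}
```

The servlet would run this over the part-files copied (or streamed via the Hadoop FileSystem API) out of HDFS, keyed by user ID.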

On Thu, Dec 30, 2010 at 7:45 AM, Alessandro Binhara <bi...@gmail.com> wrote:
> Hello everyone
>
> I am studying RecommenderJob to run a recommendation system on Hadoop.
> Currently my DataModel is loaded as a singleton and cached in memory, and
> I have a servlet that responds to requests sent to Mahout.
>
> When using RecommenderJob on Hadoop, will the job load the data model
> from the HDFS files every time before computing the recommendations?
>
> Is it possible to use some strategy to cache it in the cluster?
>
> The recommendation output will be written to HDFS; how do I identify the
> answer? Is there a job ID in Hadoop?
>
> Thanks
>