Posted to user@mahout.apache.org by Aurora Skarra-Gallagher <au...@yahoo-inc.com> on 2009/07/21 22:20:34 UTC

Running Taste Web example without the webserver

Hi,

I'm trying to run the Taste web example without using Jetty. Our gateways aren't meant to be used as web servers. By poking around, I found that the following command worked:
hadoop --config ~/hod-clusters/test jar /x/mahout-current/examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner

The output is:
09/07/21 19:59:21 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
09/07/21 19:59:21 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of GroupLensDataModel
09/07/21 19:59:22 INFO file.FileDataModel: Reading file info...
09/07/21 19:59:22 INFO file.FileDataModel: Processed 100000 lines
09/07/21 19:59:22 INFO file.FileDataModel: Processed 200000 lines
09/07/21 19:59:22 INFO file.FileDataModel: Processed 300000 lines
09/07/21 19:59:22 INFO file.FileDataModel: Processed 400000 lines
09/07/21 19:59:22 INFO file.FileDataModel: Processed 500000 lines
09/07/21 19:59:22 INFO file.FileDataModel: Processed 600000 lines
09/07/21 19:59:22 INFO file.FileDataModel: Processed 700000 lines
09/07/21 19:59:22 INFO file.FileDataModel: Processed 800000 lines
09/07/21 19:59:23 INFO file.FileDataModel: Processed 900000 lines
09/07/21 19:59:23 INFO file.FileDataModel: Processed 1000000 lines
09/07/21 19:59:23 INFO file.FileDataModel: Read lines: 1000209
09/07/21 19:59:30 INFO slopeone.MemoryDiffStorage: Building average diffs...
09/07/21 19:59:42 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7035965559003973
09/07/21 19:59:42 INFO grouplens.GroupLensRecommenderEvaluatorRunner: 0.7035965559003973

The job appears to write data to /tmp/ratings.txt and /tmp/movies.txt. I'm not sure if this is the correct way to run this example. I have a few questions:

 1.  Is the output file /tmp/ratings.txt? If so, how do I interpret it?
 2.  What does the Evaluation result mean?
 3.  Is it even running on HDFS?
 4.  Is it a map-reduce job?

Any pointers on how to run this as a standalone job would be helpful.

Thanks,
Aurora

Re: Running Taste Web example without the webserver

Posted by Ted Dunning <te...@gmail.com>.
Off-line user-based analysis is quite feasible, however.

We worked with data larger than this at Veoh and could crunch it down
to usable form in 10 hours on a 20-core micro-cluster.

The key step is computing sparse co-occurrences and filtering for
interesting non-zero values.
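
To make that step concrete, here is a minimal in-memory sketch of the
counting and filtering. The class name and threshold are hypothetical,
and at the scale discussed here the same counting would run as a Hadoop
map-reduce job rather than in one JVM:

import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count how often each pair of items co-occurs in
// users' histories, then drop counts below a threshold so that only the
// "interesting non-zero values" remain and the result stays sparse.
public class CooccurrenceSketch {
  public static Map<Long, Map<Long, Integer>> count(
      Iterable<List<Long>> userHistories, int minCount) {
    Map<Long, Map<Long, Integer>> counts = new HashMap<Long, Map<Long, Integer>>();
    for (List<Long> items : userHistories) {
      for (long a : items) {
        for (long b : items) {
          if (a == b) {
            continue;
          }
          Map<Long, Integer> row = counts.get(a);
          if (row == null) {
            row = new HashMap<Long, Integer>();
            counts.put(a, row);
          }
          Integer c = row.get(b);
          row.put(b, c == null ? 1 : c + 1);
        }
      }
    }
    // Filter: keep only counts at or above the threshold.
    for (Map<Long, Integer> row : counts.values()) {
      Iterator<Map.Entry<Long, Integer>> it = row.entrySet().iterator();
      while (it.hasNext()) {
        if (it.next().getValue() < minCount) {
          it.remove();
        }
      }
    }
    return counts;
  }
}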

On Fri, Jul 24, 2009 at 2:11 AM, Sean Owen <sr...@gmail.com> wrote:

> Hundreds of millions of users is big indeed. Sounds like you have way
> more users than items. This tells me that any user-based algorithm is
> probably out of the question. The model certainly can't be loaded into
> memory on one machine. We could work on ways to compute all pairs of
> similarities in a distributed way, but that's trillions of
> similarities, even after filtering out some unnecessary work.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Running Taste Web example without the webserver

Posted by Sean Owen <sr...@gmail.com>.
Ah yeah, I thought it might be a spam filter issue.

Yeah, unfortunately the current code uses APIs that exist only in Hadoop
0.20. You could run Hadoop 0.20 locally, upgrade your cluster (it can
run older-style jobs, I believe), or else roll back the code in the
package that touches Hadoop by one revision. That previous version
should work on 0.18.3, I believe.


Hundreds of millions of users is big indeed. Sounds like you have way
more users than items. This tells me that any user-based algorithm is
probably out of the question. The model certainly can't be loaded into
memory on one machine. We could work on ways to compute all pairs of
similarities in a distributed way, but that's trillions of
similarities, even after filtering out some unnecessary work.


Item-based recommenders are more realistic. It would still take a long
time to compute item-item similarities given the number of users you
have, but at least you're only computing thousands to millions of such
similarities. Grant is right -- perhaps you can use approaches
unrelated to the preference data to compute item-item similarity.
Given a fixed set of item-item similarities, it is fast to compute
recommendations for any one user. It doesn't require loading the model
into memory. Hence, you could then use the pseudo-distributed Hadoop
framework I've pointed out to spread these computations for each user
across many machines.

For this, you can test locally for sure. One machine can process
recommendations just fine, given a fixed set of item-item similarities
and an item-based recommender. Heck, you don't even need Hadoop to see
how well this works. I would try seeing how well the recommendations
work first, before figuring out Hadoop.
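
As a concrete sketch of that setup -- a fixed set of item-item
similarities plus an item-based recommender -- in Taste code, assuming
the refactored long-ID API; the path, item IDs, and similarity values
here are placeholders, and in practice the similarities would come from
the offline computation:

import java.io.File;
import java.util.Arrays;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Sketch: hand a precomputed, fixed set of item-item similarities to an
// item-based recommender; no similarity computation happens at query time.
public class ItemBasedSketch {
  public static void main(String[] args) throws Exception {
    // Comma-delimited userID,itemID,rating file (placeholder path).
    DataModel model = new FileDataModel(new File("/tmp/ratings.txt"));
    List<GenericItemSimilarity.ItemItemSimilarity> sims = Arrays.asList(
        new GenericItemSimilarity.ItemItemSimilarity(1L, 2L, 0.8),
        new GenericItemSimilarity.ItemItemSimilarity(1L, 3L, 0.3));
    GenericItemSimilarity similarity = new GenericItemSimilarity(sims);
    Recommender recommender = new GenericItemBasedRecommender(model, similarity);
    for (RecommendedItem item : recommender.recommend(42L, 5)) {
      System.out.println(item);
    }
  }
}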


There are also slope-one algorithms. I think they give good results, and
they are going to behave similarly to item-based recommenders in this
case. Slope-one requires precomputing a large matrix data structure (and
there is a separate Hadoop job to do that in a distributed way), but
it's also pretty fast at runtime. At your scale, precomputing that data
structure is going to require Hadoop, so I would try this next after
item-based.
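
Wiring up slope-one in memory is about this much code (a sketch; at
your scale the diff structure would instead come from the separate
Hadoop job, and the path is a placeholder):

import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Sketch: in-memory slope-one. The constructor builds the average-diff
// structure from the whole data model up front.
public class SlopeOneSketch {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("/tmp/ratings.txt"));
    Recommender recommender = new SlopeOneRecommender(model);
    System.out.println(recommender.recommend(42L, 5));
  }
}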


On Fri, Jul 24, 2009 at 12:09 AM, Aurora
Skarra-Gallagher<au...@yahoo-inc.com> wrote:
> Hi,
>
> Thank you for responding. My spam filter was "out to get me" and your responses were misclassified.
>
> I will investigate the Hadoop integration piece, specifically RecommenderJob. Currently, the Hadoop grid I'm working with is using 0.18.3. Will that pose a problem? I noticed some threads about versions of Hadoop less than 0.19 not working.
>
> We are looking at starting with 70M users and scaling up to 500M eventually. It is hard for me to estimate the number of items. We could be starting out with 100, but as these items are entities that we extract, there could be tens of thousands eventually. I would guess that most users would have less than 100 of these.
>
> Does that help? I would be interested in your input on the algorithms and also being a guinea pig for the code you're developing, if it makes sense.
>
> -Aurora

Re: Running Taste Web example without the webserver

Posted by Ted Dunning <te...@gmail.com>.
My guess is that at these volumes, you may well want to use a more
generic co-occurrence-count-based solution for the off-line portion. You
should be fine experimenting with Taste, since a larger-scale analysis
can likely be retrofitted in a compatible way.

On Thu, Jul 23, 2009 at 4:09 PM, Aurora Skarra-Gallagher <
aurora@yahoo-inc.com> wrote:

> We are looking at starting with 70M users and scaling up to 500M
> eventually. It is hard for me to estimate the number of items. We could be
> starting out with 100, but as these items are entities that we extract,
> there could be tens of thousands eventually. I would guess that most users
> would have less than 100 of these.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Running Taste Web example without the webserver

Posted by Aurora Skarra-Gallagher <au...@yahoo-inc.com>.
Hi,

Thank you for responding. My spam filter was "out to get me" and your responses were misclassified.

I will investigate the Hadoop integration piece, specifically RecommenderJob. Currently, the Hadoop grid I'm working with is using 0.18.3. Will that pose a problem? I noticed some threads about versions of Hadoop less than 0.19 not working.

We are looking at starting with 70M users and scaling up to 500M eventually. It is hard for me to estimate the number of items. We could be starting out with 100, but as these items are entities that we extract, there could be tens of thousands eventually. I would guess that most users would have less than 100 of these.

Does that help? I would be interested in your input on the algorithms and also being a guinea pig for the code you're developing, if it makes sense.

-Aurora


On 7/23/09 12:43 AM, "Sean Owen" <sr...@gmail.com> wrote:

Aurora did you see my last reply on the list?



Re: Running Taste Web example without the webserver

Posted by Sean Owen <sr...@gmail.com>.
Aurora did you see my last reply on the list?


Re: Running Taste Web example without the webserver

Posted by Sean Owen <sr...@gmail.com>.
Yes, there are a few components here, serving a few different purposes.
All build around the core library, which isn't specific to Hadoop or an
HTTP server, but you've seen some of the components that adapt the core
to these contexts. There are also components that can evaluate or
load-test the code.

The only piece you are interested in, then, is really the Hadoop
integration -- see org.apache.mahout.cf.taste.hadoop. There you will
find RecommenderJob, which should be able to launch a pseudo-distributed
recommender job. I say pseudo since these algorithms are not in general
distributable, but one can of course run n instances of a recommender to
compute 1/nth of all recommendations each. That is nice, though it means
that, say, the amount of RAM each job consumes is still limited by the
size of each machine.
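
The partitioning idea itself is simple; roughly this, as a hypothetical
sketch of the idea rather than the actual RecommenderJob code:

// Hypothetical sketch of the pseudo-distributed idea: each of numShards
// workers walks the same set of user IDs but computes recommendations
// only for its 1/n slice. Note each worker still holds the whole data
// model in memory.
public class ShardSketch {
  public static void run(Iterable<Long> allUserIDs, int shard, int numShards) {
    for (long userID : allUserIDs) {
      if (((userID % numShards) + numShards) % numShards == shard) {
        // run a normal, non-distributed recommender for this user and
        // write its recommendations out
      }
    }
  }
}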

I just recently rewrote this package to be compatible with Hadoop
0.20's new APIs. I do not know that it works, and I have some reason to
believe there are bugs in the API that will prevent it from working, so
this piece is currently in flux.

If you want to experiment and be a guinea pig for this latest revision,
I can provide close support to work through the bugs on both sides. Or
we can talk about your requirements a bit more to figure out whether
this is feasible, what the best algorithm is, and whether you need
Hadoop at all.

How big is 'massive'? Could you reveal how many users, items, and
user-item preferences there are, to an order of magnitude? What is
generally the nature of the input data you have, and what do you want
out as recommendations?

On Wed, Jul 22, 2009 at 12:12 AM, Aurora
Skarra-Gallagher<au...@yahoo-inc.com> wrote:
> Hi,
>
> I apologize if I've misunderstood the purpose of the Taste component of Mahout. Our goal was to take a recommendation framework and use our own recommendation algorithm within it. We need to process a massive amount of data, and wanted it to be done on our Hadoop grid. I thought that Taste was the right fit for the job. I'm not interested in the HTTP service. I'm interested in the recommendation framework, particularly from a back-end batch perspective. Does that help clarify? Thanks for helping me sort through this.
>
> -Aurora

Re: Running Taste Web example without the webserver

Posted by Aurora Skarra-Gallagher <au...@yahoo-inc.com>.
Hi,

Is anyone able to point me in the right direction on this?

Thanks,
Aurora




Re: Running Taste Web example without the webserver

Posted by Aurora Skarra-Gallagher <au...@yahoo-inc.com>.
Hi,

I apologize if I've misunderstood the purpose of the Taste component of Mahout. Our goal was to take a recommendation framework and use our own recommendation algorithm within it. We need to process a massive amount of data, and wanted it to be done on our Hadoop grid. I thought that Taste was the right fit for the job. I'm not interested in the HTTP service. I'm interested in the recommendation framework, particularly from a back-end batch perspective. Does that help clarify? Thanks for helping me sort through this.

-Aurora




Re: Running Taste Web example without the webserver

Posted by Sean Owen <sr...@gmail.com>.
Hmm, lots going on here; it's confusing.

Are you trying to run this on Hadoop intentionally? Because the web
app example is not intended to run on Hadoop. It's a component
intended to serve recommendations over HTTP in real time. It also
appears you are running an evaluation rather than a web app serving
requests. I realize you're trying to run this without Jetty, but
that's kind of like trying to run a web app without a web server.
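
For reference, what that evaluation run is doing is roughly the
following -- a sketch only: it assumes the average-absolute-difference
evaluator the GroupLens runner appears to use, and the evaluate()
signature has shifted between releases (later ones take an extra
DataModelBuilder argument). Read this way, the 0.70 printed above means
predictions differed from the held-out ratings by about 0.7 on average,
lower being better:

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Sketch: train on a random 90% of the ratings, predict the held-out
// 10%, and report the average absolute difference between predicted and
// actual ratings. Path and recommender choice mirror the GroupLens
// example, which uses slope-one under the hood.
public class EvaluationSketch {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("/tmp/ratings.txt"));
    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
        return new SlopeOneRecommender(trainingModel);
      }
    };
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    System.out.println(evaluator.evaluate(builder, model, 0.9, 1.0));
  }
}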

I think you'd have to clarify what you are trying to do, and what you
are doing right now, before I can really begin to assist.
