Posted to user@mahout.apache.org by WangRamon <ra...@hotmail.com> on 2011/10/17 06:11:01 UTC

Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?




Hi guys,

We want to evaluate how good a distributed (Hadoop-based) recommender is. I found that Mahout provides some standalone implementations for evaluating a recommender. Is there a distributed implementation we can use in a Hadoop environment? Thanks a lot.

BTW, if there is no such implementation, does anyone have a solution or idea for how to implement one?
 
Cheers
Ramon

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by WangRamon <ra...@hotmail.com>.

Thanks Sebastian, I will check ParallelFactorizationEvaluator!

Cheers
Ramon

> Date: Mon, 17 Oct 2011 09:48:44 +0200
> From: ssc@apache.org
> To: user@mahout.apache.org
> Subject: Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?
> 
> There is already code in Mahout that splits a dataset into a training set
> and a test set: org.apache.mahout.cf.taste.hadoop.als.eval.DatasetSplitter.
> There is also an evaluator for factorization-based recommendations:
> org.apache.mahout.cf.taste.hadoop.als.eval.ParallelFactorizationEvaluator.
> 
> This might help as a starting point for implementing evaluation of
> RecommenderJob.
> 
> --sebastian

Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by Sebastian Schelter <ss...@apache.org>.

There is already code in Mahout that splits a dataset into a training set
and a test set: org.apache.mahout.cf.taste.hadoop.als.eval.DatasetSplitter.
There is also an evaluator for factorization-based recommendations:
org.apache.mahout.cf.taste.hadoop.als.eval.ParallelFactorizationEvaluator.

This might help as a starting point for implementing evaluation of
RecommenderJob.
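
To illustrate what evaluating a factorization on held-out data involves, here is a minimal conceptual sketch in Python. It is not the code of ParallelFactorizationEvaluator; the function names and the tiny feature vectors are made up for illustration. It assumes a predicted preference is the dot product of a user's and an item's feature vectors, and reports the RMSE over the test set.

```python
import math

def predict(user_features, item_features):
    """Predicted preference: dot product of the two feature vectors."""
    return sum(u * i for u, i in zip(user_features, item_features))

def factorization_rmse(test_prefs, user_factors, item_factors):
    """RMSE of predicted vs. held-out prefs, given (user, item, pref) triples."""
    if not test_prefs:
        raise ValueError("test set is empty")
    squared_error = 0.0
    for user, item, pref in test_prefs:
        error = predict(user_factors[user], item_factors[item]) - pref
        squared_error += error * error
    return math.sqrt(squared_error / len(test_prefs))

# Hypothetical factor vectors and held-out preferences.
user_factors = {1: [1.0, 2.0], 2: [0.5, 1.0]}
item_factors = {101: [2.0, 1.0], 102: [1.0, 0.5]}
test_prefs = [(1, 101, 5.0), (2, 102, 1.0)]

error = factorization_rmse(test_prefs, user_factors, item_factors)
print(error)
```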

--sebastian




On 17.10.2011 09:39, WangRamon wrote:
> 
> Hi Sean,
>
> Do you mean that I should take the concept from the standalone evaluator: hold out some real data (say 20% of all data), compute recommendations on the other 80%, and finally compare the two?
>
> Cheers
> Ramon

Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by Sean Owen <sr...@gmail.com>.

That's true, for this particular job you'd have to normalize the
values to make them a valid estimate. I'd forgotten that.
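
A minimal sketch of one common way to do such a normalization, in Python. This is an assumption for illustration, not necessarily what RecommenderJob does internally: the raw value is a similarity-weighted sum of the user's prefs, and dividing by the sum of the weights turns it into a weighted average on the original preference scale. The function name and the numbers are hypothetical.

```python
def estimate_preference(similarities, user_prefs):
    """Weighted average of a user's prefs, weighted by similarity to the target item.

    similarities: {item: weight of that item relative to the target item}
    user_prefs:   {item: the user's known preference for that item}
    Returns (raw_value, normalized_estimate); the estimate is None if no weights apply.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, pref in user_prefs.items():
        weight = similarities.get(item, 0.0)
        weighted_sum += weight * pref      # the unnormalized value, can be far too large
        weight_total += abs(weight)
    if weight_total == 0.0:
        return weighted_sum, None          # no basis for an estimate
    return weighted_sum, weighted_sum / weight_total

# Hypothetical co-occurrence weights for one target item, and one user's prefs.
similarities = {101: 4.0, 102: 1.0}
user_prefs = {101: 5.0, 102: 2.0}
raw, estimate = estimate_preference(similarities, user_prefs)
print(raw, estimate)
```

Here the raw value (22.0) is well outside a 1-to-5 preference scale, while the normalized value (4.4) can be compared against a held-out preference.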

2011/10/17 WangRamon <ra...@hotmail.com>:
>
> Hi Sean,
>
> Actually, I found something more interesting in the book "Mahout in Action", Chapter 6, page 72:
>
> "Note that the values in R do not represent an estimated preference value -- they’re far too large, for one. These could be normalized into estimated preference values with some additional computation, if desired. But for purposes here, normalization doesn’t matter, since the ordering of recommendations is the important thing, not the exact values on which the ordering depends."
>
> The value R is the estimated result computed by running org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. Can I use it directly for the comparison? The book says R can be normalized into estimated prefs, but I'm not sure whether RecommenderJob already does this.
>
> Thanks
> Ramon
>

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by WangRamon <ra...@hotmail.com>.

Hi Sean,

Actually, I found something more interesting in the book "Mahout in Action", Chapter 6, page 72:

"Note that the values in R do not represent an estimated preference value -- they’re far too large, for one. These could be normalized into estimated preference values with some additional computation, if desired. But for purposes here, normalization doesn’t matter, since the ordering of recommendations is the important thing, not the exact values on which the ordering depends."

The value R is the estimated result computed by running org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. Can I use it directly for the comparison? The book says R can be normalized into estimated prefs, but I'm not sure whether RecommenderJob already does this.

Thanks
Ramon

> Date: Mon, 17 Oct 2011 17:26:37 +0100
> Subject: Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> Yes, that's probably an easy and quick way to get what you want.

Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by Sean Owen <sr...@gmail.com>.

Yes, that's probably an easy and quick way to get what you want.
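
A minimal sketch of how the job's output could then be matched against held-out data, in Python. It assumes each output line has the text form "userID<TAB>[itemID:score,itemID:score,...]"; check this against the actual output of your Mahout version. The function name and the sample lines are hypothetical.

```python
def parse_recommendations(lines):
    """Parse lines of the assumed form 'userID<TAB>[itemID:score,...]'.

    Returns {(user_id, item_id): score}.
    """
    estimates = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, _, items = line.partition("\t")
        for entry in items.strip("[]").split(","):
            if not entry:
                continue  # user with an empty recommendation list
            item, score = entry.split(":")
            estimates[(int(user), int(item))] = float(score)
    return estimates

# Made-up sample output and held-out preferences.
sample = ["1\t[104:4.5,106:4.0]", "2\t[105:3.0]", "3\t[]"]
estimates = parse_recommendations(sample)

held_out = {(1, 106): 5.0, (2, 101): 4.0}
# Only held-out pairs that made it into the top items have an estimate.
found = {pair: estimates[pair] for pair in held_out if pair in estimates}
print(found)
```

In this made-up sample, only one of the two held-out pairs has an estimate; asking for more recommendations per user raises the chance that a held-out item appears in the output.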

2011/10/17 WangRamon <ra...@hotmail.com>:
>
> Hi Sean,
>
> It seems that, to get the estimated pref values for comparison in a distributed environment, I have to run org.apache.mahout.cf.taste.hadoop.item.RecommenderJob to completion and set a larger value for "recommendationsPerUser" so that my test data can appear among the estimated top items. Am I missing something? Any ideas? Thanks in advance.
>
> Cheers
> Ramon
>

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by WangRamon <ra...@hotmail.com>.

Hi Sean,

It seems that, to get the estimated pref values for comparison in a distributed environment, I have to run org.apache.mahout.cf.taste.hadoop.item.RecommenderJob to completion and set a larger value for "recommendationsPerUser" so that my test data can appear among the estimated top items. Am I missing something? Any ideas? Thanks in advance.

Cheers
Ramon

> Date: Mon, 17 Oct 2011 09:29:28 +0100
> Subject: Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> Yes, but those are only estimates for top items. You may not see an
> estimate for the items in your test set.
> 
> Yes also look at Sebastian's suggestion.

Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by Sean Owen <sr...@gmail.com>.

Yes, but those are only estimates for top items. You may not see an
estimate for the items in your test set.

Yes also look at Sebastian's suggestion.
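
To make the top-items caveat concrete, here is a minimal sketch in Python that measures how many held-out (user, item) pairs actually received an estimate. The function name and the data are hypothetical; it only assumes the estimates are available as a dict keyed by (user, item).

```python
def coverage(top_n_estimates, test_pairs):
    """Split held-out (user, item) pairs into those with and without an estimate.

    Returns (covered, missing, fraction_covered).
    """
    covered = [pair for pair in test_pairs if pair in top_n_estimates]
    missing = [pair for pair in test_pairs if pair not in top_n_estimates]
    fraction = len(covered) / len(test_pairs) if test_pairs else 0.0
    return covered, missing, fraction

# Made-up estimates for each user's top items, and made-up held-out pairs.
top_n_estimates = {(1, 102): 4.2, (1, 103): 3.9, (2, 105): 4.8}
test_pairs = [(1, 103), (1, 101), (2, 105), (2, 107)]

covered, missing, fraction = coverage(top_n_estimates, test_pairs)
print(covered, missing, fraction)
```

Any error metric computed only over the covered pairs is biased toward items the recommender already ranks highly, so it is worth reporting the coverage fraction alongside it.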

2011/10/17 WangRamon <ra...@hotmail.com>:
>
> Hi Sean,
>
> Thanks for the quick reply. I'm running org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on Hadoop. I think the result of this job is the estimated prefs for each user, right? So for the evaluation, I plan to hold out some real data (5% of all data, as you suggested) and run the computation on the other 95%. The recommendation result would then look something like this:
>
> User1 likes: item2, item3, item4
> User2 likes: item3, item4, item5
>
> But the real data (the 5%) are:
>
> User1 likes: item3, item4, item1
> User2 likes: item5, item6, item7
>
> Then I compare these two datasets. Am I right?
>
> Cheers
> Ramon

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by WangRamon <ra...@hotmail.com>.

Hi Sean,

Thanks for the quick reply. I'm running org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on Hadoop. I think the result of this job is the estimated prefs for each user, right? So for the evaluation, I plan to hold out some real data (5% of all data, as you suggested) and run the computation on the other 95%. The recommendation result would then look something like this:

User1 likes: item2, item3, item4
User2 likes: item3, item4, item5

But the real data (the 5%) are:

User1 likes: item3, item4, item1
User2 likes: item5, item6, item7

Then I compare these two datasets. Am I right?

Cheers
Ramon
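
One way to make a list-versus-list comparison like this concrete is precision and recall over the held-out items. A minimal sketch in Python using the example lists from this message; the function name is hypothetical and this is not a Mahout API.

```python
def precision_recall(recommended, held_out):
    """Average per-user precision and recall of recommended vs. held-out items."""
    if not held_out:
        raise ValueError("no held-out data to compare against")
    precisions, recalls = [], []
    for user, relevant in held_out.items():
        recs = recommended.get(user, [])
        hits = len(set(recs) & set(relevant))
        precisions.append(hits / len(recs) if recs else 0.0)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    count = len(held_out)
    return sum(precisions) / count, sum(recalls) / count

recommended = {
    "User1": ["item2", "item3", "item4"],
    "User2": ["item3", "item4", "item5"],
}
held_out = {
    "User1": ["item3", "item4", "item1"],
    "User2": ["item5", "item6", "item7"],
}

precision, recall = precision_recall(recommended, held_out)
print(precision, recall)
```

User1 has two hits out of three and User2 has one out of three, so both averages come to one half.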
> Date: Mon, 17 Oct 2011 08:49:30 +0100
> Subject: Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> Yes -- though I might use more like 95% for training.
> You aren't running recommendations, quite; you're computing estimated
> prefs, which is the step before recommendation. I assume you're doing
> an RMSE comparison?

Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by Sean Owen <sr...@gmail.com>.

Yes -- though I might use more like 95% for training.
You aren't running recommendations, quite; you're computing estimated
prefs, which is the step before recommendation. I assume you're doing
an RMSE comparison?
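
For reference, a minimal sketch of an RMSE comparison in Python. It assumes the estimated and the held-out prefs are both available as dicts keyed by (user, item); the function name and the numbers are hypothetical.

```python
import math

def rmse(estimated, actual):
    """Root-mean-square error over the (user, item) pairs present in both dicts."""
    common = estimated.keys() & actual.keys()
    if not common:
        raise ValueError("no overlapping (user, item) pairs to compare")
    squared_error = sum((estimated[key] - actual[key]) ** 2 for key in common)
    return math.sqrt(squared_error / len(common))

# Made-up held-out prefs and the estimates computed for the same pairs.
actual = {(1, 101): 4.0, (1, 102): 2.0, (2, 101): 5.0}
estimated = {(1, 101): 3.5, (1, 102): 2.5, (2, 101): 4.0}

print(rmse(estimated, actual))
```

Squaring the errors penalizes large misses more heavily than small ones, which is the usual reason to prefer RMSE over a plain average of absolute differences.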

2011/10/17 WangRamon <ra...@hotmail.com>:
>
> Hi Sean,
>
> Do you mean that I should take the concept from the standalone evaluator: hold out some real data (say 20% of all data), compute recommendations on the other 80%, and finally compare the two?
>
> Cheers
> Ramon

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by WangRamon <ra...@hotmail.com>.

Hi Sean,

Do you mean that I should take the concept from the standalone evaluator: hold out some real data (say 20% of all data), compute recommendations on the other 80%, and finally compare the two?

Cheers
Ramon

> Date: Mon, 17 Oct 2011 08:02:37 +0100
> Subject: Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> There is not one, though you could probably adapt the evaluation code
> without a great deal of trouble. The concept is the same; the
> implementation is quite different. You would withhold some data, and
> then compute the value of that withheld data and compare with the
> original.

Re: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

Posted by Sean Owen <sr...@gmail.com>.

There is not one, though you could probably adapt the evaluation code
without a great deal of trouble. The concept is the same; the
implementation is quite different. You would withhold some data, and
then compute the value of that withheld data and compare with the
original.
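
A minimal sketch of the withholding step, in plain Python rather than Hadoop code. It assumes the preferences are available as (userID, itemID, pref) triples; the function name, the 90% training fraction, and the fixed seed are arbitrary choices for illustration, not part of Mahout.

```python
import random

def split_prefs(prefs, training_fraction=0.9, seed=42):
    """Randomly split (user, item, pref) triples into training and test lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    training, test = [], []
    for triple in prefs:
        if rng.random() < training_fraction:
            training.append(triple)
        else:
            test.append(triple)  # withheld; compared against estimates later
    return training, test

# Made-up preference data: 20 users x 10 items.
prefs = [(u, i, float((u + i) % 5 + 1)) for u in range(1, 21) for i in range(1, 11)]
training, test = split_prefs(prefs)
print(len(training), len(test))
```

The recommender is then run on the training list only, and its estimates for the withheld pairs are compared with the withheld values.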

2011/10/17 WangRamon <ra...@hotmail.com>:
>
>
>
>
> Hi guys,
>
> We want to evaluate how good a distributed (Hadoop-based) recommender is. I found that Mahout provides some standalone implementations for evaluating a recommender. Is there a distributed implementation we can use in a Hadoop environment? Thanks a lot.
>
> BTW, if there is no such implementation, does anyone have a solution or idea for how to implement one?
>
> Cheers
> Ramon