Posted to user@mahout.apache.org by ziad kamel <zi...@gmail.com> on 2012/08/09 17:12:29 UTC

How good recommendations and precision works

Hi, I asked this question a few months ago with no answer. Hopefully
someone can help.

When not using a threshold, the default is the average rating plus one
standard deviation, which works out to roughly the top 16% of items. Assume
that a user has 100 items. Does that mean that his good recommendations are
the top 16 items? If we use precision at 5, we are going to select only the
top 5 items from the 100. So is the precision going to be how many of the
16 items appear among the 5 items? Assume that we get 4 of the 16 in the
list of 5; will the precision then be 80%?

IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
    GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

Thanks !
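
For reference, a minimal self-contained version of the evaluation above might
look like the sketch below. The user-based recommender, the neighborhood size
of 10, and the "prefs.csv" file name are illustrative assumptions, not part of
the original question; only the evaluate(...) call itself comes from the post.

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class PrecisionAtFiveExample {
  public static void main(String[] args) throws Exception {
    // Preferences as userID,itemID,value rows; the file name is a placeholder.
    DataModel model = new FileDataModel(new File("prefs.csv"));

    // Build the recommender under test; a user-based recommender is assumed here.
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(trainingModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, trainingModel);
        return new GenericUserBasedRecommender(trainingModel, neighborhood, similarity);
      }
    };

    // Evaluate precision/recall at 5, letting the evaluator pick the relevance
    // threshold per user (mean preference plus one standard deviation).
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

    System.out.println("Precision@5: " + stats.getPrecision());
    System.out.println("Recall@5:    " + stats.getRecall());
  }
}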

Re: How good recommendations and precision works

Posted by Sean Owen <sr...@gmail.com>.
The relevant items, the top 16, are a set. You find how many of the
recommendations fall in that set. For precision, ordering does not matter.

You are right that the metric kind of falls apart for users with very few
data points. You want to use precision at a small number, and perhaps
ignore the results on users with little data.
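
A minimal sketch of that set-based computation, with hypothetical item IDs,
could look like this (plain Java, independent of Mahout's evaluator):

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionAtK {
  // Precision@k: the fraction of the k recommended items that fall in the
  // relevant set. Ordering within the recommendation list does not matter.
  static double precisionAtK(List<Long> recommended, Set<Long> relevant, int k) {
    int hits = 0;
    for (Long itemId : recommended.subList(0, Math.min(k, recommended.size()))) {
      if (relevant.contains(itemId)) {
        hits++;
      }
    }
    return (double) hits / k;
  }

  public static void main(String[] args) {
    // Hypothetical data: 4 of the 5 recommended items are relevant -> 0.8.
    Set<Long> relevant = new HashSet<Long>(Arrays.asList(1L, 2L, 3L, 4L, 7L, 9L));
    List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L, 100L);
    System.out.println(precisionAtK(recommended, relevant, 5)); // 0.8
  }
}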

On Thu, Aug 9, 2012 at 5:20 PM, ziad kamel <zi...@gmail.com> wrote:

> Thanks Sean !
>
> Please correct me , when selecting the 16% items we use the top items
> , but when comparing with the recommended items we don't use sorted
> list . In other words we just compare 2 lists?
>
> How mahout deal with these 2 cases?
>
> Case 1: user have many items. Assume 1000 item , so if we recommend 5
> good items from the 160 items we will get a precision of 100% ? is
> that ok ?
>
> Case 2: user having less than 7 items. Assume 5 items, in this case
> there won't be top items in the list so the user won't get any
> recommendation and no precision ? Do we need to select another
> threshold like 50% ?
>
>

Re: How good recommendations and precision works

Posted by Ted Dunning <te...@gmail.com>.
Recommenders and classifiers are very similar animals in general
except for the training data.

You can view a recommender as an engine that invents a classifier for
each user but it does this by using other user histories as training
data.

This means that there can be a lot of confusion when looking at either
kind of beast at a micro level.

On Thu, Aug 9, 2012 at 1:20 PM, ziad kamel <zi...@gmail.com> wrote:
> Thanks again.
>
> A quick question , in recommendation , if we measure precision @ 1 ,
> how is that different from measuring precision in a classifier ?  Does
> that mean a recommender becomes a classifier at this case ?
>
>
>
>
> On Thu, Aug 9, 2012 at 12:18 PM, Sean Owen <sr...@gmail.com> wrote:
>> Yes, this is a definite weakness of the precision test as applied to
>> recommenders. It is somewhat flawed; it is easy to apply and has some use.
>>
>> Any item the user has interacted with is significant. The less-preferred 84
>> still probably predict the most-preferred 16 to some extent. But you make a
>> good point, the bottom of the list is of a different nature than the top,
>> and that bias does harm the recommendations, making the test result less
>> useful.
>>
>> This is not a big issue though if the precision@ number is quite small
>> compared to the user pref list size.
>>
>> There's a stronger problem, that the user's pref list is not complete. A
>> recommendation that's not in the list already may still be a good
>> recommendation, in the abstract. But a precision test would count it as
>> "wrong".
>>
>> nDCG is slightly better than precision but still has this fundamental
>> problem.
>>
>> The "real" test is to make recommendations and then put them in front of
>> users somehow and see how many are clicked or acted on. That's the best
>> test but fairly impractical in most cases.
>>
>> On Thu, Aug 9, 2012 at 5:54 PM, ziad kamel <zi...@gmail.com> wrote:
>>
>>> I see, but we are removing the good recommendations and we are
>>> assuming that the less preferred items by a user can predict his best
>>> preferred. For example, a user that has 100 books , and preferred 16
>>> of them only while the rest are books he have read. By removing the 16
>>> we are left with 84 books that it seems won't be able to predict the
>>> right set of 16 ?
>>>
>>> What are the recommended approaches to evaluate the results ? I assume
>>> IR approach is one of them.
>>>
>>> Highly appreciating your help Sean .
>>>
>>> On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <sr...@gmail.com> wrote:
>>> > Yes, or else those items would not be eligible for recommendation. And it
>>> > would be like giving students the answers to a test before the test.
>>> >
>>> > On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <zi...@gmail.com>
>>> wrote:
>>> >
>>> >> A related question please.
>>> >>
>>> >> Do Mahout remove the 16% good items before recommending and use the
>>> >> 84% to predict the 16% ?
>>> >>
>>> >>
>>>

Re: How good recommendations and precision works

Posted by Lance Norskog <go...@gmail.com>.
MRR (Mean Reciprocal Rank) is a more realistic version of the same
thing: the first item on the list counts as 1, the second as 1/2, the
third as 1/3, and so on down to the fifth. This weighting roughly matches
the probability of people clicking listings on the first page.
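
A small sketch of that reciprocal-rank weighting over a top-5 list, using
hypothetical item IDs, might look like this:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReciprocalRankScore {
  // Rank-weighted score over the top 5: a relevant item at rank i contributes
  // 1/i, so hits near the top of the list count far more than hits near the bottom.
  static double reciprocalRankScore(List<Long> recommended, Set<Long> relevant) {
    double score = 0.0;
    int limit = Math.min(5, recommended.size());
    for (int i = 0; i < limit; i++) {
      if (relevant.contains(recommended.get(i))) {
        score += 1.0 / (i + 1);
      }
    }
    return score;
  }

  public static void main(String[] args) {
    Set<Long> relevant = new HashSet<Long>(Arrays.asList(3L, 5L));
    List<Long> recommended = Arrays.asList(3L, 8L, 5L, 9L, 2L);
    // Relevant items at ranks 1 and 3 -> 1 + 1/3, roughly 1.33.
    System.out.println(reciprocalRankScore(recommended, relevant));
  }
}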

On Thu, Aug 9, 2012 at 1:37 PM, Sean Owen <sr...@gmail.com> wrote:
> Evaluating precision @ 1 is evaluating the 1st recommendation, whether it's
> a good recommendation. It's like asking for the data point that a
> classifier would classify as most probably in a certain class. That's not
> the same as what a classifier is built to do, which is to decide whether
> any given item is in a class or not. Those are obviously quite related
> questions though.
>
> On Thu, Aug 9, 2012 at 9:20 PM, ziad kamel <zi...@gmail.com> wrote:
>
>> Thanks again.
>>
>> A quick question , in recommendation , if we measure precision @ 1 ,
>> how is that different from measuring precision in a classifier ?  Does
>> that mean a recommender becomes a classifier at this case ?
>>
>>
>>
>>
>> On Thu, Aug 9, 2012 at 12:18 PM, Sean Owen <sr...@gmail.com> wrote:
>> > Yes, this is a definite weakness of the precision test as applied to
>> > recommenders. It is somewhat flawed; it is easy to apply and has some
>> use.
>> >
>> > Any item the user has interacted with is significant. The less-preferred
>> 84
>> > still probably predict the most-preferred 16 to some extent. But you
>> make a
>> > good point, the bottom of the list is of a different nature than the top,
>> > and that bias does harm the recommendations, making the test result less
>> > useful.
>> >
>> > This is not a big issue though if the precision@ number is quite small
>> > compared to the user pref list size.
>> >
>> > There's a stronger problem, that the user's pref list is not complete. A
>> > recommendation that's not in the list already may still be a good
>> > recommendation, in the abstract. But a precision test would count it as
>> > "wrong".
>> >
>> > nDCG is slightly better than precision but still has this fundamental
>> > problem.
>> >
>> > The "real" test is to make recommendations and then put them in front of
>> > users somehow and see how many are clicked or acted on. That's the best
>> > test but fairly impractical in most cases.
>> >
>> > On Thu, Aug 9, 2012 at 5:54 PM, ziad kamel <zi...@gmail.com>
>> wrote:
>> >
>> >> I see, but we are removing the good recommendations and we are
>> >> assuming that the less preferred items by a user can predict his best
>> >> preferred. For example, a user that has 100 books , and preferred 16
>> >> of them only while the rest are books he have read. By removing the 16
>> >> we are left with 84 books that it seems won't be able to predict the
>> >> right set of 16 ?
>> >>
>> >> What are the recommended approaches to evaluate the results ? I assume
>> >> IR approach is one of them.
>> >>
>> >> Highly appreciating your help Sean .
>> >>
>> >> On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <sr...@gmail.com> wrote:
>> >> > Yes, or else those items would not be eligible for recommendation.
>> And it
>> >> > would be like giving students the answers to a test before the test.
>> >> >
>> >> > On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <zi...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> A related question please.
>> >> >>
>> >> >> Do Mahout remove the 16% good items before recommending and use the
>> >> >> 84% to predict the 16% ?
>> >> >>
>> >> >>
>> >>
>>



-- 
Lance Norskog
goksron@gmail.com

Re: How good recommendations and precision works

Posted by Sean Owen <sr...@gmail.com>.
Evaluating precision @ 1 is evaluating the 1st recommendation, whether it's
a good recommendation. It's like asking for the data point that a
classifier would classify as most probably in a certain class. That's not
the same as what a classifier is built to do, which is to decide whether
any given item is in a class or not. Those are obviously quite related
questions though.

On Thu, Aug 9, 2012 at 9:20 PM, ziad kamel <zi...@gmail.com> wrote:

> Thanks again.
>
> A quick question , in recommendation , if we measure precision @ 1 ,
> how is that different from measuring precision in a classifier ?  Does
> that mean a recommender becomes a classifier at this case ?
>
>
>
>
> On Thu, Aug 9, 2012 at 12:18 PM, Sean Owen <sr...@gmail.com> wrote:
> > Yes, this is a definite weakness of the precision test as applied to
> > recommenders. It is somewhat flawed; it is easy to apply and has some
> use.
> >
> > Any item the user has interacted with is significant. The less-preferred
> 84
> > still probably predict the most-preferred 16 to some extent. But you
> make a
> > good point, the bottom of the list is of a different nature than the top,
> > and that bias does harm the recommendations, making the test result less
> > useful.
> >
> > This is not a big issue though if the precision@ number is quite small
> > compared to the user pref list size.
> >
> > There's a stronger problem, that the user's pref list is not complete. A
> > recommendation that's not in the list already may still be a good
> > recommendation, in the abstract. But a precision test would count it as
> > "wrong".
> >
> > nDCG is slightly better than precision but still has this fundamental
> > problem.
> >
> > The "real" test is to make recommendations and then put them in front of
> > users somehow and see how many are clicked or acted on. That's the best
> > test but fairly impractical in most cases.
> >
> > On Thu, Aug 9, 2012 at 5:54 PM, ziad kamel <zi...@gmail.com>
> wrote:
> >
> >> I see, but we are removing the good recommendations and we are
> >> assuming that the less preferred items by a user can predict his best
> >> preferred. For example, a user that has 100 books , and preferred 16
> >> of them only while the rest are books he have read. By removing the 16
> >> we are left with 84 books that it seems won't be able to predict the
> >> right set of 16 ?
> >>
> >> What are the recommended approaches to evaluate the results ? I assume
> >> IR approach is one of them.
> >>
> >> Highly appreciating your help Sean .
> >>
> >> On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <sr...@gmail.com> wrote:
> >> > Yes, or else those items would not be eligible for recommendation.
> And it
> >> > would be like giving students the answers to a test before the test.
> >> >
> >> > On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <zi...@gmail.com>
> >> wrote:
> >> >
> >> >> A related question please.
> >> >>
> >> >> Do Mahout remove the 16% good items before recommending and use the
> >> >> 84% to predict the 16% ?
> >> >>
> >> >>
> >>
>

Re: How good recommendations and precision works

Posted by ziad kamel <zi...@gmail.com>.
Thanks again.

A quick question: in recommendation, if we measure precision @ 1,
how is that different from measuring precision in a classifier? Does
that mean a recommender becomes a classifier in this case?




On Thu, Aug 9, 2012 at 12:18 PM, Sean Owen <sr...@gmail.com> wrote:
> Yes, this is a definite weakness of the precision test as applied to
> recommenders. It is somewhat flawed; it is easy to apply and has some use.
>
> Any item the user has interacted with is significant. The less-preferred 84
> still probably predict the most-preferred 16 to some extent. But you make a
> good point, the bottom of the list is of a different nature than the top,
> and that bias does harm the recommendations, making the test result less
> useful.
>
> This is not a big issue though if the precision@ number is quite small
> compared to the user pref list size.
>
> There's a stronger problem, that the user's pref list is not complete. A
> recommendation that's not in the list already may still be a good
> recommendation, in the abstract. But a precision test would count it as
> "wrong".
>
> nDCG is slightly better than precision but still has this fundamental
> problem.
>
> The "real" test is to make recommendations and then put them in front of
> users somehow and see how many are clicked or acted on. That's the best
> test but fairly impractical in most cases.
>
> On Thu, Aug 9, 2012 at 5:54 PM, ziad kamel <zi...@gmail.com> wrote:
>
>> I see, but we are removing the good recommendations and we are
>> assuming that the less preferred items by a user can predict his best
>> preferred. For example, a user that has 100 books , and preferred 16
>> of them only while the rest are books he have read. By removing the 16
>> we are left with 84 books that it seems won't be able to predict the
>> right set of 16 ?
>>
>> What are the recommended approaches to evaluate the results ? I assume
>> IR approach is one of them.
>>
>> Highly appreciating your help Sean .
>>
>> On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <sr...@gmail.com> wrote:
>> > Yes, or else those items would not be eligible for recommendation. And it
>> > would be like giving students the answers to a test before the test.
>> >
>> > On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <zi...@gmail.com>
>> wrote:
>> >
>> >> A related question please.
>> >>
>> >> Do Mahout remove the 16% good items before recommending and use the
>> >> 84% to predict the 16% ?
>> >>
>> >>
>>

Re: How good recommendations and precision works

Posted by Sean Owen <sr...@gmail.com>.
Yes, this is a definite weakness of the precision test as applied to
recommenders. It is somewhat flawed; it is easy to apply and has some use.

Any item the user has interacted with is significant. The less-preferred 84
still probably predict the most-preferred 16 to some extent. But you make a
good point, the bottom of the list is of a different nature than the top,
and that bias does harm the recommendations, making the test result less
useful.

This is not a big issue though if the precision@ number is quite small
compared to the user pref list size.

There's a stronger problem, that the user's pref list is not complete. A
recommendation that's not in the list already may still be a good
recommendation, in the abstract. But a precision test would count it as
"wrong".

nDCG is slightly better than precision but still has this fundamental
problem.

The "real" test is to make recommendations and then put them in front of
users somehow and see how many are clicked or acted on. That's the best
test but fairly impractical in most cases.
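
To make the precision-versus-nDCG difference concrete, here is a rough sketch
of binary-relevance nDCG@k using the standard formula and hypothetical item
IDs; it is not Mahout's implementation:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NdcgSketch {
  // Binary-relevance nDCG@k: unlike precision@k, a hit at rank 1 counts for
  // more than a hit at rank k, because gains are discounted by log2(rank + 1).
  static double ndcgAtK(List<Long> recommended, Set<Long> relevant, int k) {
    double dcg = 0.0;
    int limit = Math.min(k, recommended.size());
    for (int i = 0; i < limit; i++) {
      if (relevant.contains(recommended.get(i))) {
        dcg += 1.0 / (Math.log(i + 2) / Math.log(2)); // 1 / log2(rank + 1)
      }
    }
    // Ideal DCG: all relevant items (up to k of them) placed at the top.
    double idcg = 0.0;
    int ideal = Math.min(k, relevant.size());
    for (int i = 0; i < ideal; i++) {
      idcg += 1.0 / (Math.log(i + 2) / Math.log(2));
    }
    return idcg == 0.0 ? 0.0 : dcg / idcg;
  }

  public static void main(String[] args) {
    Set<Long> relevant = new HashSet<Long>(Arrays.asList(1L, 2L, 3L));
    // Both lists contain 2 of the 3 relevant items (precision@5 = 0.4 for each),
    // but nDCG@5 rewards the list that places the hits at the top.
    List<Long> hitsAtTop = Arrays.asList(1L, 2L, 50L, 51L, 52L);
    List<Long> hitsAtBottom = Arrays.asList(50L, 51L, 52L, 1L, 2L);
    System.out.println(ndcgAtK(hitsAtTop, relevant, 5));    // about 0.77
    System.out.println(ndcgAtK(hitsAtBottom, relevant, 5)); // about 0.38
  }
}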

On Thu, Aug 9, 2012 at 5:54 PM, ziad kamel <zi...@gmail.com> wrote:

> I see, but we are removing the good recommendations and we are
> assuming that the less preferred items by a user can predict his best
> preferred. For example, a user that has 100 books , and preferred 16
> of them only while the rest are books he have read. By removing the 16
> we are left with 84 books that it seems won't be able to predict the
> right set of 16 ?
>
> What are the recommended approaches to evaluate the results ? I assume
> IR approach is one of them.
>
> Highly appreciating your help Sean .
>
> On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <sr...@gmail.com> wrote:
> > Yes, or else those items would not be eligible for recommendation. And it
> > would be like giving students the answers to a test before the test.
> >
> > On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <zi...@gmail.com>
> wrote:
> >
> >> A related question please.
> >>
> >> Do Mahout remove the 16% good items before recommending and use the
> >> 84% to predict the 16% ?
> >>
> >>
>

Re: How good recommendations and precision works

Posted by ziad kamel <zi...@gmail.com>.
I see, but we are removing the good recommendations, and we are
assuming that the less-preferred items of a user can predict his most
preferred. For example, take a user who has 100 books and preferred only
16 of them, while the rest are books he has read. By removing the 16,
we are left with 84 books, and it seems those won't be able to predict
the right set of 16?

What are the recommended approaches to evaluating the results? I assume
the IR approach is one of them.

Highly appreciate your help, Sean.

On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <sr...@gmail.com> wrote:
> Yes, or else those items would not be eligible for recommendation. And it
> would be like giving students the answers to a test before the test.
>
> On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <zi...@gmail.com> wrote:
>
>> A related question please.
>>
>> Do Mahout remove the 16% good items before recommending and use the
>> 84% to predict the 16% ?
>>
>>

Re: How good recommendations and precision works

Posted by Sean Owen <sr...@gmail.com>.
Yes, or else those items would not be eligible for recommendation. And it
would be like giving students the answers to a test before the test.

On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <zi...@gmail.com> wrote:

> A related question please.
>
> Do Mahout remove the 16% good items before recommending and use the
> 84% to predict the 16% ?
>
>

Re: How good recommendations and precision works

Posted by ziad kamel <zi...@gmail.com>.
A related question, please.

Does Mahout remove the 16% good items before recommending and use the
84% to predict the 16%?

Many thanks!

On Thu, Aug 9, 2012 at 11:20 AM, ziad kamel <zi...@gmail.com> wrote:
> Thanks Sean !
>
> Please correct me , when selecting the 16% items we use the top items
> , but when comparing with the recommended items we don't use sorted
> list . In other words we just compare 2 lists?
>
> How mahout deal with these 2 cases?
>
> Case 1: user have many items. Assume 1000 item , so if we recommend 5
> good items from the 160 items we will get a precision of 100% ? is
> that ok ?
>
> Case 2: user having less than 7 items. Assume 5 items, in this case
> there won't be top items in the list so the user won't get any
> recommendation and no precision ? Do we need to select another
> threshold like 50% ?
>
>
>
> On Thu, Aug 9, 2012 at 10:52 AM, Sean Owen <sr...@gmail.com> wrote:
>> Hi Ziad, I did answer your last question on this list -- don't see this one
>> previously though.
>>
>> The "relevant" items are chosen as those whose pref value exceed some given
>> threshold. The default threshold is the mean of all 100 pref values plus
>> one standard deviation. Assuming the prefs are about normally distributed
>> about the mean (a significant assumption), and because 84% of the data
>> should therefore fall below mean plus 1 standard deviation, that means you
>> pick about the top 16% (16 of 100) items as relevant.
>>
>> Yes your interpretation of precision is correct.
>>
>> On Thu, Aug 9, 2012 at 4:12 PM, ziad kamel <zi...@gmail.com> wrote:
>>
>>> Hi , I asked this question few months ago with no answer. Hopefully
>>> someone can help .
>>>
>>> When not using a threshold, the default is to use average ratings plus
>>> one standard deviation which equals to 16%. Assume that a user have
>>> 100 items. Does that mean that his good recommendations are the top 16
>>> items ? In case we use precision at 5 , we going to select  only top 5
>>> items from the 100.  So is the precison going to be how many among the
>>> 16 items are in the 5 items ? Assume that we get 4 from the 16 in list
>>> of 5 , the precision will be 80% ?
>>>
>>> IRStatistics stats = evaluator.evaluate(recommenderBuilder, null,
>>> model, null, 5,
>>> GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>>>
>>> Thanks !
>>>

Re: How good recommendations and precision works

Posted by ziad kamel <zi...@gmail.com>.
Thanks Sean !

Please correct me if I'm wrong: when selecting the 16% of items we use
the top items, but when comparing with the recommended items we don't use
a sorted list. In other words, we just compare two lists?

How does Mahout deal with these two cases?

Case 1: a user has many items. Assume 1000 items; if we recommend 5
good items out of the 160 relevant items, we will get a precision of 100%?
Is that OK?

Case 2: a user has fewer than 7 items. Assume 5 items; in this case
there won't be any top items in the list, so the user won't get any
recommendations and no precision? Do we need to select another
threshold, like 50%?



On Thu, Aug 9, 2012 at 10:52 AM, Sean Owen <sr...@gmail.com> wrote:
> Hi Ziad, I did answer your last question on this list -- don't see this one
> previously though.
>
> The "relevant" items are chosen as those whose pref value exceed some given
> threshold. The default threshold is the mean of all 100 pref values plus
> one standard deviation. Assuming the prefs are about normally distributed
> about the mean (a significant assumption), and because 84% of the data
> should therefore fall below mean plus 1 standard deviation, that means you
> pick about the top 16% (16 of 100) items as relevant.
>
> Yes your interpretation of precision is correct.
>
> On Thu, Aug 9, 2012 at 4:12 PM, ziad kamel <zi...@gmail.com> wrote:
>
>> Hi , I asked this question few months ago with no answer. Hopefully
>> someone can help .
>>
>> When not using a threshold, the default is to use average ratings plus
>> one standard deviation which equals to 16%. Assume that a user have
>> 100 items. Does that mean that his good recommendations are the top 16
>> items ? In case we use precision at 5 , we going to select  only top 5
>> items from the 100.  So is the precison going to be how many among the
>> 16 items are in the 5 items ? Assume that we get 4 from the 16 in list
>> of 5 , the precision will be 80% ?
>>
>> IRStatistics stats = evaluator.evaluate(recommenderBuilder, null,
>> model, null, 5,
>> GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>>
>> Thanks !
>>

Re: How good recommendations and precision works

Posted by Sean Owen <sr...@gmail.com>.
Hi Ziad, I did answer your last question on this list -- don't see this one
previously though.

The "relevant" items are chosen as those whose pref value exceed some given
threshold. The default threshold is the mean of all 100 pref values plus
one standard deviation. Assuming the prefs are about normally distributed
about the mean (a significant assumption), and because 84% of the data
should therefore fall below mean plus 1 standard deviation, that means you
pick about the top 16% (16 of 100) items as relevant.

Yes your interpretation of precision is correct.
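
A rough sketch of that per-user threshold computation, outside of Mahout and
with hypothetical preference values, might look like this:

import java.util.ArrayList;
import java.util.List;

public class RelevanceThresholdSketch {
  public static void main(String[] args) {
    // Hypothetical preference values for one user.
    double[] prefs = {1.0, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 2.5, 3.0, 4.0};

    // Mean of the user's preference values.
    double sum = 0.0;
    for (double p : prefs) {
      sum += p;
    }
    double mean = sum / prefs.length;

    // (Sample) standard deviation of the preference values.
    double sumSq = 0.0;
    for (double p : prefs) {
      sumSq += (p - mean) * (p - mean);
    }
    double stdDev = Math.sqrt(sumSq / (prefs.length - 1));

    // Default relevance threshold: mean plus one standard deviation.
    // With roughly normal prefs, about the top 16% of items exceed it.
    double threshold = mean + stdDev;

    List<Double> relevant = new ArrayList<Double>();
    for (double p : prefs) {
      if (p > threshold) {
        relevant.add(p);
      }
    }
    System.out.println("threshold = " + threshold + ", relevant prefs = " + relevant);
  }
}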

On Thu, Aug 9, 2012 at 4:12 PM, ziad kamel <zi...@gmail.com> wrote:

> Hi , I asked this question few months ago with no answer. Hopefully
> someone can help .
>
> When not using a threshold, the default is to use average ratings plus
> one standard deviation which equals to 16%. Assume that a user have
> 100 items. Does that mean that his good recommendations are the top 16
> items ? In case we use precision at 5 , we going to select  only top 5
> items from the 100.  So is the precison going to be how many among the
> 16 items are in the 5 items ? Assume that we get 4 from the 16 in list
> of 5 , the precision will be 80% ?
>
> IRStatistics stats = evaluator.evaluate(recommenderBuilder, null,
> model, null, 5,
> GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>
> Thanks !
>