Posted to user@mahout.apache.org by Zia mel <zi...@gmail.com> on 2013/01/15 17:51:09 UTC

Choosing precision

Hello,

If I have users that each have between 1 and 20 items, what would be the
ideal way to evaluate the recommender using precision? Is there any
recommended precision cutoff to choose, such as p@2, p@5, p@10 or others,
and why?

Many thanks

Re: Choosing precision

Posted by Zia mel <zi...@gmail.com>.
OK, I found an answer:
AverageAbsoluteDifferenceRecommenderEvaluator is MAE :)
http://people.apache.org/~isabel/mahout_site/mahout-core/apidocs/org/apache/mahout/cf/taste/impl/eval/AverageAbsoluteDifferenceRecommenderEvaluator.html
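
In case it helps anyone searching the archive later, here is a minimal,
self-contained sketch of running that evaluator. The file name, the
Pearson similarity and the item-based recommender are just illustrative
choices on my part, not anything from this thread:

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class MaeEvalSketch {
  public static void main(String[] args) throws Exception {
    // "ratings.csv" is a placeholder userID,itemID,rating file.
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // Any recommender works here; item-based with Pearson is only an example.
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        return new GenericItemBasedRecommender(dataModel, similarity);
      }
    };

    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    // Train on 70% of each user's preferences, test on the rest, over all users.
    double mae = evaluator.evaluate(builder, null, model, 0.7, 1.0);
    System.out.println("MAE = " + mae);
  }
}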


On Tue, Jan 15, 2013 at 4:02 PM, Zia mel <zi...@gmail.com> wrote:
> Correction: For the second code I meant mean absolute error (MAE)
> //**MAE code
>   protected void processOneEstimate(float estimatedPreference, Preference realPref) {
>     double diff = realPref.getValue() - estimatedPreference;
>     average.addDatum(diff);
>   }
>
>   @Override
>   protected double computeFinalEvaluation() {
>     return average.getAverage();
>   }
>
> On Tue, Jan 15, 2013 at 12:03 PM, Zia mel <zi...@gmail.com> wrote:
>> //**MAP code
>>   protected void processOneEstimate(float estimatedPreference, Preference realPref) {
>>     double diff = realPref.getValue() - estimatedPreference;
>>     average.addDatum(diff);
>>   }
>>
>>   @Override
>>   protected double computeFinalEvaluation() {
>>     return average.getAverage();
>>   }

Re: Choosing precision

Posted by Zia mel <zi...@gmail.com>.
Correction: For the second code I meant mean absolute error (MAE)
//**MAE code
  protected void processOneEstimate(float estimatedPreference, Preference realPref) {
    double diff = realPref.getValue() - estimatedPreference;
    average.addDatum(diff);
  }

  @Override
  protected double computeFinalEvaluation() {
    return average.getAverage();
  }
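
One thing worth double-checking: as pasted, addDatum(diff) averages the
signed differences, so over- and under-estimates can cancel each other
out. If I am reading the AverageAbsoluteDifferenceRecommenderEvaluator
source correctly, it takes the absolute value, i.e. something like:

  @Override
  protected void processOneEstimate(float estimatedPreference, Preference realPref) {
    // Absolute error, so positive and negative estimation errors don't cancel.
    average.addDatum(Math.abs(realPref.getValue() - estimatedPreference));
  }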

On Tue, Jan 15, 2013 at 12:03 PM, Zia mel <zi...@gmail.com> wrote:
> //**MAP code
>   protected void processOneEstimate(float estimatedPreference, Preference realPref) {
>     double diff = realPref.getValue() - estimatedPreference;
>     average.addDatum(diff);
>   }
>
>   @Override
>   protected double computeFinalEvaluation() {
>     return average.getAverage();
>   }

Re: Choosing precision

Posted by Zia mel <zi...@gmail.com>.
Amazing answer!

What about these measures that appear a lot when evaluating? Are they
implemented in Mahout?
Mean Average Precision (MAP)
Mean Reciprocal Rank (MRR)


Since we have RMS, would this code give a correct answer for MAP?

//** RMS code
public final class RMSRecommenderEvaluator extends AbstractDifferenceRecommenderEvaluator {
  protected void processOneEstimate(float estimatedPreference, Preference realPref) {
    double diff = realPref.getValue() - estimatedPreference;
    average.addDatum(diff * diff);
  }

  @Override
  protected double computeFinalEvaluation() {
    return Math.sqrt(average.getAverage());
  }


//**MAP code
  protected void processOneEstimate(float estimatedPreference, Preference realPref) {
    double diff = realPref.getValue() - estimatedPreference;
    average.addDatum(diff);
  }

  @Override
  protected double computeFinalEvaluation() {
    return average.getAverage();
  }
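
As far as I understand it, MAP is a ranking measure: it is computed over
each user's top-N recommendation list against a held-out set of relevant
items, not from per-estimate rating differences, so the block above would
behave like an average prediction error rather than MAP. A rough
from-scratch sketch of average precision for one user is below;
averagePrecision is a hypothetical helper, not a Mahout class. MAP would
be its mean over all test users, and MRR would instead average the
reciprocal rank of the first hit.

  // Hypothetical helper, not part of Mahout: average precision of one user's
  // ranked top-N list against a held-out set of "relevant" item IDs.
  static double averagePrecision(List<Long> rankedItemIds, Set<Long> relevantItemIds) {
    if (relevantItemIds.isEmpty()) {
      return 0.0;
    }
    int hits = 0;
    double sum = 0.0;
    for (int rank = 1; rank <= rankedItemIds.size(); rank++) {
      if (relevantItemIds.contains(rankedItemIds.get(rank - 1))) {
        hits++;
        sum += (double) hits / rank;  // precision at the rank of each hit
      }
    }
    // One common convention: normalize by the number of relevant items,
    // capped at the length of the list.
    return sum / Math.min(relevantItemIds.size(), rankedItemIds.size());
  }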

Have a nice day Sean :)

On Tue, Jan 15, 2013 at 11:17 AM, Sean Owen <sr...@gmail.com> wrote:
> The best tests are really from real users. A/B test different
> recommenders and see which has better performance. That's not quite
> practical though.
>
> The problem is that you don't even know what the best recommendations
> are. Splitting the data by date is reasonable, but recent items aren't
> necessarily most-liked. Splitting by rating is more reasonable on this
> point, but you still can't conclude that there aren't better
> recommendations from among the un-rated items.
>
> Still, it ought to correlate. I think you will find precision/recall are
> very low in most cases, often a few percent. The result is "noisy".
> AUC will tell you about where all of those "best recommendations" in
> the test set fell into the list, rather than only measuring the top
> N's performance. This tells you more, and I think that's generally
> good. However it is measuring performance over the entire list of
> recs, when you are unlikely to use more than the top N.
>
> Go ahead and use it since there's not a lot better you can do in the
> lab, but be aware of the issues.

Re: Choosing precision

Posted by Sean Owen <sr...@gmail.com>.
The best tests are really from real users. A/B test different
recommenders and see which has better performance. That's not quite
practical though.

The problem is that you don't even know what the best recommendations
are. Splitting the data by date is reasonable, but recent items aren't
necessarily most-liked. Splitting by rating is more reasonable on this
point, but you still can't conclude that there aren't better
recommendations from among the un-rated items.

Still, it ought to correlate. I think you will find precision/recall are
very low in most cases, often a few percent. The result is "noisy".
AUC will tell you about where all of those "best recommendations" in
the test set fell into the list, rather than only measuring the top
N's performance. This tells you more, and I think that's generally
good. However it is measuring performance over the entire list of
recs, when you are unlikely to use more than the top N.

Go ahead and use it since there's not a lot better you can do in the
lab, but be aware of the issues.
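
To make the AUC point concrete, here is a rough per-user sketch computed
by hand from the recommender's full ranking and the held-out "relevant"
items; this is plain pair counting, not a Mahout API:

  // AUC for one user: the fraction of (relevant, non-relevant) item pairs in
  // which the relevant item is ranked higher. Average it over users for an
  // overall figure.
  static double auc(List<Long> rankedItemIds, Set<Long> heldOutRelevantIds) {
    long relevantSeen = 0;
    long nonRelevant = 0;
    long correctlyOrderedPairs = 0;
    for (Long itemId : rankedItemIds) {
      if (heldOutRelevantIds.contains(itemId)) {
        relevantSeen++;
      } else {
        nonRelevant++;
        // Every relevant item already seen is ranked above this non-relevant one.
        correctlyOrderedPairs += relevantSeen;
      }
    }
    long totalPairs = relevantSeen * nonRelevant;
    return totalPairs == 0 ? 0.5 : (double) correctlyOrderedPairs / totalPairs;
  }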

Re: Choosing precision

Posted by Zia mel <zi...@gmail.com>.
Thanks Sean. What do you recommend for evaluating the recommendations?


On Tue, Jan 15, 2013 at 11:08 AM, Sean Owen <sr...@gmail.com> wrote:
> here

Re: Choosing precision

Posted by Sean Owen <sr...@gmail.com>.
Precision is not a great metric for recommenders, but it exists. There
is no best value here; I would choose something that mirrors how you
will use the results. If you show top 3 recs, use 3.
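
If you do want a lab number for that, the IR stats evaluator reports
precision and recall at a cutoff. A minimal sketch, assuming you already
have a DataModel called model and a RecommenderBuilder called builder;
the 3 just mirrors a show-top-3 UI:

  RecommenderIRStatsEvaluator irEvaluator = new GenericRecommenderIRStatsEvaluator();
  IRStatistics stats = irEvaluator.evaluate(
      builder,   // your RecommenderBuilder
      null,      // default DataModelBuilder
      model,     // your DataModel
      null,      // no IDRescorer
      3,         // precision/recall at 3
      GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,  // let Mahout pick the relevance cutoff
      1.0);      // evaluate with all users
  System.out.println("p@3 = " + stats.getPrecision() + ", recall@3 = " + stats.getRecall());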

On Tue, Jan 15, 2013 at 4:51 PM, Zia mel <zi...@gmail.com> wrote:
> Hello,
>
> If I have users that each have between 1 and 20 items, what would be the
> ideal way to evaluate the recommender using precision? Is there any
> recommended precision cutoff to choose, such as p@2, p@5, p@10 or others,
> and why?
>
> Many thanks