Posted to user@mahout.apache.org by Zia mel <zi...@gmail.com> on 2013/01/23 14:01:21 UTC

Finding best NearestNUserNeighborhood size

Hi,
I used NearestNUserNeighborhood with RMSE in a user-based recommender that
uses PearsonCorrelationSimilarity, and I found that changing the
neighborhood size has no clear pattern or effect on the RMSE: sometimes it
increases, other times it decreases. Precision, by contrast, shows a
clearer pattern as the neighborhood size changes. Is there any reason for
this? Another point is that the RMSE changes on every run, since the
evaluator chooses a different sample each time. Would running the code
10 or 20 times and taking the average be a good idea, or is there
something better to do?

//-- RUN 1 (neighborhood size, RMSE)
 2,  0.5523623146152608
 3,  0.5425283201773704
 4,  0.669846658662311
 5,  0.5956616542334392
 6,  0.6033911039809353
 7,  0.6135206544496685
 8,  0.5740444208649034
 9,  0.642798288443049
 10,  0.6266535555651472

//-- RUN 2 (neighborhood size, RMSE)
 2,  0.5415411343523825
 3,  0.6784589323396696
 4,  0.6347069968141124
 5,  0.6968820296725008
 6,  0.5953849874479478
 7,  0.6791828191904128
 8,  0.6072462830257853
 9,  0.6461346217476011
 10,  0.6043919119341171

Thanks !

Re: Finding best NearestNUserNeighborhood size

Posted by Sean Owen <sr...@gmail.com>.

RandomUtils.useTestSeed() is good for making a test repeatable, because you
are picking the same random sample every time. For evaluation purposes here
that's not a good thing: you want several genuinely different samples of
the result.
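
To illustrate the point about fixed seeds, here is a minimal sketch in plain
Java. It is not Mahout code: evaluateOnce and toyValues are hypothetical
stand-ins for an evaluation run and a data set, and java.util.Random stands in
for whatever random source the evaluator uses. With a fixed seed every run
holds out the same sample, so the score never changes; with a different seed
per run, each run sees a different sample.

```java
import java.util.Random;

final class SeedDemo {
    // Toy stand-in for one evaluation run (an assumption, not Mahout's logic):
    // hold out a random ~30% of the values and return the mean of that sample.
    static double evaluateOnce(double[] values, Random rng) {
        double sum = 0.0;
        int held = 0;
        for (double v : values) {
            if (rng.nextDouble() < 0.3) {
                sum += v;
                held++;
            }
        }
        return held == 0 ? Double.NaN : sum / held;
    }

    // Hypothetical toy data: 200 "ratings" cycling through 1..5.
    static double[] toyValues() {
        double[] values = new double[200];
        for (int i = 0; i < values.length; i++) {
            values[i] = 1 + (i % 5);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] values = toyValues();
        // Fixed seed: every run holds out the same sample, so the score repeats.
        for (int run = 0; run < 3; run++) {
            System.out.println("fixed seed:   " + evaluateOnce(values, new Random(42L)));
        }
        // A different seed per run: each run holds out a different sample.
        for (int run = 0; run < 3; run++) {
            System.out.println("varying seed: " + evaluateOnce(values, new Random(run)));
        }
    }
}
```

The fixed-seed lines are useful for a repeatable unit test; the varying-seed
lines are what you would want to average over when comparing settings.
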
On Jan 23, 2013 1:19 PM, "Stevo Slavić" <ss...@gmail.com> wrote:

> When evaluating recommender before running evaluator put
>
> RandomUtils.useTestSeed();
>
> to make splitting of data set consistent; don't use it in production, just
> for evaluation.
> This is all explained more thoroughly in Mahout in Action book.
>
> Kind regards,
> Stevo Slavic.

Re: Finding best NearestNUserNeighborhood size

Posted by Stevo Slavić <ss...@gmail.com>.

When evaluating a recommender, put

RandomUtils.useTestSeed();

before running the evaluator to make the splitting of the data set
consistent; don't use it in production, just for evaluation.
This is all explained more thoroughly in the Mahout in Action book.

Kind regards,
Stevo Slavic.



Re: Finding best NearestNUserNeighborhood size

Posted by Sean Owen <sr...@gmail.com>.

The stochastic nature of the evaluation means your results will vary
randomly from run to run. This looks to my eyeballs like most of the
variation you see. You probably want to average over many runs.

You will probably find that accuracy peaks around some neighborhood size:
adding more useful neighbors helps but at some point the next nearest isn't
so similar and the additional data harms the result more than helps.
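
As a rough sketch of the averaging idea, the following plain Java program
takes the two runs posted at the top of the thread and computes, for each
neighborhood size, the mean RMSE and the sample standard deviation across
runs. The class and method names (RmseAverage, meanAt, stdDevAt) are made up
for illustration, and two runs are far too few to draw conclusions from; in
practice you would add a row per run for many more runs.

```java
final class RmseAverage {
    // Neighborhood sizes and RMSE values copied from the two posted runs.
    static final int[] SIZES = {2, 3, 4, 5, 6, 7, 8, 9, 10};
    static final double[][] RUNS = {
        {0.5523623146152608, 0.5425283201773704, 0.669846658662311,
         0.5956616542334392, 0.6033911039809353, 0.6135206544496685,
         0.5740444208649034, 0.642798288443049, 0.6266535555651472},
        {0.5415411343523825, 0.6784589323396696, 0.6347069968141124,
         0.6968820296725008, 0.5953849874479478, 0.6791828191904128,
         0.6072462830257853, 0.6461346217476011, 0.6043919119341171}
    };

    // Mean RMSE across all runs for the neighborhood size at index idx.
    static double meanAt(int idx) {
        double sum = 0.0;
        for (double[] run : RUNS) {
            sum += run[idx];
        }
        return sum / RUNS.length;
    }

    // Sample standard deviation across runs (divides by runs - 1).
    static double stdDevAt(int idx) {
        double mean = meanAt(idx);
        double squares = 0.0;
        for (double[] run : RUNS) {
            double d = run[idx] - mean;
            squares += d * d;
        }
        return Math.sqrt(squares / (RUNS.length - 1));
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("n=%d  mean RMSE=%.4f  std dev=%.4f%n",
                    SIZES[i], meanAt(i), stdDevAt(i));
        }
    }
}
```

If the spread across runs at a given size is as large as the differences
between sizes, the apparent ups and downs are mostly sampling noise, and more
runs are needed before any peak in accuracy can be trusted.
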