Posted to user@mahout.apache.org by yamo93 <ya...@gmail.com> on 2012/10/29 15:59:39 UTC
Question on weighted similarities
Hi all,
I have a question on the formula used for weighted similarities in the
class AbstractSimilarity.
I expected to find a simple percentage, such as:
double scaleFactor = (double) count / (double) (num + 1);
return result * scaleFactor;
But the code is more complex.
What are the benefits of this approach?
Rgds,
Yann.
Re: Question on weighted similarities
Posted by Sean Owen <sr...@gmail.com>.
The general idea is that as count grows, it should push the result away
from 0 and towards 1. Or it needs to move towards -1, if the result is
negative. It needs to stay in the range [-1,1] too. I think those last two
requirements explain 80% of the apparent extra fuss there, and are why a
simple multiply wouldn't quite work.
I imagine you could write a different, slightly simpler, and possibly more
principled formulation that still matches those goals. The weighting system
is a little arbitrary.
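
To make the difference concrete, here is a minimal sketch of one weighting that meets the goals above, next to the simple multiply from the question. It is an illustration only, not necessarily what AbstractSimilarity does. The class and method names are hypothetical, and it assumes count is the number of items both parties have in common and num is the total number of items, so that count / (num + 1) is always below 1.

```java
public final class WeightingSketch {

    private WeightingSketch() {}

    /** The simple percentage from the question: scales the result toward 0. */
    static double simpleMultiply(double result, int count, int num) {
        double scaleFactor = (double) count / (double) (num + 1);
        return result * scaleFactor;
    }

    /**
     * Hypothetical alternative: shrink the gap between the result and the
     * nearest extreme (+1 for non-negative results, -1 for negative ones).
     * A larger count leaves a smaller gap, so the result moves away from 0.
     */
    static double pushTowardExtreme(double result, int count, int num) {
        // Fraction of the gap to +/-1 that is kept; 1.0 when count is 0.
        double remaining = 1.0 - (double) count / (double) (num + 1);
        double weighted = result < 0.0
            ? -1.0 + remaining * (1.0 + result)
            : 1.0 - remaining * (1.0 - result);
        // Guard against rounding that lands slightly outside [-1, 1].
        return Math.max(-1.0, Math.min(1.0, weighted));
    }

    public static void main(String[] args) {
        System.out.println(simpleMultiply(0.5, 5, 9));     // 0.25
        System.out.println(pushTowardExtreme(0.5, 5, 9));  // 0.75
        System.out.println(pushTowardExtreme(-0.5, 5, 9)); // -0.75
    }
}
```

With count = 5 and num = 9, the simple multiply halves a similarity of 0.5 to 0.25, so it moves towards 0 and can never exceed the unweighted value. The sketch instead halves the distance to the nearest extreme, giving 0.75 and -0.75, and leaves the result unchanged when count is 0.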