Posted to user@mahout.apache.org by yamo93 <ya...@gmail.com> on 2012/10/29 15:59:39 UTC

Question on weighted similarities

Hi all,

I have a question on the formula used for weighted similarities in the 
class AbstractSimilarity.

I expected to find a simple percentage, such as:
       double scaleFactor = (double) count / (double) (num + 1);
       return result * scaleFactor;

But the code is more complex.

What are the benefits of this approach?

Rgds,
Yann.

Re: Question on weighted similarities

Posted by Sean Owen <sr...@gmail.com>.

The general idea is that as count grows, it should push the result away
from 0 and towards 1, or towards -1 if the result is negative. The result
also needs to stay in the range [-1,1]. I think those last two requirements
explain 80% of the apparent extra fuss there, and they are why a simple
multiply wouldn't quite work: multiplying by a factor below 1 only shrinks
the result towards 0.
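The three goals above can be sketched in Java. This is a minimal illustration under stated assumptions, not Mahout's actual code: the class name `WeightingSketch`, the method names, and the reading of `count` as the number of items behind the similarity and `num` as the total number of items (with count <= num) are all hypothetical. The weighted version keeps only a fraction `1 - count / (num + 1)` of the result's distance to +1 (or to -1 for a negative result), so a growing count moves the value outward instead of towards 0.

```java
import java.util.Locale;

/**
 * Sketch of a count-based weighting for a similarity in [-1, 1].
 * Hypothetical names; assumes count is the number of items behind the
 * similarity and num is the total number of items, with count <= num.
 */
final class WeightingSketch {

  /** The "simple percentage" from the question: scales the result towards 0. */
  static double simpleScale(double result, int count, int num) {
    double scaleFactor = (double) count / (double) (num + 1);
    return result * scaleFactor;
  }

  /** Pushes the result towards +1 (or -1 if negative) as count grows. */
  static double weighted(double result, int count, int num) {
    // Fraction of the remaining gap to keep: 1.0 at count == 0, shrinking as count grows.
    double keep = 1.0 - (double) count / (double) (num + 1);
    double weightedResult;
    if (result < 0.0) {
      // Keep only `keep` of the gap between result and -1.
      weightedResult = -1.0 + keep * (1.0 + result);
    } else {
      // Keep only `keep` of the gap between result and +1.
      weightedResult = 1.0 - keep * (1.0 - result);
    }
    // Clamp in case rounding (or count > num) pushes the value outside [-1, 1].
    return Math.max(-1.0, Math.min(1.0, weightedResult));
  }

  public static void main(String[] args) {
    int num = 9;
    for (int count : new int[] {0, 3, 9}) {
      System.out.printf(Locale.ROOT, "count=%d simple=%.3f weighted=%.3f%n",
          count, simpleScale(0.5, count, num), weighted(0.5, count, num));
    }
  }
}
```

With num = 9 and a raw similarity of 0.5, a count of 3 gives 0.15 under the simple percentage but 0.65 under the weighted version, and a count of 0 leaves the weighted result at 0.5 rather than zeroing it. Because `num + 1` is in the denominator, the kept fraction stays above 0 when count <= num, so the weighted value approaches but never reaches +1 or -1.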

I imagine you could write a different, slightly simpler, and possibly more
principled formulation that still matches those goals. The weighting system
is a little arbitrary.
