Posted to user@mahout.apache.org by Rahul Mishra <mi...@gmail.com> on 2012/10/01 07:06:22 UTC

Clustering: significance of s0,s1,s2?

In the clustering code, what exactly is the significance of s0, s1
and s2? Apologies if this is a dumb question, but I could not find
any comments in the code.


--
Regards,
Rahul K Mishra,
www.ee.iitb.ac.in/student/~rahulkmishra

Re: Clustering: significance of s0,s1,s2?

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Variables s0, s1 and s2 are the running sums used to compute the new
center and radius (centroid and standard deviation) of each cluster at
the end of every iteration. This is essentially the
RunningSumsGaussianAccumulator implementation, which has yet to be
factored out into a GaussianAccumulator instance so that an
OnlineGaussianAccumulator can be substituted. The OGA is based on
Welford's algorithm and is more numerically stable when calculating the
standard deviation (radius).
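A minimal Python sketch of the two approaches, assuming s0, s1 and s2 hold the usual power sums: s0 is the total weight of the points observed so far, s1 their weighted sum and s2 their weighted sum of squares, kept per dimension. From these, center = s1 / s0 and radius = sqrt(s0*s2 - s1^2) / s0. The class names GaussianAccumulator, RunningSumsAccumulator and OnlineAccumulator are hypothetical stand-ins, not Mahout's Java API. The online version is a weighted form of Welford's update, and both report the population standard deviation.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence


class GaussianAccumulator(ABC):
    """Hypothetical common interface: observe weighted points, then read mean/std."""

    @abstractmethod
    def observe(self, x: Sequence[float], weight: float = 1.0) -> None: ...

    @abstractmethod
    def mean(self) -> List[float]: ...

    @abstractmethod
    def std(self) -> List[float]: ...


class RunningSumsAccumulator(GaussianAccumulator):
    """Keeps the power sums s0, s1, s2 and derives center and radius from them."""

    def __init__(self, dim: int) -> None:
        self.s0 = 0.0            # total weight of observed points
        self.s1 = [0.0] * dim    # weighted sum of x, per dimension
        self.s2 = [0.0] * dim    # weighted sum of x*x, per dimension

    def observe(self, x: Sequence[float], weight: float = 1.0) -> None:
        self.s0 += weight
        for i, xi in enumerate(x):
            self.s1[i] += weight * xi
            self.s2[i] += weight * xi * xi

    def mean(self) -> List[float]:
        # center = s1 / s0
        return [s1 / self.s0 for s1 in self.s1]

    def std(self) -> List[float]:
        # radius = sqrt(s0*s2 - s1^2) / s0 (population std). The subtraction can
        # cancel catastrophically and even go slightly negative, hence the clamp.
        return [math.sqrt(max(self.s0 * s2 - s1 * s1, 0.0)) / self.s0
                for s1, s2 in zip(self.s1, self.s2)]


class OnlineAccumulator(GaussianAccumulator):
    """Weighted Welford-style update: tracks the mean and squared deviations directly."""

    def __init__(self, dim: int) -> None:
        self.w = 0.0            # total weight so far
        self.m = [0.0] * dim    # running mean per dimension
        self.q = [0.0] * dim    # weighted sum of squared deviations from the mean

    def observe(self, x: Sequence[float], weight: float = 1.0) -> None:
        self.w += weight
        for i, xi in enumerate(x):
            delta = xi - self.m[i]                          # deviation from old mean
            self.m[i] += (weight / self.w) * delta
            self.q[i] += weight * delta * (xi - self.m[i])  # old dev * new dev

    def mean(self) -> List[float]:
        return list(self.m)

    def std(self) -> List[float]:
        # population std = sqrt(q / w), matching RunningSumsAccumulator
        return [math.sqrt(q / self.w) for q in self.q]


if __name__ == "__main__":
    # Values far from zero with a small spread: the true std is sqrt(22.5).
    points = [[1e9 + d] for d in (4.0, 7.0, 13.0, 16.0)]
    for acc in (RunningSumsAccumulator(1), OnlineAccumulator(1)):
        for p in points:
            acc.observe(p)
        print(type(acc).__name__, acc.mean(), acc.std(), math.sqrt(22.5))
```

On the demo data, the running-sums radius is the difference of two numbers near 1.6e19 whose true gap is only 360, so rounding error can swamp it, while the online accumulator stays at sqrt(22.5). Because both classes share one interface, a clusterer written against GaussianAccumulator could accept either one, which is the kind of substitution described above.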

A JIRA issue for this refactoring, together with a patch implementing it,
would be a great contribution for an aspiring Mahout developer.
