Posted to user@mahout.apache.org by Rahul Mishra <mi...@gmail.com> on 2012/10/01 07:06:22 UTC
Clustering: significance of s0,s1,s2?
In the clustering code, what is the significance of s0, s1 and s2?
Apologies if this is a dumb question, but I could not find any comments
about them in the code.
--
Regards,
Rahul K Mishra,
www.ee.iitb.ac.in/student/~rahulkmishra
Re: Clustering: significance of s0,s1,s2?
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Variables s0, s1 and s2 belong to a running-sums algorithm that computes
the new center and radius (centroid and standard deviation) of each
Cluster at the end of each iteration. It is essentially the
RunningSumsGaussianAccumulator implementation, which has not yet been
factored out into a GaussianAccumulator instance so that an
OnlineGaussianAccumulator can be substituted. The OGA is based on
Welford's algorithm and is more numerically stable for calculating the
std (radius).
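
To make the two approaches concrete, here is a minimal one-dimensional
sketch. It assumes the usual running-sums convention, where s0 is the
number of observed points, s1 is the sum of the values and s2 is the sum
of the squared values. The class and method names are hypothetical and
are not Mahout's API. The real clustering code works on vectors, so take
this only as an illustration of the arithmetic.

```java
// Illustrative sketch only: hypothetical names, not Mahout's classes.
public class RunningSumsSketch {

    // Running-sums accumulator for a single dimension.
    // Assumed meaning: s0 = count, s1 = sum of x, s2 = sum of x*x.
    static final class RunningSums {
        double s0;
        double s1;
        double s2;

        void observe(double x) {
            s0 += 1;
            s1 += x;
            s2 += x * x;
        }

        // Center (centroid) of the observed points.
        double mean() {
            return s1 / s0;
        }

        // Radius (population standard deviation): sqrt(s0*s2 - s1^2) / s0.
        // The subtraction can lose precision when the two terms are large
        // and close together, so the result is clamped at zero.
        double std() {
            return Math.sqrt(Math.max(0.0, s0 * s2 - s1 * s1)) / s0;
        }
    }

    // Welford-style online accumulator: updates the mean and the sum of
    // squared deviations (m2) incrementally, avoiding that subtraction.
    static final class Welford {
        long n;
        double mean;
        double m2;

        void observe(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        double mean() {
            return mean;
        }

        // Population standard deviation, to match RunningSums.std().
        double std() {
            return Math.sqrt(m2 / n);
        }
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        RunningSums sums = new RunningSums();
        Welford online = new Welford();
        for (double x : data) {
            sums.observe(x);
            online.observe(x);
        }
        System.out.println("running sums: mean=" + sums.mean() + " std=" + sums.std());
        System.out.println("welford:      mean=" + online.mean() + " std=" + online.std());
    }
}
```

On small, well-scaled data the two accumulators should agree to within
rounding. The difference shows up when the values are large relative to
their spread, where s0*s2 and s1^2 become nearly equal.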
A JIRA issue to accomplish this refactoring and a patch to do it would
be a great contribution for some aspiring Mahout developer.