Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2011/08/17 22:31:27 UTC

[jira] [Issue Comment Edited] (MAHOUT-596) Testing if the weight assigned to points when calling the observe method in AbstractCluster incorrectly affect the number of points in a cluster

    [ https://issues.apache.org/jira/browse/MAHOUT-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086569#comment-13086569 ] 

Jeff Eastman edited comment on MAHOUT-596 at 8/17/11 8:30 PM:
--------------------------------------------------------------

I've looked at the tests and considered the question. If observing a point, P, ten times with weight=1 is equivalent to observing P once with weight=10, then I think the implementation is correct. The only uses of non-unity weights are in FuzzyKMeans, where the weights across all the clusters sum to 1, and in MeanShiftCanopy, where the weights are the sums of all the points ever agglomerated by a cluster. Seems right to me.
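
To make the equivalence concrete, here is a minimal sketch of the two cases. It is not the Mahout code: WeightedObservationSketch is a hypothetical stand-in, plain double[] replaces Vector, and the s1 update is an assumption about the part of observe() that the issue description elides. Only the s0 update and the (int) cast are taken from the quoted snippets.

```java
// Hypothetical stand-in for the running sums in AbstractCluster; not Mahout code.
public class WeightedObservationSketch {
    double s0;          // sum of weights, as in the quoted observe()
    final double[] s1;  // weighted sum of points (assumed; the quoted code elides this)

    WeightedObservationSketch(int dimensions) {
        s1 = new double[dimensions];
    }

    // Mirrors the quoted observe(): s0 grows by the weight, x is scaled by it.
    void observe(double[] x, double weight) {
        s0 += weight;
        for (int i = 0; i < x.length; i++) {
            s1[i] += x[i] * weight;
        }
    }

    // Mirrors the quoted computeParameters(): the count is s0 truncated to int.
    int numPoints() {
        return (int) s0;
    }

    public static void main(String[] args) {
        double[] p = {1.5, -2.0};

        // Case 1: observe P ten times with weight 1.
        WeightedObservationSketch tenTimes = new WeightedObservationSketch(2);
        for (int i = 0; i < 10; i++) {
            tenTimes.observe(p, 1.0);
        }

        // Case 2: observe P once with weight 10.
        WeightedObservationSketch once = new WeightedObservationSketch(2);
        once.observe(p, 10.0);

        System.out.println(tenTimes.numPoints() + " vs " + once.numPoints());
        System.out.println(java.util.Arrays.toString(tenTimes.s1));
        System.out.println(java.util.Arrays.toString(once.s1));
    }
}
```

Under these assumptions both clusters end up with s0 = 10 and the same weighted sum, so they report the same number of points.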

      was (Author: jeastman):
    I've looked at the tests and considered the question. If observing 10 points, P, with weight=1 is equivalent to observing P once with weight=10, then I think the implementation is correct. The only uses of non-unity weights are in FuzzyKMeans, where the weights across all the clusters sum to 1, and in MeanShiftCanopy, where the weights are the sums of all the points ever agglomerated by a cluster. Seems right to me.
  
> Testing if the weight assigned to points when calling the observe method in AbstractCluster incorrectly affect the number of points in a cluster 
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-596
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-596
>             Project: Mahout
>          Issue Type: Test
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Yuval Merhav
>            Assignee: Jeff Eastman
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: TestAbstractCluster.java
>
>
> See the observe method in AbstractCluster:
>
>   public void observe(Vector x, double weight) {
>     s0 += weight;
>     Vector weightedX = x.times(weight);
>     ....
>   }
>
> And then the computeParameters method:
>
>   public void computeParameters() {
>     ...
>     numPoints = (int) s0;
>     ...
>   }
> So if someone changes the weight from the default value of 1.0, it affects the number of points in the cluster. It does not,
> however, affect the centroid. I'm not sure whether that is correct; it depends on what the author meant the weight to be used for.
> I attached a few simple test cases that fail.
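
The effect described in the report can be reproduced with a small standalone sketch. This is not the attached TestAbstractCluster.java and not Mahout code: FractionalWeightSketch and its methods are hypothetical, the points are one-dimensional for brevity, and computing the centroid as s1 / s0 is an assumption, since the quoted computeParameters() elides that part.

```java
// Hypothetical illustration of the reported behaviour; not Mahout code.
public class FractionalWeightSketch {

    // Sums the weights into s0 and truncates, like numPoints = (int) s0.
    static int numPoints(double[] weights) {
        double s0 = 0.0;
        for (double w : weights) {
            s0 += w;
        }
        return (int) s0;
    }

    // Weighted mean s1 / s0 (assumed form of the centroid computation).
    static double centroid(double[] xs, double[] weights) {
        double s0 = 0.0;
        double s1 = 0.0;
        for (int i = 0; i < xs.length; i++) {
            s0 += weights[i];
            s1 += xs[i] * weights[i];
        }
        return s1 / s0;
    }

    public static void main(String[] args) {
        double[] xs = {2.0, 4.0, 6.0};
        double[] unit = {1.0, 1.0, 1.0};
        double[] half = {0.5, 0.5, 0.5};

        // Three observations with weight 1.0: s0 = 3.0.
        System.out.println(numPoints(unit) + " points, centroid " + centroid(xs, unit));

        // The same three observations with weight 0.5: s0 = 1.5, truncated by the cast.
        System.out.println(numPoints(half) + " points, centroid " + centroid(xs, half));
    }
}
```

With a uniform weight of 0.5 the three observations are counted as a single point because the cast truncates 1.5, while the weighted mean stays the same. Under this assumed centroid formula, weights that differ between points would shift the centroid.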

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira