Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2011/08/17 22:49:27 UTC

[jira] [Resolved] (MAHOUT-626) T1 and T2 Values in Canopy (& MeanShift)

     [ https://issues.apache.org/jira/browse/MAHOUT-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Eastman resolved MAHOUT-626.
---------------------------------

    Resolution: Fixed

I've stewed about whether or not to try this with MeanShiftCanopy and decided it is not appropriate to change these values between the mapper and reducer. MeanShift is an iterative algorithm, and with these changes the thresholds would vacillate between mapper and reducer values in a way that is not reflected in the algorithm as I understand it.

> T1 and T2 Values in Canopy (& MeanShift) 
> -----------------------------------------
>
>                 Key: MAHOUT-626
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-626
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4, 0.5
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.6
>
>         Attachments: CanopyT3T4.patch
>
>
> Users are reporting that the T1 and T2 threshold values which work in sequential mode don't work as well in MapReduce mode, because both the mapper and the reducer use the same values. When the mapper coalesces a number of points into a single centroid, the distances change enough that independent threshold values are needed in the reducer.
> Here is a patch which implements optional T3 and T4 threshold values that are used only by the canopy reducer. Convenience methods have been added for API compatibility, and defaults are included so that these values fall back to T1 and T2. A new unit test confirms that the thresholds are being set correctly.
> If this works out as a positive improvement, I will make the same changes to MeanShift and commit them.
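
To make the idea in the description concrete, here is a minimal sketch of reducer-only thresholds that default to the mapper values. It is not the code in CanopyT3T4.patch: the class CanopyThresholds, the methods canopyCentroids and run, and the one-dimensional distance are all hypothetical names and simplifications chosen for illustration, and the canopy pass is a simplified version of the usual T1/T2 rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the four thresholds; not Mahout's actual class. */
final class CanopyThresholds {
    final double t1, t2; // used by the mapper
    final double t3, t4; // used only by the reducer

    /** Convenience constructor: T3 and T4 default to T1 and T2. */
    CanopyThresholds(double t1, double t2) {
        this(t1, t2, t1, t2);
    }

    CanopyThresholds(double t1, double t2, double t3, double t4) {
        this.t1 = t1;
        this.t2 = t2;
        this.t3 = t3;
        this.t4 = t4;
    }

    /**
     * Simplified 1-D canopy pass. A point within 'loose' of a canopy center
     * is added to that canopy; a point within 'tight' of any center is
     * strongly bound and does not start a new canopy.
     * Returns the centroid of each canopy.
     */
    static List<Double> canopyCentroids(List<Double> points, double loose, double tight) {
        List<double[]> canopies = new ArrayList<>(); // each entry: {center, sum, count}
        for (double p : points) {
            boolean stronglyBound = false;
            for (double[] c : canopies) {
                double dist = Math.abs(p - c[0]);
                if (dist < loose) {
                    c[1] += p;
                    c[2] += 1;
                }
                if (dist < tight) {
                    stronglyBound = true;
                }
            }
            if (!stronglyBound) {
                canopies.add(new double[] {p, p, 1});
            }
        }
        List<Double> centroids = new ArrayList<>();
        for (double[] c : canopies) {
            centroids.add(c[1] / c[2]);
        }
        return centroids;
    }

    /** Simulated job: each split is clustered with T1/T2, the merged centroids with T3/T4. */
    static List<Double> run(List<List<Double>> splits, CanopyThresholds th) {
        List<Double> mapperCentroids = new ArrayList<>();
        for (List<Double> split : splits) {
            mapperCentroids.addAll(canopyCentroids(split, th.t1, th.t2));
        }
        return canopyCentroids(mapperCentroids, th.t3, th.t4);
    }
}
```

Constructing CanopyThresholds with two arguments reproduces the old behaviour, where the reducer reuses T1 and T2. Passing four arguments lets the reducer apply different thresholds to the mapper centroids, which are fewer and spaced differently from the raw points.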
