Posted to dev@mahout.apache.org by "Jeff Eastman (Resolved) (JIRA)" <ji...@apache.org> on 2011/11/15 19:55:51 UTC

[jira] [Resolved] (MAHOUT-887) Bottom Up Clustering

     [ https://issues.apache.org/jira/browse/MAHOUT-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Eastman resolved MAHOUT-887.
---------------------------------

    Resolution: Invalid
      Assignee: Jeff Eastman

In general, top-down clustering begins with all points assigned to a single cluster and then iteratively uses some algorithm to split that cluster. Bottom-up clustering starts with one cluster for each point and then uses some algorithm to iteratively merge them. Both approaches face scalability challenges because of the bookkeeping they require, and both break down when a probabilistic cluster assignment (e.g. fuzzyk/dirichlet) is needed.

You can search the mail archive and JIRAs for MSC to find these discussions. The scalability issues are twofold: a single reducer is required (for the last iteration at least), and each cluster grows because it retains the ids of all the clusters that have merged into it.
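
To make the bookkeeping concrete, here is a minimal single-machine sketch of threshold-based bottom-up merging. It is not Mahout code and not the MSC implementation: the function name bottom_up_cluster, the centroid-distance merge rule, and the threshold parameter are all assumptions chosen for illustration. Each cluster keeps the ids of everything merged into it, so the "ids" list grows with every merge, and each pass scans all cluster pairs.

```python
from itertools import combinations
from math import dist


def bottom_up_cluster(points, threshold):
    """Hypothetical sketch: merge the closest pair of clusters until no
    pair of cluster centers is closer than `threshold`."""
    # Start with one cluster per point; each cluster records its member ids.
    clusters = [{"center": tuple(p), "ids": [i]} for i, p in enumerate(points)]

    def gap(pair):
        # Distance between the centers of the two clusters in `pair`.
        return dist(clusters[pair[0]]["center"], clusters[pair[1]]["center"])

    while len(clusters) > 1:
        # Scan every pair of clusters (quadratic in the number of clusters).
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        if gap((a, b)) >= threshold:
            break  # nothing left that is close enough to merge

        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["ids"]), len(cb["ids"])
        # The merged center is the size-weighted mean of the two centers.
        center = tuple(
            (x * na + y * nb) / (na + nb)
            for x, y in zip(ca["center"], cb["center"])
        )
        # The merged cluster retains the ids from both sides, so it grows.
        merged = {"center": center, "ids": ca["ids"] + cb["ids"]}
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


# Made-up sample data: two tight pairs and one distant point.
sample = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
result = bottom_up_cluster(sample, threshold=3.0)
```

In this sketch the choice of the pair to merge needs a global view of all clusters, which is one way to see why a distributed version ends up funnelling through a single reducer.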

MAHOUT-843 is aimed at supporting heterogeneous, top-down, hierarchical clustering where the choice of algorithm at every level is up to the user and where each algorithm may itself be iterative. That's a bit different from the homogeneous, top-down clustering I described above. Since clustering algorithms cannot be used to merge clusters, there is no way to use them to build heterogeneous, bottom-up clusterers, which would be the opposite of 843.

I agree this issue can be closed.
                
> Bottom Up Clustering
> --------------------
>
>                 Key: MAHOUT-887
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-887
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: Linux Windows
>            Reporter: Paritosh Ranjan
>            Assignee: Jeff Eastman
>              Labels: features
>             Fix For: 0.6
>
>
> Bottom up clustering is achieved by starting with small clusters/single points and then recursively merging clusters that are closer than a specified control constraint.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira