Posted to dev@mahout.apache.org by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org> on 2011/12/07 07:10:40 UTC

[jira] [Updated] (MAHOUT-843) Top Down Clustering

     [ https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-843:
-----------------------------------

    Attachment: MAHOUT-843-patch-only-postprocessor-final

Added option for xm.

Revisited all Javadocs. All public methods now have proper Javadocs. Private methods have Javadocs where needed.

Revisited the tests that were failing due to the clusters-0-final change. Fixed all that I could find. If you find more, please fix them before committing.

Thanks for the help with addOption...xm. I was not able to figure that out :).
                
> Top Down Clustering
> -------------------
>
>                 Key: MAHOUT-843
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Jeff Eastman
>              Labels: clustering, patch
>             Fix For: 0.6
>
>         Attachments: MAHOUT-843-patch, MAHOUT-843-patch-only-postprocessor, MAHOUT-843-patch-only-postprocessor-final, MAHOUT-843-patch-only-postprocessor-v1, MAHOUT-843-patch-only-postprocessor-v2, MAHOUT-843-patch-only-postprocessor-v3, MAHOUT-843-patch-only-postprocessor-v4, MAHOUT-843-patch-only-postprocessor-v5, MAHOUT-843-patch-v1, Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step finds comparatively bigger clusters. The second step clusters those bigger chunks into meaningful clusters. This can improve performance when clustering a large amount of data. It also removes the need to provide input clusters or cluster counts to the clustering algorithm.
> "Big" is a relative term, and so is the smaller "meaningful". So the size of the "bigger" and the "smaller/meaningful" clusters will be controlled by the user.
> The user can also select which clustering algorithm to use at the top level and which to use at the bottom level. Initially this can be supported for only one or a few clustering algorithms; later, an option can be provided to use any algorithm that suits the case.
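
The two-step idea described above can be illustrated with a minimal, self-contained sketch. This is not the code in the attached patch and it does not use any Mahout API. The class name TopDownClusteringSketch, the methods leaderCluster and topDown, and the coarse/fine thresholds are all hypothetical names chosen for illustration. The sketch assumes 1-D points and a simple greedy threshold clustering at both levels, which needs no cluster count as input.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT the MAHOUT-843 patch and NOT a Mahout API.
public class TopDownClusteringSketch {

    // Hypothetical helper: greedy threshold ("leader") clustering on 1-D points.
    // A point joins the first cluster whose leader (its first member) lies
    // within `threshold`; otherwise it starts a new cluster.
    public static List<List<Double>> leaderCluster(List<Double> points, double threshold) {
        List<List<Double>> clusters = new ArrayList<>();
        for (double p : points) {
            List<Double> home = null;
            for (List<Double> c : clusters) {
                if (Math.abs(c.get(0) - p) <= threshold) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(p);
        }
        return clusters;
    }

    // Step 1: find the bigger clusters with the user-chosen coarse threshold.
    // Step 2: cluster each bigger chunk again with the user-chosen fine threshold.
    public static List<List<List<Double>>> topDown(List<Double> points, double coarse, double fine) {
        List<List<List<Double>>> result = new ArrayList<>();
        for (List<Double> big : leaderCluster(points, coarse)) {
            result.add(leaderCluster(big, fine));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> points = List.of(1.0, 1.2, 3.0, 3.1, 50.0, 50.3, 54.0);
        // Each outer element is one top-level cluster holding its sub-clusters.
        System.out.println(topDown(points, 10.0, 0.5));
    }
}
```

Here the two thresholds stand in for the user-controlled notion of "bigger" and "smaller/meaningful" clusters. In a real setup the algorithm at each level could be swapped independently, as the description suggests.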

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira