Posted to dev@mahout.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2012/03/05 06:37:00 UTC

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222169#comment-13222169 ] 

jiraposter@reviews.apache.org commented on MAHOUT-982:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4174/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

Execute the clustering step in CanopyDriver through ClusterClassificationDriver.

This replaces the existing functionality. If this refactoring is approved, we can add a threshold as a method parameter/CLI argument to support outlier removal in Canopy clustering.
This patch is the first of its kind for the clustering drivers. If it is approved, similar refactoring can easily be done for KMeans, FuzzyK and Dirichlet.
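
To make the proposed threshold concrete, here is a minimal sketch of threshold-based outlier pruning. It is not taken from the patch and does not use any Mahout API: the class OutlierPruningSketch, the method classify, and the 1/(1+distance) membership score are all hypothetical, chosen only for illustration. It assumes the threshold is a value between 0 and 1 that is compared against a point's normalized membership in its best-matching cluster.

```java
// Hypothetical sketch, not Mahout code: every name here is illustrative.
final class OutlierPruningSketch {

  /** Euclidean distance between two points of equal dimension. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Returns the index of the best-matching center, or -1 when the point's
   * normalized membership in that center is below the threshold, in which
   * case the point is treated as an outlier and left unclustered.
   */
  static int classify(double[] point, double[][] centers, double threshold) {
    // Assumed scoring: closer centers get a higher score, 1 / (1 + distance).
    double[] scores = new double[centers.length];
    double total = 0.0;
    for (int i = 0; i < centers.length; i++) {
      scores[i] = 1.0 / (1.0 + distance(point, centers[i]));
      total += scores[i];
    }

    // Normalize the scores so they sum to 1, then keep the largest one.
    int best = -1;
    double bestScore = -1.0;
    for (int i = 0; i < scores.length; i++) {
      double normalized = scores[i] / total;
      if (normalized > bestScore) {
        bestScore = normalized;
        best = i;
      }
    }

    // Prune the point if even its best membership is too weak.
    return bestScore < threshold ? -1 : best;
  }

  public static void main(String[] args) {
    double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
    // A point close to the first center, and one halfway between both.
    System.out.println(classify(new double[] {1.0, 1.0}, centers, 0.6));
    System.out.println(classify(new double[] {5.0, 5.0}, centers, 0.6));
  }
}
```

In this sketch a threshold of 0 keeps every point, which would correspond to clustering without outlier removal; raising it drops points that sit ambiguously between clusters.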


This addresses bug MAHOUT-982.
    https://issues.apache.org/jira/browse/MAHOUT-982


Diffs
-----

  trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java 1294137 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/ClusterMapper.java 1294137 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyClusterer.java 1294137 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java 1294137 

Diff: https://reviews.apache.org/r/4174/diff


Testing
-------


Thanks,

Paritosh


                
> Refactor Canopy Clustering into a separate post process with outlier pruning
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-982
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-982
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>              Labels: clustering
>             Fix For: 0.7
>
>         Attachments: MAHOUT-982.txt
>
>
> Use ClusterClassificationDriver to refactor clustering out of CanopyDriver with outlier pruning support.
