Posted to dev@mahout.apache.org by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org> on 2011/12/17 16:48:30 UTC

[jira] [Commented] (MAHOUT-929) Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning

    [ https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171593#comment-13171593 ] 

Paritosh Ranjan commented on MAHOUT-929:
----------------------------------------

I think it would be difficult to manage the discussions and patches for all three of the points mentioned within this single JIRA issue.

In agile terms, too, this user story is too big and tries to do too many things.

Would it make sense to create three sub-issues, one for each of the three points, since they are related? I think there is also a natural order for developing them, so it would be good to make the sub-issues depend on each other in that order. If you agree, we can create them.
                
> Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-929
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-929
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>             Fix For: 0.7
>
>
> The current clustering drivers have a -cp option to produce a clusteredPoints directory containing the input vectors classified by the final clusters produced by the algorithm. These options are redundantly implemented in those drivers.
> - Factor out & implement an independent post processor that performs the classification step separately from the various clustering implementations.
> - Implement a pluggable outlier removal capability for this classifier. 
> - Consider building off of the ClusterClassifier & ClusterIterator ideas.
