Posted to dev@mahout.apache.org by "Paritosh Ranjan (Created) (JIRA)" <ji...@apache.org> on 2012/02/23 08:51:51 UTC

[jira] [Created] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Refactor Canopy Clustering into a separate post process with outlier pruning
----------------------------------------------------------------------------

                 Key: MAHOUT-982
                 URL: https://issues.apache.org/jira/browse/MAHOUT-982
             Project: Mahout
          Issue Type: Sub-task
          Components: Clustering
    Affects Versions: 0.6
            Reporter: Paritosh Ranjan
            Assignee: Paritosh Ranjan
             Fix For: 0.7


Use ClusterClassificationDriver to refactor clustering out of CanopyDriver with outlier pruning support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223992#comment-13223992 ] 

Paritosh Ranjan commented on MAHOUT-982:
----------------------------------------

I plan to commit this in a day or two. Please object if you have any concerns.

Then I will do similar refactorings for FuzzyK, KMeans and Dirichlet.
                
> Refactor Canopy Clustering into a separate post process with outlier pruning
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-982
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-982
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>              Labels: clustering
>             Fix For: 0.7
>
>         Attachments: MAHOUT-982.txt
>
>
> Use ClusterClassificationDriver to refactor clustering out of CanopyDriver with outlier pruning support.


[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226906#comment-13226906 ] 

Hudson commented on MAHOUT-982:
-------------------------------

Integrated in Mahout-Quality #1388 (See [https://builds.apache.org/job/Mahout-Quality/1388/])
    MAHOUT-982, Added method and CLI option to remove outliers. (Revision 1299207)

     Result = SUCCESS
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1299207
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/commandline/DefaultOptionCreator.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReaderTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorTest.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java

                

[jira] [Resolved] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan resolved MAHOUT-982.
------------------------------------

    Resolution: Fixed

The clustering has been refactored into a separate process, and there is now support for outlier pruning. All the code is committed.

Resolving the issue.
                

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222169#comment-13222169 ] 

jiraposter@reviews.apache.org commented on MAHOUT-982:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4174/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

Executing clustering using ClusterClassificationDriver in CanopyDriver.

This replaces the existing functionality. If this refactoring is approved, then we can add a threshold as a method parameter/CLI argument to support outlier removal in CanopyClustering.
This patch is the first of its kind for the clustering drivers. If it is approved, a similar refactoring can easily be done for KMeans, FuzzyK and Dirichlet.


This addresses bug MAHOUT-982.
    https://issues.apache.org/jira/browse/MAHOUT-982


Diffs
-----

  trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java 1294137 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/ClusterMapper.java 1294137 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyClusterer.java 1294137 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java 1294137 

Diff: https://reviews.apache.org/r/4174/diff


Testing
-------


Thanks,

Paritosh


                

Re: [jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by Isabel Drost <is...@apache.org>.
On 23.02.2012 Jeff Eastman wrote:
> Only JIRA committers can be assigned an issue but anybody can
> contribute.

Just for reference: Our JIRA features a role "contributor" - issues can be 
assigned to anyone who has this role. For any Mahout JIRA admin (should be any 
committer AFAIK) making someone "contributor" is as simple as going to the 
project administration page, click on members and add the respective user name 
to that role.

Isabel

Re: [jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by Paritosh Ranjan <pr...@xebia.com>.
I have assigned all the stories related to clustering refactoring to myself.
If anyone wants to contribute to the refactoring stories, please feel 
free to submit the patch and drop a comment on the JIRA issue.

On 23-02-2012 18:02, Jeff Eastman wrote:
> Only JIRA committers can be assigned an issue but anybody can 
> contribute. Paritosh, since you are riding herd on these refactorings 
> and will most likely be committing the patches, why don't you assign 
> them to yourself?


Re: [jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Only JIRA committers can be assigned an issue but anybody can 
contribute. Paritosh, since you are riding herd on these refactorings 
and will most likely be committing the patches, why don't you assign 
them to yourself?

On 2/23/12 2:25 AM, Paritosh Ranjan (Commented) (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214494#comment-13214494 ]
>
> Paritosh Ranjan commented on MAHOUT-982:
> ----------------------------------------
>
> Suneel, thanks for this initiative.
> Please feel free to assign it to yourself (if it's possible) or to submit patches.


[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214494#comment-13214494 ] 

Paritosh Ranjan commented on MAHOUT-982:
----------------------------------------

Suneel, thanks for this initiative. 
Please feel free to assign it to yourself (if it's possible) or to submit patches.
                

[jira] [Updated] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-982:
-----------------------------------

    Attachment: MAHOUT-982.txt

Jeff, will you please review this patch?

Used ClusterClassificationDriver in CanopyDriver to clusterData.

This replaces the existing functionality. If this is OK, then we can add a threshold as a method parameter/CLI argument to support outlier removal in CanopyClustering.
                

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Jeff Eastman (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225326#comment-13225326 ] 

Jeff Eastman commented on MAHOUT-982:
-------------------------------------

+1 I like the way the driver was compressed and the mapper disappeared. Also less time-consuming unit tests since classification is tested on its own. 
                

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225324#comment-13225324 ] 

Paritosh Ranjan commented on MAHOUT-982:
----------------------------------------

I plan to commit this in a day or two. If you have any concerns, please raise them. I have changed the signature of CanopyDriver.run by adding an argument, clusterClassificationThreshold. It defaults to 0.0 and is not mandatory.
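
As a rough illustration of how a threshold like clusterClassificationThreshold could prune outliers, here is a minimal standalone sketch. It is not the code from the patch and does not use any Mahout classes. The class name OutlierPruningSketch, the method classify, and the rule that a point is kept when its best per-cluster score is at least the threshold are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Mahout code: shows one way a classification
// threshold could drop points that do not fit any cluster well.
public class OutlierPruningSketch {

    /**
     * Returns the index of the best-scoring cluster, or -1 when the point
     * is treated as an outlier because its best score is below threshold.
     * The scores are assumed to be per-cluster membership values in [0, 1].
     */
    public static int classify(double[] scores, double threshold) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                best = i;
            }
        }
        // Assumption: keep the point when its best score is >= threshold,
        // so a threshold of 0.0 keeps every point.
        return (best >= 0 && bestScore >= threshold) ? best : -1;
    }

    public static void main(String[] args) {
        // Made-up membership scores for three points over three clusters.
        double[][] points = {
            {0.90, 0.05, 0.05},
            {0.40, 0.35, 0.25},
            {0.10, 0.15, 0.75}
        };
        for (double threshold : new double[] {0.0, 0.5}) {
            List<Integer> assigned = new ArrayList<>();
            for (double[] p : points) {
                assigned.add(classify(p, threshold));
            }
            System.out.println("threshold=" + threshold + " -> " + assigned);
        }
    }
}
```

Under these assumptions, a threshold of 0.0 assigns all three sample points to a cluster, which matches the idea of a non-mandatory default that changes nothing, while a threshold of 0.5 leaves the second point unassigned because its best score is only 0.40.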
                

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Suneel Marthi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214453#comment-13214453 ] 

Suneel Marthi commented on MAHOUT-982:
--------------------------------------

Paritosh, I can take a crack at this, please assign this to me.
                

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225319#comment-13225319 ] 

jiraposter@reviews.apache.org commented on MAHOUT-982:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4245/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

Added outlier removal capability to Canopy Clustering.


This addresses bug Mahout-982.
    https://issues.apache.org/jira/browse/Mahout-982


Diffs
-----

  trunk/core/src/main/java/org/apache/mahout/common/commandline/DefaultOptionCreator.java 1294137 
  trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java 1298406 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java 1298408 
  trunk/core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java 1294454 
  trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java 1294137 
  trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReaderTest.java 1294137 
  trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterOutputPostProcessorTest.java 1294137 
  trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java 1294137 
  trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java 1294137 
  trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java 1294137 
  trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java 1294137 
  trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java 1294137 
  trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java 1294137 

Diff: https://reviews.apache.org/r/4245/diff


Testing
-------

Added test cases for both the sequential and mapreduce versions.


Thanks,

Paritosh


                

[jira] [Commented] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216532#comment-13216532 ] 

Paritosh Ranjan commented on MAHOUT-982:
----------------------------------------

Successfully used ClusterClassificationDriver to clusterData for Canopy Clustering. So far I have tried it only for the sequential version; the mapreduce version still needs to be done.
                

[jira] [Updated] (MAHOUT-982) Refactor Canopy Clustering into a separate post process with outlier pruning

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-982:
---------------------------------

    Comment: was deleted

(was: Paritosh, I can take a crack at this, please assign this to me.)
    