Posted to dev@mahout.apache.org by "Jeff Eastman (Created) (JIRA)" <ji...@apache.org> on 2012/03/09 21:11:01 UTC

[jira] [Created] (MAHOUT-991) Convert Canopy, MeanShift and Other Tools to Use ClusterWritable

Convert Canopy, MeanShift and Other Tools to Use ClusterWritable
----------------------------------------------------------------

                 Key: MAHOUT-991
                 URL: https://issues.apache.org/jira/browse/MAHOUT-991
             Project: Mahout
          Issue Type: Sub-task
          Components: Clustering
    Affects Versions: 0.6
            Reporter: Jeff Eastman
            Assignee: Jeff Eastman
             Fix For: 0.7


The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Adjust the Canopy and MeanShift implementations, which do not use this approach, to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.
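
As a rough illustration of what "self-describing" could mean for such a wrapper, here is a minimal sketch that uses only java.io. It is not Mahout's ClusterWritable: Model, PointModel, ModelWritable and roundTrip are hypothetical names, and the sketch assumes the wrapper writes the concrete class name ahead of the payload so that a reader can rebuild the right type without knowing it in advance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class SelfDescribingDemo {

  /** Hypothetical stand-in for a cluster model; not Mahout's Cluster interface. */
  public interface Model {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical concrete model that holds a one-dimensional center. */
  public static class PointModel implements Model {
    public double center;

    public PointModel() {} // no-arg constructor needed for reflective creation

    public PointModel(double center) {
      this.center = center;
    }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeDouble(center);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      center = in.readDouble();
    }
  }

  /** Hypothetical wrapper: records the concrete class name, then the payload. */
  public static class ModelWritable {
    public Model value;

    public void write(DataOutput out) throws IOException {
      out.writeUTF(value.getClass().getName());
      value.write(out);
    }

    public void readFields(DataInput in) throws IOException {
      String className = in.readUTF();
      try {
        // Rebuild the concrete type from the name stored in the stream.
        value = (Model) Class.forName(className).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
      value.readFields(in);
    }
  }

  /** Serializes a model through the wrapper and reads it back. */
  public static Model roundTrip(Model model) throws IOException {
    ModelWritable writer = new ModelWritable();
    writer.value = model;
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    writer.write(new DataOutputStream(bytes));

    ModelWritable reader = new ModelWritable();
    reader.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return reader.value;
  }

  public static void main(String[] args) throws IOException {
    Model copy = roundTrip(new PointModel(2.5));
    System.out.println(copy.getClass().getSimpleName() + " " + ((PointModel) copy).center);
  }
}
```

With this kind of layout the reading side never names PointModel, which is the property a tool would need in order to accept output from several algorithms through one wrapper type.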

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan reassigned MAHOUT-991:
--------------------------------------

    Assignee: Paritosh Ranjan  (was: Jeff Eastman)
    
> Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-991
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-991
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>
> Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.
> The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. So, once all of these algorithms start emitting ClusterWritables, KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for the buildClusters phase.


[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Shannon Quinn (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235924#comment-13235924 ] 

Shannon Quinn commented on MAHOUT-991:
--------------------------------------

Would it be good to make this same conversion for the spectral clustering package, too, in the spirit of getting all the clustering algorithms onto similar APIs?
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235983#comment-13235983 ] 

Paritosh Ranjan commented on MAHOUT-991:
----------------------------------------

All JUnit tests run successfully. I plan to commit this in a day or two. Please let me know if you see any concerns.
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235930#comment-13235930 ] 

Paritosh Ranjan commented on MAHOUT-991:
----------------------------------------

SpectralKMeansDriver uses KMeansDriver only at the end, for the clustering step, so the output format will be similar to KMeans.

This is also stated in its Javadoc:
"The output format is the same as the K-means output format".

Is that correct, or am I missing something?
                

[jira] [Issue Comment Edited] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235775#comment-13235775 ] 

Paritosh Ranjan edited comment on MAHOUT-991 at 3/22/12 5:37 PM:
-----------------------------------------------------------------

I just figured out that leaving MeanShift alone will create problems, so I will convert MeanShift as well before committing.
                
      was (Author: paritoshranjan):
    I just figured out it will. So, will convert meanshift as well before committing.
                  

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Jeff Eastman (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236714#comment-13236714 ] 

Jeff Eastman commented on MAHOUT-991:
-------------------------------------

+1 Paritosh, the changes look like what I was expecting to see. 
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237086#comment-13237086 ] 

Hudson commented on MAHOUT-991:
-------------------------------

Integrated in Mahout-Quality #1408 (See [https://builds.apache.org/job/Mahout-Quality/1408/])
    Mahout-991 Converted K-Means, Canopy, FuzzyKMeans, Dirichlet and MeanShift to emit ClusterWritable. (Revision 1304490)

     Result = SUCCESS
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236640#comment-13236640 ] 

Saikat Kanjilal commented on MAHOUT-991:
----------------------------------------

Paritosh,
Did you already get FuzzyKMeans working? Should I hold off on committing anything at this point, then?
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236756#comment-13236756 ] 

Paritosh Ranjan commented on MAHOUT-991:
----------------------------------------

Jeff, thanks for reviewing it.
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235730#comment-13235730 ] 

Paritosh Ranjan commented on MAHOUT-991:
----------------------------------------

I have successfully converted everything except MeanShift's MR clustering.

Jeff, can MeanShift (both sequential and MR) be committed separately, at a later point? Do you see any problems with committing MeanShiftCanopyClustering later?
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Shannon Quinn (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235936#comment-13235936 ] 

Shannon Quinn commented on MAHOUT-991:
--------------------------------------

Yes! That's correct. Sorry for the confusion.
                

[jira] [Resolved] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan resolved MAHOUT-991.
------------------------------------

    Resolution: Fixed

Canopy, MeanShift, K-Means, Dirichlet and Fuzzy K-Means now emit ClusterWritable.

All the code has been committed.

Resolving the issue.
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235775#comment-13235775 ] 

Paritosh Ranjan commented on MAHOUT-991:
----------------------------------------

I just figured out it will. So, will convert meanshift as well before committing.
                

[jira] [Commented] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235920#comment-13235920 ] 

jiraposter@reviews.apache.org commented on MAHOUT-991:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4450/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

Mahout-991 Converted K-Means, Canopy, FuzzyKMeans, Dirichlet and MeanShift to emit ClusterWritable.


This addresses bug Mahout-991.
    https://issues.apache.org/jira/browse/Mahout-991


Diffs
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyReducer.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java 1302100 
  trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletMapper.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletReducer.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansReducer.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansUtil.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansReducer.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansUtil.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyClusterMapper.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyCreatorMapper.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyDriver.java 1303903 
  trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyMapper.java 1302085 
  trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyReducer.java 1302085 
  trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java 1303474 
  trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java 1302085 
  trunk/core/src/test/java/org/apache/mahout/clustering/fuzzykmeans/TestFuzzyKmeansClustering.java 1302085 
  trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java 1302085 
  trunk/core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java 1303890 
  trunk/integration/src/main/java/org/apache/mahout/clustering/evaluation/ClusterEvaluator.java 1302085 
  trunk/integration/src/main/java/org/apache/mahout/clustering/evaluation/RepresentativePointsDriver.java 1302085 
  trunk/integration/src/main/java/org/apache/mahout/utils/clustering/AbstractClusterWriter.java 1302085 
  trunk/integration/src/main/java/org/apache/mahout/utils/clustering/CSVClusterWriter.java 1302085 
  trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java 1302085 
  trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumperWriter.java 1302085 
  trunk/integration/src/main/java/org/apache/mahout/utils/clustering/ClusterWriter.java 1302085 
  trunk/integration/src/main/java/org/apache/mahout/utils/clustering/GraphMLClusterWriter.java 1302085 
  trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java 1303282 

Diff: https://reviews.apache.org/r/4450/diff


Testing
-------


Thanks,

Paritosh


                
> Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-991
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-991
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>
> Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.
> The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Once all of these algorithms emit ClusterWritables, KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for the buildClusters phase.
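
To illustrate what "self-describing" can mean in the description above, here is a minimal sketch in plain Java. It is not Mahout's code: SimpleWritable, ToyCluster, SelfDescribingWritable and roundTrip are hypothetical stand-ins invented for this example, and the assumption is that the wrapper records the concrete class name ahead of the payload so that a reader can rebuild the right cluster type without knowing it in advance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class SelfDescribingSketch {

  /** Hypothetical stand-in for a Hadoop-style Writable; not the real interface. */
  public interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical toy cluster: an id plus a center vector. */
  public static class ToyCluster implements SimpleWritable {
    public int id;
    public double[] center = new double[0];

    /** No-arg constructor so the wrapper can instantiate it reflectively. */
    public ToyCluster() {}

    public ToyCluster(int id, double[] center) {
      this.id = id;
      this.center = center;
    }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeInt(id);
      out.writeInt(center.length);
      for (double v : center) {
        out.writeDouble(v);
      }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      id = in.readInt();
      center = new double[in.readInt()];
      for (int i = 0; i < center.length; i++) {
        center[i] = in.readDouble();
      }
    }
  }

  /** Hypothetical wrapper: writes the concrete class name, then the payload. */
  public static class SelfDescribingWritable implements SimpleWritable {
    private SimpleWritable value;

    public SimpleWritable getValue() {
      return value;
    }

    public void setValue(SimpleWritable value) {
      this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeUTF(value.getClass().getName());
      value.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      String className = in.readUTF();
      try {
        // Rebuild the concrete type named in the stream, then let it read itself.
        value = (SimpleWritable) Class.forName(className).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
      value.readFields(in);
    }
  }

  /** Serializes a value through the wrapper and reads it back from the bytes. */
  public static SimpleWritable roundTrip(SimpleWritable cluster) throws IOException {
    SelfDescribingWritable writer = new SelfDescribingWritable();
    writer.setValue(cluster);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    writer.write(new DataOutputStream(bytes));

    SelfDescribingWritable reader = new SelfDescribingWritable();
    reader.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return reader.getValue();
  }

  public static void main(String[] args) throws IOException {
    ToyCluster copy = (ToyCluster) roundTrip(new ToyCluster(7, new double[] {1.0, 2.5}));
    System.out.println(copy.getClass().getSimpleName() + " id=" + copy.id
        + " dims=" + copy.center.length);
  }
}
```

In this sketch the reading side never names ToyCluster. A tool written against such a wrapper could therefore accept output from any algorithm that emits it, which is the kind of uniformity the issue asks of ClusterDumper and the ClusterEvaluators.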


        

[jira] [Updated] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-991:
-----------------------------------

    Description: 
Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.

The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Once all of these algorithms emit ClusterWritables, KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for the buildClusters phase.

  was: The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Adjust the Canopy and MeanShift implementations, which do not use this approach, to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.

        Summary: Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable  (was: Convert Canopy, MeanShift and Other Tools to Use ClusterWritable)
    
> Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-991
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-991
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.7
>
>
> Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.
> The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Once all of these algorithms emit ClusterWritables, KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for the buildClusters phase.


        

[jira] [Work started] (MAHOUT-991) Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable

Posted by "Paritosh Ranjan (Work started) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-991 started by Paritosh Ranjan.

> Convert Canopy, MeanShift, K-means, Dirichlet, Fuzzy KMeans and Other Tools to emit ClusterWritable
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-991
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-991
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>
> Adjust the Canopy, MeanShift, K-means, Dirichlet and Fuzzy KMeans implementations to emit ClusterWritables instead of Clusters. Adjust the other clustering tools (ClusterDumper and ClusterEvaluators) to accept ClusterWritables produced by these algorithms.
> The new ClusterIterator and ClusterClassifier use an expanded sequence file representation that stores Clusters as self-describing ClusterWritable objects. Once all of these algorithms emit ClusterWritables, KMeans, Dirichlet and FuzzyK will be able to use ClusterIterator and ClusterClassifier for the buildClusters phase.
