Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2009/09/29 03:05:16 UTC

[jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable

    [ https://issues.apache.org/jira/browse/MAHOUT-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760462#action_12760462 ] 

Jeff Eastman commented on MAHOUT-136:
-------------------------------------

I think this issue has been completed and should be closed, since Canopy now uses Vector Writable to communicate the centroid vectors between the mapper and reducer. What it does not do is transmit Writable Canopies between the map and reduce steps, as kmeans does. There is an implementation of the Writable methods for Canopy (IMHO it is not correct, since it sets the point total and count to nonzero values), but the mapper and reducer do not use them, so this is moot. Converting the mapper and reducer to communicate Writable canopies can be done, but there are a lot of annoying little complications in the driver, which currently goes to some lengths to use the same vector form (dense or sparse) as the input data.
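
To make the point about the Writable methods concrete, here is a rough sketch of what I would expect them to look like. It is not the Mahout code: CanopySketch, its fields, and roundTrip are hypothetical names, the center is a plain dense double[] instead of a Mahout Vector, and it uses java.io.DataInput/DataOutput directly so it runs without Hadoop (the write/readFields signatures mirror Hadoop's Writable). It also assumes that a freshly deserialized canopy should start with a zero point total and a zero count, which is only my reading of the objection above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a canopy; NOT Mahout's Canopy class.
final class CanopySketch {
    int id;
    double[] center;      // centroid, kept dense for simplicity
    double[] pointTotal;  // running sum of the points added so far
    int numPoints;        // number of points added so far

    CanopySketch() {}

    CanopySketch(int id, double[] center) {
        this.id = id;
        this.center = center.clone();
        this.pointTotal = new double[center.length];
        this.numPoints = 0;
    }

    void addPoint(double[] p) {
        for (int i = 0; i < pointTotal.length; i++) {
            pointTotal[i] += p[i];
        }
        numPoints++;
    }

    // Same shape as Writable.write(DataOutput): only id and center are sent.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeInt(center.length);
        for (double v : center) {
            out.writeDouble(v);
        }
    }

    // Same shape as Writable.readFields(DataInput).
    // Assumption: the accumulators are reset to zero, not to nonzero values.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        int n = in.readInt();
        center = new double[n];
        for (int i = 0; i < n; i++) {
            center[i] = in.readDouble();
        }
        pointTotal = new double[n];
        numPoints = 0;
    }

    // Serialize to an in-memory buffer and read it back into a new instance.
    static CanopySketch roundTrip(CanopySketch c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            c.write(new DataOutputStream(bytes));
            CanopySketch copy = new CanopySketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CanopySketch c = new CanopySketch(7, new double[] {1.5, -2.0});
        c.addPoint(new double[] {1.0, 1.0});
        CanopySketch copy = roundTrip(c);
        System.out.println("id=" + copy.id + " numPoints=" + copy.numPoints);
    }
}
```

Whether the accumulators should be reset or serialized along with the center depends on what the reducer needs; the sketch only shows the reset variant.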

It works as implemented.

Unless somebody strongly disagrees, I'm going to close this issue as resolved, since the real intent was to replace the text representation of the centroid vector with the writable version, and that has been done for some time now.

> Change Canopy MR Implementation to use Vector Writable
> ------------------------------------------------------
>
>                 Key: MAHOUT-136
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-136
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.2
>
>
> Internal serialization of Canopy currently uses asFormatString rather than just making the Canopy writable. This is storage-inefficient.
