Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2009/06/20 21:22:07 UTC

[jira] Commented: (MAHOUT-137) Convert Clustering Algs to use Vector Writable

    [ https://issues.apache.org/jira/browse/MAHOUT-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722246#action_12722246 ] 

Jeff Eastman commented on MAHOUT-137:
-------------------------------------

MAHOUT-136 changed Canopy to pass Writables between the map and reduce steps, but its input and output formats are still Text. For consistency and efficiency, it makes sense to convert all of the clustering jobs to use Writables for I/O as well. If needed, a separate utility job can convert from Writable form to JSON or other textual representations. Since most clustering jobs already need an input step to prepare the points for clustering, having that step output Writables rather than Text would be a small change.
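
To make the proposal concrete, here is a minimal sketch of binary vector I/O with the text conversion kept as a separate helper. It is not Mahout code: DenseVectorRecord and its toBytes, fromBytes and toText helpers are hypothetical names, and the class only mirrors the write(DataOutput) / readFields(DataInput) method pair of Hadoop's Writable interface using plain java.io, so it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for a Writable vector; NOT Mahout's actual class.
final class DenseVectorRecord {
    private double[] values = new double[0];

    DenseVectorRecord() {}

    DenseVectorRecord(double[] values) {
        this.values = values.clone();
    }

    double[] values() {
        return values.clone();
    }

    // Same shape as Writable.write: a length prefix, then the raw doubles.
    void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (double v : values) {
            out.writeDouble(v);
        }
    }

    // Same shape as Writable.readFields: rebuild state from the binary form.
    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = in.readDouble();
        }
    }

    // Serialize to a byte array (assumed helper, for illustration only).
    static byte[] toBytes(DenseVectorRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialize from a byte array (assumed helper, for illustration only).
    static DenseVectorRecord fromBytes(byte[] bytes) {
        DenseVectorRecord record = new DenseVectorRecord();
        try {
            record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }

    // The "separate utility" direction: binary form in, text form out.
    static String toText(DenseVectorRecord record) {
        return Arrays.toString(record.values);
    }

    public static void main(String[] args) {
        DenseVectorRecord point = new DenseVectorRecord(new double[] {1.5, -2.0, 3.25});
        byte[] bytes = toBytes(point);
        DenseVectorRecord copy = fromBytes(bytes);
        System.out.println(bytes.length + " bytes, text form " + toText(copy));
    }
}
```

With this layout the jobs themselves never parse or format strings; only the input preparation step and the optional text conversion utility deal with textual forms.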

> Convert Clustering Algs to use Vector Writable
> ----------------------------------------------
>
>                 Key: MAHOUT-137
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-137
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 0.2
>
>
> All M/R jobs should use Vector writable instead of encoding and decoding strings.  We can have a separate utility that converts serialized GSON, Strings, whatever into the appropriate vectors.  See MAHOUT-136 and http://www.lucidimagination.com/search/document/6a55f260826fd77f/jira_commented_mahout_136_change_canopy_mr_implementation_to_use_vector_writable
