Posted to dev@mahout.apache.org by "Shashikant Kore (JIRA)" <ji...@apache.org> on 2009/08/06 08:59:14 UTC

[jira] Updated: (MAHOUT-160) ClusterDumper utility to output all the clusters in all sequence files and points

     [ https://issues.apache.org/jira/browse/MAHOUT-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Kore updated MAHOUT-160:
-----------------------------------

    Attachment: mahout-160.patch

The ClusterDumper utility has been modified to take the clusters directory and the points directory as input, instead of a single sequence file and points file.

> ClusterDumper utility to output all the clusters in all sequence files and points
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-160
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-160
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Shashikant Kore
>         Attachments: mahout-160.patch
>
>
> The current ClusterDumper utility takes a sequence file and a points file as input and prints each cluster vector along with the points that belong to the clusters in that sequence file. The utility does not produce correct results when there are multiple sequence files and points files.
> To avoid this problem, all the point-to-cluster mappings need to be read first, and only then should the utility iterate over the sequence files.
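
The two-pass idea in the description above (read every point-to-cluster mapping first, then iterate over the cluster files) can be sketched roughly as follows. This is only an illustration and not the code in mahout-160.patch: plain in-memory maps stand in for the Hadoop sequence files, and the class name ClusterDumpSketch and the methods readAllPointMappings and dump are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each "points file" is modeled as a map from point id
// to cluster id, and each "clusters file" as a map from cluster id to a
// printable cluster vector. Real sequence files are NOT read here.
public class ClusterDumpSketch {

    // Pass 1: read every points file and group point ids by cluster id,
    // so the mappings are complete before any cluster is printed.
    static Map<Integer, List<String>> readAllPointMappings(
            List<Map<String, Integer>> pointsFiles) {
        Map<Integer, List<String>> pointsByCluster = new TreeMap<>();
        for (Map<String, Integer> file : pointsFiles) {
            for (Map.Entry<String, Integer> entry : file.entrySet()) {
                pointsByCluster
                        .computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                        .add(entry.getKey());
            }
        }
        return pointsByCluster;
    }

    // Pass 2: iterate over every clusters file and emit one line per cluster,
    // looking up its points in the mappings gathered by pass 1.
    static List<String> dump(List<Map<Integer, String>> clusterFiles,
                             Map<Integer, List<String>> pointsByCluster) {
        List<String> lines = new ArrayList<>();
        for (Map<Integer, String> file : clusterFiles) {
            for (Map.Entry<Integer, String> cluster : file.entrySet()) {
                List<String> points = pointsByCluster.getOrDefault(
                        cluster.getKey(), Collections.emptyList());
                lines.add("cluster " + cluster.getKey() + " "
                        + cluster.getValue() + " points=" + points);
            }
        }
        return lines;
    }
}
```

Because the lookup table is built from all points files before any cluster is printed, a cluster stored in one file still picks up points recorded in a different file, which is the case the single-file version misses.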

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.