Posted to dev@mahout.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2009/08/06 16:57:14 UTC

[jira] Assigned: (MAHOUT-160) ClusterDumper utility to output all the clusters in all sequence files and points

     [ https://issues.apache.org/jira/browse/MAHOUT-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned MAHOUT-160:
--------------------------------------

    Assignee: Grant Ingersoll

> ClusterDumper utility to output all the clusters in all sequence files and points
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-160
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-160
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Shashikant Kore
>            Assignee: Grant Ingersoll
>         Attachments: mahout-160-dict.patch, mahout-160.patch
>
>
> The current ClusterDumper utility takes a sequence file and a points file as input and prints each cluster vector in the sequence file along with the points that belong to that cluster. It does not produce correct results when there are multiple sequence files and points files.
> To avoid this problem, all the point-to-cluster mappings need to be read first, and only then should the utility iterate over the sequence files.
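
The two-pass approach described above can be sketched roughly as follows. This is only an illustration and not the code in the attached patches: the class and method names (ClusterDumpSketch, readAllPoints, dump) are hypothetical, and plain in-memory maps stand in for the Hadoop sequence files and points files that the real utility would read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each "points file" is modelled as a map from
// point id to cluster id, and each "cluster file" as a map from
// cluster id to a printable cluster vector.
public class ClusterDumpSketch {

    // Pass 1: merge the point-to-cluster mappings from every points file
    // into a single map from cluster id to the ids of its points.
    static Map<String, List<String>> readAllPoints(List<Map<String, String>> pointsFiles) {
        Map<String, List<String>> pointsByCluster = new LinkedHashMap<>();
        for (Map<String, String> file : pointsFiles) {
            for (Map.Entry<String, String> e : file.entrySet()) {
                pointsByCluster
                    .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
            }
        }
        return pointsByCluster;
    }

    // Pass 2: iterate over every cluster file and look up each cluster's
    // points in the complete map built in pass 1.
    static List<String> dump(List<Map<String, String>> clusterFiles,
                             Map<String, List<String>> pointsByCluster) {
        List<String> lines = new ArrayList<>();
        for (Map<String, String> file : clusterFiles) {
            for (Map.Entry<String, String> e : file.entrySet()) {
                List<String> points =
                    pointsByCluster.getOrDefault(e.getKey(), Collections.emptyList());
                lines.add(e.getKey() + " " + e.getValue() + " " + points);
            }
        }
        return lines;
    }
}
```

Because the full mapping is built before any cluster is printed, a cluster whose points are spread across several points files is still reported with all of them.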

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.