Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2008/05/27 03:55:56 UTC

[jira] Updated: (MAHOUT-59) Create some examples of clustering well-known datasets

     [ https://issues.apache.org/jira/browse/MAHOUT-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Eastman updated MAHOUT-59:
-------------------------------

    Attachment: MAHOUT-59.patch

This patch adds canopy, kmeans and meanshift clustering examples for the http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series dataset. Copy the dataset into the testdata directory before running the example jobs. I'm still not happy with the arguments for meanshift, and I'm going to tune them for a better result before committing. The canopy and kmeans jobs produce the correct number of clusters (6), but I have not yet verified that the data points are assigned to the correct clusters.
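So far the canopy and kmeans output has only been checked for its cluster count. One way to check whether the rows themselves are clustered correctly is to compute a purity score against the known classes. The following is a minimal Python sketch and is not part of the patch. It assumes, per the UCI dataset description, that the 600 rows of the data file come in six consecutive blocks of 100 rows per class. It also assumes that the cluster id for each row has already been extracted from the job output in row order. The names `load_vectors`, `true_label` and `purity` are hypothetical helpers for illustration only.

```python
from collections import Counter

# Assumption: the UCI file lists the six classes in consecutive blocks of 100 rows.
ROWS_PER_CLASS = 100


def load_vectors(path):
    """Parse whitespace-separated floats, one time series (60 values) per line."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


def true_label(row_index):
    """Known class (0..5) of a row, assuming the block layout above."""
    return row_index // ROWS_PER_CLASS


def purity(assignments):
    """Fraction of rows that fall in their cluster's majority true class.

    assignments[i] is the cluster id given to row i (hypothetical input
    extracted from the clustering job output, in file order).
    """
    if not assignments:
        raise ValueError("no assignments to score")
    by_cluster = {}
    for row, cluster in enumerate(assignments):
        by_cluster.setdefault(cluster, Counter())[true_label(row)] += 1
    # For each cluster, count the rows belonging to its most common true class.
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(assignments)
```

A perfect clustering scores 1.0 however its cluster ids are numbered, while lumping all 600 rows into one cluster scores 1/6. Purity alone rewards over-splitting (600 singleton clusters would also score 1.0), so it is most meaningful when read together with the expected count of 6 clusters.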

> Create some examples of clustering well-known datasets
> ------------------------------------------------------
>
>                 Key: MAHOUT-59
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-59
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Jeff Eastman
>         Attachments: MAHOUT-59.patch
>
>
> The existing unit tests for clustering need to be augmented with examples from the literature which illustrate its correct operation on datasets which have known clusters present. See http://archive.ics.uci.edu/ml/ for some candidate datasets.

-- 
This message is automatically generated by JIRA.