Posted to dev@mahout.apache.org by "Dan Brickley (Commented) (JIRA)" <ji...@apache.org> on 2012/02/11 21:35:00 UTC

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

    [ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206256#comment-13206256 ] 

Dan Brickley commented on MAHOUT-524:
-------------------------------------

I just tried spectral k-means with some Wikipedia/DBpedia data (an affinity of 1.0 for every page and topic category URL pair in the wiki). The data came from http://downloads.dbpedia.org/3.7/en/article_categories_en.nt.bz2 and is available on the Web at http://danbri.org/2012/spectral/dbpedia/ (I posted the .csv plus an int-to-URL dictionary file).
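For anyone trying to reproduce the input, here is a minimal sketch of how such a .csv and int-to-URL dictionary could be derived from the N-Triples dump. It is not the script actually used for the posted files: the function name build_affinities, the 0-based numbering, and the "i,j,1.0" row layout are assumptions made for illustration.

```python
import re

# Matches one N-Triples statement whose subject, predicate, and object are all IRIs.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

def build_affinities(lines):
    """Return (rows, ids): rows are 'i,j,1.0' strings, ids maps URL -> int.

    Assumption: ids are handed out 0-based in order of first appearance,
    and every page/category pair gets an affinity of 1.0.
    """
    ids = {}
    rows = []
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m is None:
            continue  # skip comments, blank lines, and anything that is not IRI-IRI-IRI
        page, _predicate, category = m.groups()
        i = ids.setdefault(page, len(ids))
        j = ids.setdefault(category, len(ids))
        rows.append(f"{i},{j},1.0")
    return rows, ids

# Tiny hand-made sample in the same shape as article_categories_en.nt
sample = [
    '<http://dbpedia.org/resource/A> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:X> .',
    '<http://dbpedia.org/resource/B> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:X> .',
]
rows, ids = build_affinities(sample)
print("\n".join(rows))
```

The ids mapping, written out one "int,URL" pair per line, would play the role of the dictionary file.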

My best guess at the command line (running this with today's trunk plus a fresh Hadoop 0.20.203.0 pseudo-cluster) was this:

mahout spectralkmeans -i wiki/ -o output1 -k 20 -d 4192499 --maxIter 10    (where the HDFS wiki/ subdirectory contains the .csv data file)
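As a sanity check on the -d value, the sketch below computes the smallest square dimension that covers every index in an "i,j,value" file. It assumes that -d is meant to be the square dimension of the affinity matrix and that indices are 0-based; matrix_dimension is a hypothetical helper, not part of Mahout.

```python
def matrix_dimension(csv_lines):
    """Smallest square dimension covering every index in 'i,j,value' rows.

    Assumption: indices are 0-based, so the dimension is max index + 1.
    """
    largest = -1
    for line in csv_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        i, j, _value = line.split(",")
        largest = max(largest, int(i), int(j))
    return largest + 1

print(matrix_dimension(["0,1,1.0", "2,1,1.0"]))
```

If the number computed from the real .csv differs from the value passed to -d, that mismatch would be worth ruling out before filing a new issue.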

Unfortunately I'm hitting one of the various problems discussed above. If anyone else can reproduce this, perhaps a fresh JIRA is needed.

It gets stuck after the first job, with an essentially empty seqfile. Full transcript here: https://gist.github.com/1804016

(checked with "mahout seqdumper --seqFile output1/calculations/diagonal/part-r-00000")

This is essentially the same experience I had back in September (see above) when running a similar test.
                
> DisplaySpectralKMeans example fails
> -----------------------------------
>
>                 Key: MAHOUT-524
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.4, 0.5
>            Reporter: Jeff Eastman
>            Assignee: Shannon Quinn
>              Labels: clustering, k-means, visualization
>             Fix For: 0.6
>
>         Attachments: EclipseLog_20110918.txt, MAHOUT-524.patch, MAHOUT-524.patch, MAHOUT-524.patch, SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, screenshot-1.jpg, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard mixture of models data set through spectral k-means. After some tweaking of configuration arguments and a bug fix in EigenCleanupJob it runs spectral k-means to completion. The display example is expecting 2-d clustered points and the example is producing 5-d points. Additional I/O work is needed before this will play with the rest of the clustering algorithms. 
