Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2011/08/17 23:07:27 UTC

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

    [ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086595#comment-13086595 ] 

Jeff Eastman commented on MAHOUT-524:
-------------------------------------

The original example was extracting 5 eigenvectors and thus returned 5-d results. I changed it to extract 2 vectors; it used to run, but displayed incorrect results.

I'm still getting a FileNotFoundException (since pre-0.5 testing, IIRC) in the bowels of DRM.times while running this in local Hadoop mode. I wonder if it is possible to add a --method sequential implementation for SpectralKMeans to help separate the algorithmic issues from the file bookkeeping ones?

We have a sequential Lanczos implementation...

Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
	at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:222)
	at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
	at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
	at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
	at org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:72)
Caused by: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
	at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
	at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:214)
	... 4 more
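
One way to narrow down the bookkeeping side is to check, before the multiply step runs, whether the input directory named in the trace actually exists and contains anything. The following is only a minimal sketch using the JDK's java.nio.file API on the local file system, not Mahout or Hadoop code; the class name InputPathCheck and the method hasInput are hypothetical names introduced for illustration, and the default path is the relative tail of the one reported in the exception above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical diagnostic helper; not part of Mahout or Hadoop.
public class InputPathCheck {

    /** Returns true only if dir is an existing directory with at least one entry. */
    public static boolean hasInput(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false; // missing, or a plain file rather than a directory
        }
        // Files.list must be closed, hence try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.findAny().isPresent();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the default mirrors the path from the stack trace,
        // relative to the examples directory; pass another path as args[0].
        Path data = Paths.get(args.length > 0
                ? args[0]
                : "output/calculations/laplacian-33/tmp/data");
        String state = hasInput(data) ? "is present" : "is missing or empty";
        System.out.println(data.toAbsolutePath() + " " + state);
    }
}
```

Running it from the examples directory right after the Laplacian is written would show whether the directory is never created or is removed before DRM.times reads it.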


> DisplaySpectralKMeans example fails
> -----------------------------------
>
>                 Key: MAHOUT-524
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.4, 0.5
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>              Labels: clustering, k-means, visualization
>             Fix For: 0.6
>
>         Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard mixture of models data set through spectral k-means. After some tweaking of configuration arguments and a bug fix in EigenCleanupJob it runs spectral k-means to completion. The display example is expecting 2-d clustered points and the example is producing 5-d points. Additional I/O work is needed before this will play with the rest of the clustering algorithms. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira