Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2011/08/17 23:07:27 UTC
[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086595#comment-13086595 ]
Jeff Eastman commented on MAHOUT-524:
-------------------------------------
The original example was extracting 5 eigenvectors and thus returned 5-d results. I changed it to extract 2 eigenvectors; with that change it used to run, but it displayed incorrect results.
I'm still getting a FileNotFoundException in the bowels of DRM.times while running this in local Hadoop mode (IIRC this has been happening since pre-0.5 testing). I wonder if it is possible to add a --method sequential implementation for SpectralKMeans to help separate the algorithmic issues from the file-bookkeeping ones?
We have a sequential Lanczos implementation...
Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:222)
at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
at org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:72)
Caused by: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:214)
... 4 more
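One way to narrow down the file-bookkeeping side of a trace like this is to check how much of the missing path actually exists on the local file system. The following is only a rough diagnostic sketch, not part of Mahout or Hadoop: the class name MissingPathProbe and the method deepestExisting are hypothetical, and it uses plain java.nio rather than the Hadoop FileSystem API, on the assumption that the job is running against the local file system as in the trace.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Mahout or Hadoop class): reports the deepest
// ancestor of a path that exists, to show where the directory chain breaks.
public class MissingPathProbe {

    // Walks upward from the given path until an existing entry is found.
    // Returns null only if nothing exists, not even the root.
    static Path deepestExisting(Path path) {
        Path current = path.toAbsolutePath().normalize();
        while (current != null && !Files.exists(current)) {
            current = current.getParent();
        }
        return current;
    }

    public static void main(String[] args) {
        // Default is the path from the exception message, without the "file:" scheme.
        String defaultPath =
            "/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data";
        Path target = Paths.get(args.length > 0 ? args[0] : defaultPath);

        Path existing = deepestExisting(target);
        System.out.println("Requested:        " + target.toAbsolutePath().normalize());
        System.out.println("Deepest existing: " + existing);
    }
}
```

If the deepest existing directory turns out to be laplacian-33 itself, that would suggest the tmp directory was never created (or was cleaned up) before the job was submitted; if tmp exists but data does not, the step that writes the input vector is the more likely suspect. Either reading is a guess until checked against the actual run.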
> DisplaySpectralKMeans example fails
> -----------------------------------
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.4, 0.5
> Reporter: Jeff Eastman
> Assignee: Jeff Eastman
> Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard mixture of models data set through spectral k-means. After some tweaking of configuration arguments and a bug fix in EigenCleanupJob it runs spectral k-means to completion. The display example is expecting 2-d clustered points and the example is producing 5-d points. Additional I/O work is needed before this will play with the rest of the clustering algorithms.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira