Posted to dev@mahout.apache.org by "Paul Hubenig (JIRA)" <ji...@apache.org> on 2012/09/25 23:56:07 UTC

[jira] [Created] (MAHOUT-1077) apparent spectral kmeans bug

Paul Hubenig created MAHOUT-1077:
------------------------------------

             Summary: apparent spectral kmeans bug
                 Key: MAHOUT-1077
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1077
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.7
            Reporter: Paul Hubenig


Using example data at:  https://cwiki.apache.org/MAHOUT/spectral-clustering.html

0,0,0
0,1,0.8
0,2,0.5
1,0,0.8
1,1,0
1,2,0.9
2,0,0.5
2,1,0.9
2,2,0
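
As a sanity check on the input, here is a minimal sketch that loads the triples above into a dense matrix. It assumes each line is "row,column,affinity" and that the matrix is 3x3 (matching the -d 3 option used below); load_affinity is a hypothetical helper, not part of Mahout.

```python
# Hypothetical helper, not part of Mahout: parse "row,column,affinity" lines
# into a dense dim x dim matrix (assumed interpretation of the input format).
def load_affinity(text, dim):
    matrix = [[0.0] * dim for _ in range(dim)]
    for line in text.strip().splitlines():
        i, j, value = line.split(",")
        matrix[int(i)][int(j)] = float(value)
    return matrix


EXAMPLE = """\
0,0,0
0,1,0.8
0,2,0.5
1,0,0.8
1,1,0
1,2,0.9
2,0,0.5
2,1,0.9
2,2,0
"""

affinity = load_affinity(EXAMPLE, 3)

# The example graph is undirected, so the matrix should be symmetric
# with zeros on the diagonal.
is_symmetric = all(
    affinity[i][j] == affinity[j][i] for i in range(3) for j in range(3)
)
zero_diagonal = all(affinity[i][i] == 0.0 for i in range(3))
```

Both checks pass for this data, so the input itself looks well-formed: nine entries for a 3x3 symmetric affinity matrix.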

Using 0.7 distribution.

mahout spectralkmeans -i file:///Users/phubenig/affExGraph.txt -o file:///Users/phubenig/spectralEx -k 2 -d 3 -x 30 -cd 0.01 -ow

12/09/05 16:14:00 INFO mapred.JobClient:     Combine output records=1
12/09/05 16:14:00 INFO mapred.JobClient:     Reduce output records=1
12/09/05 16:14:00 INFO mapred.JobClient:     Map output records=1
12/09/05 16:14:00 INFO lanczos.LanczosSolver: 2 passes through the corpus so far...
Exception in thread "main" org.apache.mahout.math.IndexException: Index 2 is outside allowable range of [0,2)
    at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:132)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:73)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:148)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-1077) apparent spectral kmeans bug

Posted by "Daniel Davies (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481067#comment-13481067 ] 

Daniel Davies commented on MAHOUT-1077:
---------------------------------------

I'm seeing exactly the same bug on a dataset with 936 points.  The LanczosSolver crashes while attempting to set a matrix element at an index equal to the number of clusters specified by the -k switch, which is one past the last valid index.  The problem was also reported by Dan Brickley in this mailing list message:

From	Dan Brickley <da...@danbri.org>
Subject	Spectral Kmeans wiki category data test - can you confirm if you ran it to completion?
Date	Thu, 21 Jun 2012 19:54:58 GMT
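
To make the failure mode concrete, here is a small sketch of the half-open range check that the exception message in the original report describes. BoundedMatrix is a toy stand-in, not Mahout's AbstractMatrix, and sizing the matrix from k is an assumption for illustration only; I have not confirmed what LanczosSolver.java:132 actually allocates.

```python
class BoundedMatrix:
    """Toy dense matrix with half-open index checks (not Mahout's class)."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.data = [[0.0] * cols for _ in range(rows)]

    def set(self, row, col, value):
        # Valid indices are 0 .. size-1, i.e. the half-open range [0, size).
        for index, size in ((row, self.rows), (col, self.cols)):
            if not 0 <= index < size:
                raise IndexError(
                    f"Index {index} is outside allowable range of [0,{size})"
                )
        self.data[row][col] = value


k = 2  # same value as the -k switch in the report
matrix = BoundedMatrix(k, k)  # assumption: storage sized by k

matrix.set(k - 1, k - 1, 1.0)  # last valid index, succeeds

try:
    matrix.set(k, k - 1, 1.0)  # index k is one past the end
    message = None
except IndexError as error:
    message = str(error)
```

With k = 2 the second write fails with the same range text as the reported IndexException, which is consistent with a write at index k into storage that only holds k entries.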

                
