Posted to dev@mahout.apache.org by Dipti Mathur <di...@gmail.com> on 2011/05/04 15:25:32 UTC

kmeans - Array out of bound.

Hi All,

I get the following error while running the kmeans algorithm on the
Reuters dataset. The topic has been discussed in many forums (
http://search.lucidimagination.com/search/document/3f6b06ee9d45b4fe/tranforming_data_for_k_means_analysis)
but no one has mentioned a solution. Has anyone faced this and found a solution?

dipti@dipti-laptop:~$ mahout kmeans -i vect-output/tf-vectors/part-r-00000 -k 15 --output kmeans-output --clusters kmeans-output/clusters --maxIter 200
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20.2/
HADOOP_CONF_DIR=/usr/lib/hadoop-0.20.2/conf
11/05/04 18:41:59 INFO common.AbstractJob: Command line arguments: {--clusters=kmeans-output/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=vect-output/tf-vectors/part-r-00000, --maxIter=200, --method=mapreduce, --numClusters=15, --output=kmeans-output, --startPhase=0, --tempDir=temp}
11/05/04 18:41:59 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/05/04 18:41:59 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
11/05/04 18:41:59 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:107)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:96)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Regards,
Dipti Mathur

Re: kmeans - Array out of bound.

Posted by Dipti Mathur <di...@gmail.com>.
Hi All,

Even build-reuters.sh is failing with an error! Has there been an update to
trunk? The script is attached.

Regards,
Dipti Mathur

Re: kmeans - Array out of bound.

Posted by Grant Ingersoll <gs...@apache.org>.
What do you see when you dump out the values of the input vectors?  Is there actually content going in?

Also, what version of Mahout are you running?
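
To illustrate why empty input matters here: the trace shows ArrayList.get failing with "Index: 0, Size: 0" inside RandomSeedGenerator.buildRandom, which is what you would expect if no vectors were read before the seeds were picked. The sketch below is not Mahout's code. SeedSketch and pickSeeds are hypothetical names, and the reservoir-sampling approach is an assumption about how a random seed step might work. It only shows how reading back k seeds from an empty list produces this kind of exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a random seed selection step; not Mahout's implementation.
public class SeedSketch {

    // Keep up to k items via reservoir sampling, then read back exactly k of them.
    static List<String> pickSeeds(List<String> vectors, int k, Random rng) {
        List<String> chosen = new ArrayList<String>(k);
        int seen = 0;
        for (String v : vectors) {
            seen++;
            if (chosen.size() < k) {
                chosen.add(v);              // fill the reservoir first
            } else {
                int j = rng.nextInt(seen);  // uniform in [0, seen)
                if (j < k) {
                    chosen.set(j, v);       // replace an earlier pick
                }
            }
        }
        List<String> seeds = new ArrayList<String>(k);
        for (int i = 0; i < k; i++) {
            // Throws IndexOutOfBoundsException if fewer than k vectors were read.
            seeds.add(chosen.get(i));
        }
        return seeds;
    }

    public static void main(String[] args) {
        // Non-empty input: three made-up vector names, two seeds requested.
        List<String> docs = Arrays.asList("doc-a", "doc-b", "doc-c");
        System.out.println("Seeds: " + pickSeeds(docs, 2, new Random(42)));

        // Empty input: the read-back loop fails on the very first get(0).
        try {
            pickSeeds(new ArrayList<String>(), 15, new Random(42));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Empty input failed: " + e.getMessage());
        }
    }
}
```

If the real cause is the same, the thing to check is whether the path given to -i actually contains any vectors, which is what dumping the input would confirm.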




--------------------------
Grant Ingersoll
Lucene Revolution -- Lucene and Solr User Conference
May 25-26 in San Francisco
www.lucenerevolution.org