Posted to user@mahout.apache.org by je...@lewi.us on 2011/02/19 00:05:48 UTC

clusterdump out of memory error

Greetings,

I used kmeans to cluster ~3 million instances of 40-d vectors. The
clustering ran fine, but when I ran the cluster dump utility I got the
memory error below. I initially ran everything locally, but after
getting the memory error I tried running it under Hadoop in
pseudo-distributed mode (I'm running Cloudera).
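
For scale (a rough back-of-the-envelope estimate on my part, not anything
the tool reported): holding all of those points in memory at once means
roughly 3,000,000 vectors x 40 dimensions x 8 bytes per double, on the
order of 960 MB of raw vector values before any Java object or collection
overhead, which is already more than a default-sized JVM heap.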

I have r1066213 of Mahout.
Java is 1.6.0_23


Jeremy

/usr/local/programs/svn_mahout/bin/mahout clusterdump --seqFileDir kmeans_work/cluster-9 --pointsDir kmeans_work/clusteredPoints --output kmeans_work/clusteranalyze-9.txt
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20
No HADOOP_CONF_DIR set, using /usr/lib/hadoop-0.20/conf
11/02/18 14:34:50 INFO common.AbstractJob: Command line arguments: {--dictionaryType=text, --endPhase=2147483647, --output=kmeans_work/clusteranalyze-9.txt, --pointsDir=kmeans_work/clusteredPoints, --seqFileDir=kmeans_work/cluster-9, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:44)
	at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:39)
	at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:94)
	at org.apache.mahout.clustering.WeightedVectorWritable.readFields(WeightedVectorWritable.java:55)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1758)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1886)
	at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:286)
	at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:224)
	at org.apache.mahout.utils.clustering.ClusterDumper.run(ClusterDumper.java:143)
	at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:104)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:186)



Re: clusterdump out of memory error

Posted by Jeremy Lewi <je...@lewi.us>.
If I leave out the --pointsDir option, the dump works. But I need to know
the cluster assignments for each point.
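
One workaround sketch (my own code, not part of Mahout, and assuming the
clusteredPoints sequence files are keyed by IntWritable cluster ids with
WeightedVectorWritable values, which is what the stack trace suggests
clusterdump is reading): stream the assignments one record at a time
instead of letting clusterdump load every point into memory. Something
like:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.clustering.WeightedVectorWritable;

    // Streams clusterId -> point assignments from one clusteredPoints part
    // file, one record at a time, so memory use stays flat no matter how
    // many points there are.
    public class DumpAssignments {
      public static void main(String[] args) throws IOException {
        Path path = new Path(args[0]); // a part file under kmeans_work/clusteredPoints
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(path.toUri(), conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
          IntWritable clusterId = new IntWritable();                    // cluster the point landed in
          WeightedVectorWritable point = new WeightedVectorWritable();  // the point itself (plus weight)
          while (reader.next(clusterId, point)) {
            // Replace the println with whatever per-point processing you need.
            System.out.println(clusterId.get() + "\t" + point);
          }
        } finally {
          reader.close();
        }
      }
    }

The other thing worth trying is simply giving the driver JVM a bigger
heap: the trace shows clusterdump running client-side via RunJar and
MahoutDriver rather than inside a map-reduce task, so pseudo-distributed
mode by itself doesn't change the memory available to it. I believe
raising MAHOUT_HEAPSIZE (for local runs) or HADOOP_HEAPSIZE (when
bin/mahout hands off to bin/hadoop) does that, though I haven't
double-checked this revision's launcher scripts.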

J 

On Fri, 2011-02-18 at 16:05 -0700, jeremy@lewi.us wrote:
> Greetings,
> 
> I used kmeans to cluster ~3 million instances of 40-d vectors. The
> clustering ran fine, but when I ran the cluster dump utility I got the
> memory error below. [...]