Posted to user@mahout.apache.org by syed kather <in...@gmail.com> on 2012/10/03 21:14:48 UTC

Heap space problem while running KMeans in a MapReduce cluster

Team,
  When I try to run KMeans clustering, it throws the following error:

  Java heap space
        at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
        at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
        at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:139)
        at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:118)
        at org.apache.mahout.clustering.ClusterObservations.readFields(ClusterObservations.java:59)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
        at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
        at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
        at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:25)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1502)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2768)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2706)

Can you tell me what the reason might be?

    I have a 5-node cluster:
    Master     4 cores with 16GB RAM
    slave1     4 cores with  8GB RAM
    slave2     4 cores with  8GB RAM
    slave3     4 cores with  8GB RAM
    slave4     4 cores with  8GB RAM

Let me know if any optimization is required for this.

Thanks in advance.
            Thanks and Regards,
        S SYED ABDUL KATHER

Re: Heap space problem while running KMeans in a MapReduce cluster

Posted by paritosh ranjan <pa...@gmail.com>.
How many initial clusters are you providing to KMeans?
Try reducing the initial number of clusters and find the breaking point. A
good way to get the initial clusters is Canopy Clustering:
https://cwiki.apache.org/MAHOUT/canopy-clustering.html
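
As a rough sketch of that workflow on the command line (the paths, the T1/T2
thresholds, and the canopy output subdirectory name are placeholders; check
the exact option names against your Mahout version):

    # 1) Let Canopy propose the initial centroids (T1 > T2; both are
    #    distance thresholds you need to tune for your data).
    bin/mahout canopy \
      -i /user/hadoop/vectors \
      -o /user/hadoop/canopy-centroids \
      -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
      -t1 3.0 -t2 1.5 -ow

    # 2) Feed those centroids to k-means instead of guessing k.
    #    (Depending on the Mahout version the centroid directory is
    #    named clusters-0 or clusters-0-final.)
    bin/mahout kmeans \
      -i /user/hadoop/vectors \
      -c /user/hadoop/canopy-centroids/clusters-0-final \
      -o /user/hadoop/kmeans-clusters \
      -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
      -x 10 -ow -cl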

Have you checked whether the nodes are actually using the RAM they have (16
GB on the master, 8 GB on the slaves)? If not, the Hadoop configuration will
need some tuning so that the task JVMs can use more of the available RAM.
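
If the task JVMs are still at the Hadoop default heap, that alone can explain
the OOM. As a rough sketch (this is the Hadoop 1.x / classic MapReduce
property; the -Xmx value is a placeholder you would size against your 8 GB
slaves and the number of map/reduce slots per node), the per-task heap is set
in mapred-site.xml:

    <property>
      <name>mapred.child.java.opts</name>
      <!-- often defaults to -Xmx200m, which large sparse vectors can
           exhaust quickly; keep (map slots + reduce slots) * Xmx well
           below each node's physical RAM -->
      <value>-Xmx2048m</value>
    </property>

Restart the TaskTrackers after changing it and rerun the job.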
