Posted to user@mahout.apache.org by Ken Williams <zo...@hotmail.com> on 2011/06/01 13:35:17 UTC

Heap Size question.


  Hi All,

  I'm a bit confused about the values displayed on the 'jobtracker.jsp' page.
  In particular, there's a section called 'Cluster Summary'.

  I'm running a small 4-machine Hadoop cluster, and when I point a web-browser 
  at my master machine (http://master:50030/jobtracker.jsp) it displays,

               Cluster Summary (Heap Size is 15.5 MB / 1.89 GB)

   What exactly do these figures mean?

   I know that the second figure (1.89 GB) is determined by the value of
   the HADOOP_HEAPSIZE variable set in 'conf/hadoop-env.sh', but I'm not
   sure exactly what it means. I also don't know where the first value
   (15.5 MB) comes from or what it represents.
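
   (For reference, the setting in 'conf/hadoop-env.sh' is a single line of
   the form below; the value is in MB, and the 2000 here is only an example,
   not necessarily what I have set:)

      export HADOOP_HEAPSIZE=2000   # max daemon heap, in MB (example value)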

   I'm guessing the 1.89 GB is the amount of heap memory allocated to Hadoop
   on each machine in the cluster. (Correct?)

   I have no idea what the 15.5 MB means or where it comes from. It never changes,
   not even when a job is running, and I can't find any explanation in the documentation. 

   This page, https://issues.apache.org/jira/browse/HADOOP-4435, seems to suggest
   that the 15.5 MB should be the amount of heap memory currently in use but since
   this value never changes - not even when a job is running and I refresh the page 
   - I'm not convinced this is working.   

   I'm asking this question because I have a Mahout job which slowly comes to
   a halt with a lot of 'OutOfMemoryError: Java heap space' errors before it
   is 'Killed'.

   I'm using Hadoop 0.20.2 and the latest Mahout snapshot version.

   Thanks for any help.

           Ken


Re: Heap Size question.

Posted by Joey Echeverria <jo...@cloudera.com>.
The values show the currently used heap and the maximum heap size of the jobtracker daemon itself, not of your running jobs. Furthermore, the HADOOP_HEAPSIZE setting only sets the maximum heap for the Hadoop daemons, not for the tasks in your job.

If you're getting OOMEs, you should add a setting to your mapred-site.xml file that looks like this:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1g</value>
</property>
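
Since mapred.child.java.opts is a per-job property, you can also override it for a single run instead of cluster-wide, assuming your job's driver accepts Hadoop's generic options. A rough sketch, where your-job.jar and YourDriver are placeholders for your own job:

  hadoop jar your-job.jar YourDriver -Dmapred.child.java.opts=-Xmx1g <other args>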

-Joey

On Jun 1, 2011, at 7:35, Ken Williams <zo...@hotmail.com> wrote:

> 
> 
>  Hi All,
> 
>  I'm a bit confused about the values displayed on the 'jobtracker.jsp' page.
>  In particular, there's a section called 'Cluster Summary'.
> 
>  I'm running a small 4-machine Hadoop cluster, and when I point a web-browser 
>  at my master machine (http://master:50030/jobtracker.jsp) it displays,
> 
>               Cluster Summary (Heap Size is 15.5 MB / 1.89 GB)
> 
>   What exactly do these figures mean ?
> 
>   I know that the second figure (1.89 GB) is determined by the value of 
>   the HADOOP_HEAPSIZE variable set in  'conf/hadoop-env.sh'. What I'm not 
>   sure about is exactly what it means, or where the first value (15.5 MB) is determined
>   or what it means.
> 
>   I'm guessing the 1.89 GB is the amount of heap-memory allocated to Hadoop
>   on each machine in the cluster. (Correct ?)
> 
>   I have no idea what the 15.5 MB means or where it comes from. It never changes,
>   not even when a job is running, and I can't find any explanation in the documentation. 
> 
>   This page, https://issues.apache.org/jira/browse/HADOOP-4435, seems to suggest
>   that the 15.5 MB should be the amount of heap memory currently in use but since
>   this value never changes - not even when a job is running and I refresh the page 
>   - I'm not convinced this is working.   
> 
>   I'm asking this question because I have a Mahout job which slowly comes to halt with
>   a lot of 'OutOfMemoryError: Java heap space' errors, before it is 'Killed'.
> 
>   I'm using Hadoop 0.20.2 and the latest Mahout snapshot version.
> 
>   Thanks for any help.
> 
>           Ken
> 

Re: Heap Size question.

Posted by Paul Mahon <pm...@decarta.com>.
That's a Hadoop question rather than a Mahout question. For a full
answer you'll probably want to try the Hadoop list.

That number is the heap of the jobtracker itself, which it uses to
track which jobs are running and which have run; it is not the heap
of the job driver or the tasks themselves. The out-of-memory errors
are most likely occurring in the tasks, and are unrelated to that
cluster summary heap size number.
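
If you want to watch the jobtracker daemon's heap directly rather
than through the web page, something like this should work (a rough
sketch; it assumes the JDK's jmap tool is on your path and that you
substitute the jobtracker's actual process id):

  jmap -heap <jobtracker-pid>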

On 06/01/2011 04:35 AM, Ken Williams wrote:
>
>    Hi All,
>
>    I'm a bit confused about the values displayed on the 'jobtracker.jsp' page.
>    In particular, there's a section called 'Cluster Summary'.
>
>    I'm running a small 4-machine Hadoop cluster, and when I point a web-browser
>    at my master machine (http://master:50030/jobtracker.jsp) it displays,
>
>                 Cluster Summary (Heap Size is 15.5 MB / 1.89 GB)
>
>     What exactly do these figures mean ?
>
>     I know that the second figure (1.89 GB) is determined by the value of
>     the HADOOP_HEAPSIZE variable set in  'conf/hadoop-env.sh'. What I'm not
>     sure about is exactly what it means, or where the first value (15.5 MB) is determined
>     or what it means.
>
>     I'm guessing the 1.89 GB is the amount of heap-memory allocated to Hadoop
>     on each machine in the cluster. (Correct ?)
>
>     I have no idea what the 15.5 MB means or where it comes from. It never changes,
>     not even when a job is running, and I can't find any explanation in the documentation.
>
>     This page, https://issues.apache.org/jira/browse/HADOOP-4435, seems to suggest
>     that the 15.5 MB should be the amount of heap memory currently in use but since
>     this value never changes - not even when a job is running and I refresh the page
>     - I'm not convinced this is working.
>
>     I'm asking this question because I have a Mahout job which slowly comes to halt with
>     a lot of 'OutOfMemoryError: Java heap space' errors, before it is 'Killed'.
>
>     I'm using Hadoop 0.20.2 and the latest Mahout snapshot version.
>
>     Thanks for any help.
>
>             Ken
>

Re: OutOfMemoryError: GC overhead limit exceeded

Posted by hadoopman <ha...@gmail.com>.
I've run into similar problems in my Hive jobs and will look at the
'mapred.child.ulimit' option.  One thing we've found is that when
loading data with INSERT OVERWRITE into our Hive tables we've needed
to include a 'CLUSTER BY' or 'DISTRIBUTE BY' clause.  Generally that
has fixed our memory issues during the reduce phase, though not 100%
of the time (but close).

I understand the basics of what those options do, but I'm unclear as
to "why" they are necessary (coming from an Oracle and Postgres DBA
background).  I'm guessing it has something to do with the underlying
code.
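
For anyone else who wants to try mapred.child.ulimit, here is a
minimal sketch of how I understand it would sit alongside the task
heap in mapred-site.xml. As far as I know the ulimit value is given
in kilobytes and needs to comfortably exceed the -Xmx value, or the
task JVMs may fail to start; the numbers below are only examples:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1g</value>
</property>
<property>
  <name>mapred.child.ulimit</name>
  <!-- example only: 2 GB expressed in KB -->
  <value>2097152</value>
</property>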



On 06/18/2011 12:28 PM, Mapred Learn wrote:
> Did u try playing with mapred.child.ulimit along with java.opts ?
>
> Sent from my iPhone
>
> On Jun 18, 2011, at 9:55 AM, Ken Williams<zo...@hotmail.com>  wrote:
>
>    
>> Hi All,
>>
>> I'm having a problem running a job on Hadoop. Using Mahout, I've been able to run several Bayesian classifiers and train and test them successfully on increasingly large datasets. Now I'm working on a dataset of 100,000 documents (size 100MB). I've trained the classifier on 80,000 docs and am using the remaining 20,000 as the test set. I've been able to train the classifier but when I try to 'testclassifier' all the map tasks are failing with a 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded' exception, before the job itself is 'Killed'. I have a small cluster of 3 machines but have plenty of memory and CPU power (3 x 16GB, 2.5GHz quad-core machines).
>> I've tried setting 'mapred.child.java.opts' flags up to 3GB in size (-Xms3G -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE at values like 2000, 2500 and 3000 but this made no difference. When the program is running I can use 'top' to see that although the CPUs are busy, memory usage rarely goes above 12GB and absolutely no swapping is taking place. (see Program console output: http://pastebin.com/0m2Uduxa, Job config file: http://pastebin.com/4GEFSnUM).
>> I found a similar problem with a 'GC overhead limit exceeded' where the program was spending so much time garbage-collecting (more then 90% of its time!) that it was unable to progress and so threw the 'GC overhead limit exceeded' exception.  If I set (-XX:-UseGCOverheadLimit) in the 'mapred.child.java.opts' property to avoid this exception then I see the same behaviour as before only a slightly different exception is thrown,   Caused by: java.lang.OutOfMemoryError: Java heap space     at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)
>> So I'm guessing that maybe my program is spending too much time garbage-collecting for it to progress ? But how do I fix this ? There's no further info in the log-files other than seeing the exceptions being thrown. I tried to reduce the 'dfs.block.size' parameter to reduce the amount of data going into each 'map' process (and therefore reduce it's memory requirements) but this made no difference. I tried various settings for JVM reuse (mapred.job.reuse.jvm.num.tasks)using values for no re-use (0), limited re-use (10), and unlimited re-use (-1) but no difference. I think the problem is in the job configuration parameters but how do I find it ? I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are running 64-bit Ubuntu and Java 6.Any help would be very much appreciated,
>>
>>            Ken Williams
>>
>>
>>
>>
>>
>>
>>
>>
>>      
>    


Re: OutOfMemoryError: GC overhead limit exceeded

Posted by Mapred Learn <ma...@gmail.com>.
Did you try playing with mapred.child.ulimit along with mapred.child.java.opts?

Sent from my iPhone

On Jun 18, 2011, at 9:55 AM, Ken Williams <zo...@hotmail.com> wrote:

> 
> Hi All,
> 
> I'm having a problem running a job on Hadoop. Using Mahout, I've been able to run several Bayesian classifiers and train and test them successfully on increasingly large datasets. Now I'm working on a dataset of 100,000 documents (size 100MB). I've trained the classifier on 80,000 docs and am using the remaining 20,000 as the test set. I've been able to train the classifier but when I try to 'testclassifier' all the map tasks are failing with a 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded' exception, before the job itself is 'Killed'. I have a small cluster of 3 machines but have plenty of memory and CPU power (3 x 16GB, 2.5GHz quad-core machines).
> I've tried setting 'mapred.child.java.opts' flags up to 3GB in size (-Xms3G -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE at values like 2000, 2500 and 3000 but this made no difference. When the program is running I can use 'top' to see that although the CPUs are busy, memory usage rarely goes above 12GB and absolutely no swapping is taking place. (see Program console output: http://pastebin.com/0m2Uduxa, Job config file: http://pastebin.com/4GEFSnUM).
> I found a similar problem with a 'GC overhead limit exceeded' where the program was spending so much time garbage-collecting (more then 90% of its time!) that it was unable to progress and so threw the 'GC overhead limit exceeded' exception.  If I set (-XX:-UseGCOverheadLimit) in the 'mapred.child.java.opts' property to avoid this exception then I see the same behaviour as before only a slightly different exception is thrown,   Caused by: java.lang.OutOfMemoryError: Java heap space     at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)
> So I'm guessing that maybe my program is spending too much time garbage-collecting for it to progress ? But how do I fix this ? There's no further info in the log-files other than seeing the exceptions being thrown. I tried to reduce the 'dfs.block.size' parameter to reduce the amount of data going into each 'map' process (and therefore reduce it's memory requirements) but this made no difference. I tried various settings for JVM reuse (mapred.job.reuse.jvm.num.tasks)using values for no re-use (0), limited re-use (10), and unlimited re-use (-1) but no difference. I think the problem is in the job configuration parameters but how do I find it ? I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are running 64-bit Ubuntu and Java 6.Any help would be very much appreciated,
> 
>           Ken Williams
> 
> 
> 
> 
> 
> 
> 

Re: OutOfMemoryError: GC overhead limit exceeded

Posted by Paul Mahon <pm...@decarta.com>.
You could try setting 'mapred.child.java.opts' to something quite
large, perhaps 10 GB. You may wish to try setting MAHOUT_HEAPSIZE as
well, but I don't think that will make a difference, since it's the
Hadoop tasks that are getting the error.

You may also be able to set:
-Dmapred.min.split.size=20000000 -Dmapred.map.tasks=100000000

which may make Hadoop create more map tasks... it did for me.
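
If it's easier to experiment with than command-line flags, the same
properties can also be set in mapred-site.xml on the machine you
submit the job from; a rough sketch with the same example values:

<property>
  <name>mapred.min.split.size</name>
  <value>20000000</value>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>100000000</value>
</property>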




On 06/18/2011 09:55 AM, Ken Williams wrote:
> Hi All,
>
> I'm having a problem running a job on Hadoop. Using Mahout, I've been able to run several Bayesian classifiers and train and test them successfully on increasingly large datasets. Now I'm working on a dataset of 100,000 documents (size 100MB). I've trained the classifier on 80,000 docs and am using the remaining 20,000 as the test set. I've been able to train the classifier but when I try to 'testclassifier' all the map tasks are failing with a 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded' exception, before the job itself is 'Killed'. I have a small cluster of 3 machines but have plenty of memory and CPU power (3 x 16GB, 2.5GHz quad-core machines).
> I've tried setting 'mapred.child.java.opts' flags up to 3GB in size (-Xms3G -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE at values like 2000, 2500 and 3000 but this made no difference. When the program is running I can use 'top' to see that although the CPUs are busy, memory usage rarely goes above 12GB and absolutely no swapping is taking place. (see Program console output: http://pastebin.com/0m2Uduxa, Job config file: http://pastebin.com/4GEFSnUM).
> I found a similar problem with a 'GC overhead limit exceeded' where the program was spending so much time garbage-collecting (more then 90% of its time!) that it was unable to progress and so threw the 'GC overhead limit exceeded' exception.  If I set (-XX:-UseGCOverheadLimit) in the 'mapred.child.java.opts' property to avoid this exception then I see the same behaviour as before only a slightly different exception is thrown,   Caused by: java.lang.OutOfMemoryError: Java heap space     at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)
> So I'm guessing that maybe my program is spending too much time garbage-collecting for it to progress ? But how do I fix this ? There's no further info in the log-files other than seeing the exceptions being thrown. I tried to reduce the 'dfs.block.size' parameter to reduce the amount of data going into each 'map' process (and therefore reduce it's memory requirements) but this made no difference. I tried various settings for JVM reuse (mapred.job.reuse.jvm.num.tasks)using values for no re-use (0), limited re-use (10), and unlimited re-use (-1) but no difference. I think the problem is in the job configuration parameters but how do I find it ? I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are running 64-bit Ubuntu and Java 6.Any help would be very much appreciated,
>
>             Ken Williams
>
>
>
>
>
>
>

OutOfMemoryError: GC overhead limit exceeded

Posted by Ken Williams <zo...@hotmail.com>.
Hi All,

I'm having a problem running a job on Hadoop. Using Mahout, I've been able to run several Bayesian classifiers and train and test them successfully on increasingly large datasets. Now I'm working on a dataset of 100,000 documents (size 100 MB). I've trained the classifier on 80,000 docs and am using the remaining 20,000 as the test set. I've been able to train the classifier, but when I try to 'testclassifier' all the map tasks fail with a 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded' exception before the job itself is 'Killed'. I have a small cluster of 3 machines but plenty of memory and CPU power (3 x 16 GB, 2.5 GHz quad-core machines).

I've tried setting the 'mapred.child.java.opts' flags up to 3 GB in size (-Xms3G -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE to values like 2000, 2500 and 3000, but this made no difference. When the program is running I can use 'top' to see that although the CPUs are busy, memory usage rarely goes above 12 GB and absolutely no swapping is taking place. (See program console output: http://pastebin.com/0m2Uduxa, job config file: http://pastebin.com/4GEFSnUM.)

I found a report of a similar 'GC overhead limit exceeded' problem, where the program was spending so much time garbage-collecting (more than 90% of its time!) that it was unable to make progress and so threw the exception. If I add -XX:-UseGCOverheadLimit to the 'mapred.child.java.opts' property to avoid this exception, I see the same behaviour as before, only a slightly different exception is thrown:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)

So I'm guessing that maybe my program is spending too much time garbage-collecting for it to progress? But how do I fix this? There's no further information in the log files other than the exceptions being thrown. I tried reducing the 'dfs.block.size' parameter to reduce the amount of data going into each 'map' process (and therefore its memory requirements), but this made no difference. I tried various settings for JVM reuse (mapred.job.reuse.jvm.num.tasks), using values for no re-use (0), limited re-use (10), and unlimited re-use (-1), but again no difference. I think the problem is in the job configuration parameters, but how do I find it? I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are running 64-bit Ubuntu and Java 6. Any help would be very much appreciated,
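
For reference, here is roughly what the combined 'mapred.child.java.opts' setting looks like in my mapred-site.xml with those flags together (a sketch of my current experiment, not a recommended configuration):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xms3g -Xmx3g -XX:-UseGCOverheadLimit</value>
</property>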

           Ken Williams








RE: OutOfMemoryError: GC overhead limit exceeded

Posted by Ken Williams <zo...@hotmail.com>.
Hi All,

I'm having a problem running a job on Hadoop. Using Mahout, I've been able to run several Bayesian classifiers and train and test them successfully on increasingly large datasets. Now I'm working on a dataset of 100,000 documents (size 100 MB). I've been able to train the classifier, but when I try to 'testclassifier' all the map tasks fail with a 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded' exception before the job itself is 'Killed'. I have a small cluster of 3 machines but plenty of memory and CPU power (3 x 16 GB, 2.5 GHz quad-core machines). I've tried setting the 'mapred.child.java.opts' flags up to 3 GB in size (-Xms3G -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE to values up to 3000, but this made no difference. When the program is running I can use 'top' to see that although the CPUs are busy, memory usage rarely goes above 12 GB and absolutely no swapping is taking place.
I saw a report of the same exception where a program was spending so much time garbage-collecting (more than 90% of its time!) that it was unable to progress and so threw the 'GC overhead limit exceeded' exception. If I set -XX:-UseGCOverheadLimit in the mapred.child.java.opts property, then I see the same behaviour as before, only a slightly different exception is thrown: 'Caused by: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)'. I'm guessing my program is spending too much time garbage-collecting for it to progress, but how do I fix this? I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are running 64-bit Ubuntu and Java 6. Any help would be very much appreciated,
Ken