Posted to common-user@hadoop.apache.org by Emmanuel JOKE <jo...@gmail.com> on 2007/06/30 19:32:10 UTC

OutOfMemory

Hi,

I tried to update my db, using the following command:
 bin/nutch updatedb crawld/crawldb crawld/segments/20070628095836

Both of my 2 nodes had an error, and I can see the following exception:
2007-06-30 12:24:29,688 INFO  mapred.TaskInProgress - Error from task_0001_m_000000_1: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:316)
        at org.apache.nutch.crawl.CrawlDbFilter.map(CrawlDbFilter.java:99)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)


Each of the 2 machines in my cluster has 512 MB of memory. Isn't that enough?
What is the best practice?

Do you have any idea whether this is a bug, or is it just my configuration that
is not correct?

Thanks for your help

Re: OutOfMemory

Posted by Ted Dunning <td...@veoh.com>.
If you are using machines with only 512MB of memory, it is probably a very
bad idea to set the minimum heap size so large.

-Xms400M might be more appropriate.

I should say, though, that if you have a program that is worth using Hadoop on,
you have a problem that is worth having more memory on each processor. Most of
the work I do benefits more from memory than from processor, at least up to
1-2 GB of RAM.
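
Concretely, the override might look something like this in conf/hadoop-site.xml.
This is only a sketch: I am assuming the property involved is
mapred.child.java.opts (the options string passed to the task JVMs), so check
hadoop-default.xml in your release for the exact key and its default value.

        <!-- JVM options passed to each map/reduce child task.
             Property name assumed; -Xms400m follows the suggestion above. -->
        <property>
          <name>mapred.child.java.opts</name>
          <value>-Xms400m</value>
        </property>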

On 6/30/07 11:51 AM, "Avinash Lakshman" <al...@facebook.com> wrote:

> There is an element in the config for Java params. Set it to -Xms1024M
> and give it a shot. It definitely seems like a case of you running out
> of heap space.
> 
> A
> -----Original Message-----
> From: Emmanuel JOKE [mailto:jokeout@gmail.com]
>  ...
> Each of the 2 machines in my cluster has 512 MB of memory. Isn't that enough?
> What is the best practice?
> 
> Do you have any idea whether this is a bug, or is it just my configuration
> that is not correct?
> 
> Thanks for your help


RE: OutOfMemory

Posted by Avinash Lakshman <al...@facebook.com>.
There is an element in the config for Java params. Set it to -Xms1024M
and give it a shot. It definitely seems like a case of you running out
of heap space.
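
For example, the override might go in conf/hadoop-site.xml roughly like this
(a sketch only; I am assuming the element in question is mapred.child.java.opts,
so verify the exact property name against hadoop-default.xml for your version):

        <!-- Assumed property name; sets the initial heap of each task JVM. -->
        <property>
          <name>mapred.child.java.opts</name>
          <value>-Xms1024M</value>
        </property>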

A
-----Original Message-----
From: Emmanuel JOKE [mailto:jokeout@gmail.com] 
Sent: Saturday, June 30, 2007 10:32 AM
To: hadoop-user
Subject: OutOfMemory

Hi,

I tried to update my db, using the following command:
 bin/nutch updatedb crawld/crawldb crawld/segments/20070628095836

Both of my 2 nodes had an error, and I can see the following exception:
2007-06-30 12:24:29,688 INFO  mapred.TaskInProgress - Error from task_0001_m_000000_1: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:316)
        at org.apache.nutch.crawl.CrawlDbFilter.map(CrawlDbFilter.java:99)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)


Each of the 2 machines in my cluster has 512 MB of memory. Isn't that enough?
What is the best practice?

Do you have any idea whether this is a bug, or is it just my configuration that
is not correct?

Thanks for your help