Posted to common-dev@hadoop.apache.org by "Tim Hawkins (JIRA)" <ji...@apache.org> on 2009/03/08 20:27:56 UTC

[jira] Commented: (HADOOP-4976) Mapper runs out of memory

    [ https://issues.apache.org/jira/browse/HADOOP-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680004#action_12680004 ] 

Tim Hawkins commented on HADOOP-4976:
-------------------------------------

I have had several problems with running 0.19.0 on EC2.

Look very carefully at your out-of-memory error; it might not actually be out of memory. We run on large EC2 instances, 5 map/reduce tasks per node, with the following JVM config.

-Xms2048m -Xmx2048m // fix the heap at 2 GB (min = max)

-Xloggc:/mnt/logs/@taskid@.gc // enable per-task GC logging

-XX:+UseConcMarkSweepGC // use concurrent garbage collection

-XX:-UseGCOverheadLimit // disable GC stall protection, otherwise processes with large memory churn tend to get aborted

The last option turns off a protection added in Java 6 (the GC overhead limit), which throws an OutOfMemoryError if the collector spends too much time running while reclaiming too little memory, even if there is plenty of heap left. Turning it off seems to have increased stability dramatically.

We tend to overcommit on the JVM heaps because our usage pattern means that only a few very large tasks get run amongst a stream of smaller tasks. 
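For reference, a rough sketch of one way to pass these options to the task JVMs in 0.19 is via mapred.child.java.opts, which supports the @taskid@ placeholder. This uses the old org.apache.hadoop.mapred.JobConf API; the /mnt/logs path is just our setup, and the helper class/method names here are made up for illustration:

    import org.apache.hadoop.mapred.JobConf;

    public class ChildJvmOpts {
        // Apply the JVM flags discussed above to every map/reduce child JVM.
        public static void configure(JobConf conf) {
            conf.set("mapred.child.java.opts",
                     "-Xms2048m -Xmx2048m "               // fixed 2 GB heap
                   + "-Xloggc:/mnt/logs/@taskid@.gc "     // per-task GC log
                   + "-XX:+UseConcMarkSweepGC "           // concurrent collector
                   + "-XX:-UseGCOverheadLimit");          // no GC-overhead OOM
        }
    }

The same value can equally be set in hadoop-site.xml instead of in code; the point is just that all task JVMs pick up the flags, with @taskid@ expanded per task so the GC logs do not collide.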

> Mapper runs out of memory
> -------------------------
>
>                 Key: HADOOP-4976
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4976
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>         Environment: Amazon EC2 Extra Large instance (4 cores, 15 GB RAM), Sun Java 6 (1.6.0_10); 1 Master, 4 Slaves (all the same); each Java process takes the argument "-Xmx700m" (2 Java processes per Instance)
>            Reporter: Richard J. Zak
>             Fix For: 0.19.2, 0.20.0
>
>
> The hadoop job has the task of processing 4 directories in HDFS, each with 15 files.  This is sample data, a test run, before I go to the needed 5 directories of about 800 documents each.  The mapper takes in nearly 200 pages (not files) and throws an OutOfMemory exception.  The largest file is 17 MB.
> If this problem is something on my end and not truly a bug, I apologize.  However, after Googling a bit, I did see many threads of people running out of memory with small data sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.