Posted to common-dev@hadoop.apache.org by "Richard J. Zak (JIRA)" <ji...@apache.org> on 2009/01/02 18:31:44 UTC

[jira] Created: (HADOOP-4976) Mapper runs out of memory

Mapper runs out of memory
-------------------------

                 Key: HADOOP-4976
                 URL: https://issues.apache.org/jira/browse/HADOOP-4976
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.0
         Environment: Amazon EC2 Extra Large instance (4 cores, 15 GB RAM), Sun Java 6 (1.6.0_10); 1 Master, 4 Slaves (all the same); each Java process takes the argument "-Xmx700m" (2 Java processes per Instance)
            Reporter: Richard J. Zak
             Fix For: 0.19.1


The Hadoop job processes 4 directories in HDFS, each containing 15 files.  This is sample data for a test run before I move to the needed 5 directories of about 800 documents each.  The mapper takes in nearly 200 pages (not files) and throws an OutOfMemory exception.  The largest file is 17 MB.

If this problem is something on my end and not truly a bug, I apologize.  However, after a bit of Googling, I did see many threads about people running out of memory with small data sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (HADOOP-4976) Mapper runs out of memory

Posted by jason hadoop <ja...@gmail.com>.
I think we should add the io.sort.mb buffer size to the default JVM child heap size, and we will have a lot fewer OOMs. That would take out the first wave of OOM mapper errors, and we can start to work on the more real ones.
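
As a rough sketch of what that pairing could look like in hadoop-site.xml (the figures below are purely illustrative, not proposed defaults):

  <property>
    <name>io.sort.mb</name>
    <value>100</value>
    <!-- map-side sort buffer; it is allocated inside the child JVM heap -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <!-- child heap sized so the sort buffer fits with room left for the map itself -->
  </property>

The point is simply that whatever io.sort.mb is set to has to fit inside the child heap, with headroom for the mapper's own objects.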

On Mon, Jan 26, 2009 at 12:11 PM, Thibaut (JIRA) <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/HADOOP-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Thibaut updated HADOOP-4976:
> ----------------------------
>
>
> Try increasing the child heap size in the hadoop-site.xml configuration
> file.
>
> > Mapper runs out of memory
> > -------------------------
> >
> >                 Key: HADOOP-4976
> >                 URL: https://issues.apache.org/jira/browse/HADOOP-4976
> >             Project: Hadoop Core
> >          Issue Type: Bug
> >          Components: mapred
> >    Affects Versions: 0.19.0
> >         Environment: Amazon EC2 Extra Large instance (4 cores, 15 GB
> RAM), Sun Java 6 (1.6.0_10); 1 Master, 4 Slaves (all the same); each Java
> process takes the argument "-Xmx700m" (2 Java processes per Instance)
> >            Reporter: Richard J. Zak
> >             Fix For: 0.19.1
> >
> >
> > The hadoop job has the task of processing 4 directories in HDFS, each
> with 15 files.  This is sample data, a test run, before I go to the needed 5
> directories of about 800 documents each.  The mapper takes in nearly 200
> pages (not files) and throws an OutOfMemory exception.  The largest file is
> 17 MB.
> > If this problem is something on my end and not truly a bug, I apologize.
>  However, after Googling a bit, I did see many threads of people running out
> of memory with small data sets.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

[jira] Commented: (HADOOP-4976) Mapper runs out of memory

Posted by "Tim Hawkins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680004#action_12680004 ] 

Tim Hawkins commented on HADOOP-4976:
-------------------------------------

I have several problems with running 0.19.0 on EC2.

Look very carefully at your out-of-memory error; it might not actually be out of memory. We run on large EC2 instances, with 5 map/reduce tasks per node and the following JVM config.

-Xms2048m -Xmx2048m // pre-allocate the full 2 GB heap up front

-Xloggc:/mnt/logs/@taskid@.gc // enable VM logging

-XX:+UseConcMarkSweepGC // use concurrent garbage collection

-XX:-UseGCOverheadLimit // disable GC stall protection, otherwise processes with large memory churn tend to get aborted

The last option turns off a protection added in Java 6 that throws an out-of-memory error when the GC takes too long to run, even if there is plenty of memory left. Turning it off seems to have increased stability dramatically.

We tend to overcommit on the JVM heaps because our usage pattern means that only a few very large tasks get run amongst a stream of smaller tasks. 
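
If those flags are meant for the per-task child JVMs rather than the daemons, one plausible place to put them is mapred.child.java.opts in hadoop-site.xml; the snippet below simply mirrors the flags above and is a sketch, not a recommendation:

  <property>
    <name>mapred.child.java.opts</name>
    <!-- @taskid@ is interpolated with the task ID when the child JVM is launched -->
    <value>-Xms2048m -Xmx2048m -Xloggc:/mnt/logs/@taskid@.gc -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit</value>
  </property>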

> Mapper runs out of memory
> -------------------------
>
>                 Key: HADOOP-4976
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4976
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>         Environment: Amazon EC2 Extra Large instance (4 cores, 15 GB RAM), Sun Java 6 (1.6.0_10); 1 Master, 4 Slaves (all the same); each Java process takes the argument "-Xmx700m" (2 Java processes per Instance)
>            Reporter: Richard J. Zak
>             Fix For: 0.19.2, 0.20.0
>
>
> The hadoop job has the task of processing 4 directories in HDFS, each with 15 files.  This is sample data, a test run, before I go to the needed 5 directories of about 800 documents each.  The mapper takes in nearly 200 pages (not files) and throws an OutOfMemory exception.  The largest file is 17 MB.
> If this problem is something on my end and not truly a bug, I apologize.  However, after Googling a bit, I did see many threads of people running out of memory with small data sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4976) Mapper runs out of memory

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated HADOOP-4976:
----------------------------


Try increasing the child heap size in the hadoop-site.xml configuration file.
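
A minimal sketch of that change, assuming the per-task child heap (which defaults to a fairly small -Xmx200m) is the limit being hit; the 1024m figure is only an example:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
    <!-- overrides the default per-task heap for map and reduce child JVMs -->
  </property>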

> Mapper runs out of memory
> -------------------------
>
>                 Key: HADOOP-4976
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4976
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>         Environment: Amazon EC2 Extra Large instance (4 cores, 15 GB RAM), Sun Java 6 (1.6.0_10); 1 Master, 4 Slaves (all the same); each Java process takes the argument "-Xmx700m" (2 Java processes per Instance)
>            Reporter: Richard J. Zak
>             Fix For: 0.19.1
>
>
> The hadoop job has the task of processing 4 directories in HDFS, each with 15 files.  This is sample data, a test run, before I go to the needed 5 directories of about 800 documents each.  The mapper takes in nearly 200 pages (not files) and throws an OutOfMemory exception.  The largest file is 17 MB.
> If this problem is something on my end and not truly a bug, I apologize.  However, after Googling a bit, I did see many threads of people running out of memory with small data sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.