Posted to common-user@hadoop.apache.org by Kris Jirapinyo <kj...@biz360.com> on 2009/02/12 03:56:25 UTC

Reducer Out of Memory

Hi all,
    I am running a data-intensive job on 18 nodes on EC2, each with just
1.7GB of memory.  The input size is 50GB, and as a result, my mapper splits
it up automatically to 786 map tasks.  This runs fine.  However, I am
setting the reduce task number to 18.  This is where I get a java heap out
of memory error:

java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3209)
	at java.lang.String.<init>(String.java:216)
	at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
	at java.nio.CharBuffer.toString(CharBuffer.java:1157)
	at org.apache.hadoop.io.Text.decode(Text.java:350)
	at org.apache.hadoop.io.Text.decode(Text.java:327)
	at org.apache.hadoop.io.Text.toString(Text.java:254)

	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)
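
The frames between Text.toString and ReduceTask.run appear to have been
trimmed from the trace above, so it is not clear whether the failing
allocation happens in framework code or inside the reduce method itself.
One pattern that commonly produces a reduce-side heap exhaustion like this
is buffering every value for a key before emitting anything.  A minimal
sketch against the old mapred API; the class name and the key/value types
are hypothetical, not taken from the actual job:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical reducer that buffers all values for a key in memory.
    // If a single key carries millions of values, this list alone can
    // exhaust the child JVM heap, no matter how many reduce tasks run.
    public class BufferingReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      public void reduce(Text key, Iterator<Text> values,
                         OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        List<String> buffered = new ArrayList<String>();
        while (values.hasNext()) {
          buffered.add(values.next().toString());   // Text.toString() per value
        }
        // ... some whole-key computation over 'buffered' would go here ...
        output.collect(key, new Text("count=" + buffered.size()));
      }
    }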

Re: Reducer Out of Memory

Posted by Kris Jirapinyo <kj...@biz360.com>.
Darn that send button.

Anyway, I was wondering if my understanding is correct.  There will be
exactly as many output files as the number of reduce tasks I set, so in the
job's output directory I should always see only 18 files.  If that is right,
then when I call output.collect() in my reducer, does the output only get
"flushed" at the end, when that particular reduce task finishes?  If so, it
seems that as my input grows, 18 reducers will not be able to handle the
sheer volume of data, since the collector will keep accumulating more and
more of it.

So I guess this is the question: do I have to keep increasing the number of
reduce tasks so that each reducer takes a smaller bite out of the chunk?
That is, if I'm running out of Java heap space and I don't want to add more
nodes, do I need to raise the number of reduce tasks to, say, 36, and so on?
It just seems like I'm missing something.

Of course, I could always add more nodes or upgrade to a larger instance
type to get more memory, but that's the obvious solution (I just hope it's
not the only one).  What I'm saying is that I had assumed the reducer would
be smart enough to notice it was taking too big a bite out of the whole
chunk (the way the map side is split automatically) and readjust itself,
since I don't really care how many output files I get in the end, only that
the reducer's results stay under one directory.
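
On the flushing question: as far as I understand it (worth checking against
the ReduceTask source for your Hadoop version), output.collect() hands each
record to the output format's RecordWriter as it is called, so the reduce
output is streamed to the task's part file rather than held in memory until
the task finishes.  Under that assumption, a reducer that emits as it
iterates keeps a roughly constant footprint however large the input grows;
a sketch with hypothetical types and class name:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical streaming reducer: nothing is kept per key except a counter,
    // and every collect() call is written through as the iteration proceeds.
    public class StreamingReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, LongWritable> {

      private final LongWritable count = new LongWritable();

      public void reduce(Text key, Iterator<Text> values,
                         OutputCollector<Text, LongWritable> output,
                         Reporter reporter) throws IOException {
        long n = 0;
        while (values.hasNext()) {
          values.next();         // examine each value, but retain none of them
          n++;
          reporter.progress();   // keep the task alive on very large keys
        }
        count.set(n);
        output.collect(key, count);
      }
    }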



Re: Reducer Out of Memory

Posted by Kris Jirapinyo <kr...@biz360.com>.
I tried that, but with only 1.7GB of memory that will not let me run 1 map
task and 1 reduce task concurrently (I believe that with -Xmx1024m each
child JVM tries to reserve that much physical memory?).  So, to be safe, I
set it to -Xmx768m.

The error I get when I use -Xmx1024m is this:

java.io.IOException: Cannot run program "bash": java.io.IOException:
error=12, Cannot allocate memory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:321)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
	at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:160)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2079)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:457)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot
allocate memory
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
	at java.lang.ProcessImpl.start(ProcessImpl.java:65)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
	... 10 more
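
Since -Xmx interacts with how many child JVMs a node runs at once, a quick
back-of-the-envelope budget may help here.  Every number below is an
assumption for illustration (the daemon allowance and the slot count in
particular), not a measurement from this cluster:

    // Rough per-task heap budget for a ~1.7GB node; every figure is assumed.
    public class HeapBudget {
      public static void main(String[] args) {
        long nodeMemMb       = 1700; // EC2 instance with ~1.7 GB of RAM
        long daemonAllowance = 400;  // guess for OS + DataNode + TaskTracker
        int  concurrentTasks = 2;    // e.g. 1 map slot + 1 reduce slot per node
        long perTaskMb = (nodeMemMb - daemonAllowance) / concurrentTasks;
        System.out.println("Heap available per child JVM: ~" + perTaskMb + " MB");
        // Roughly 650 MB per task under these assumptions, which would explain
        // why -Xmx768m is already tight and -Xmx1024m overcommits the node as
        // soon as a child JVM also needs to fork a process such as "bash".
      }
    }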




On Wed, Feb 11, 2009 at 7:02 PM, Rocks Lei Wang <be...@gmail.com> wrote:

> Maybe you need to allocate a larger JVM heap by using the parameter -Xmx1024m.
>

Re: Reducer Out of Memory

Posted by Rocks Lei Wang <be...@gmail.com>.
Maybe you need to allocate a larger JVM heap by using the parameter -Xmx1024m.
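
For what it's worth, in the Hadoop versions matching the stack traces above
this flag is normally applied to every map and reduce child JVM through the
mapred.child.java.opts property.  A minimal per-job sketch with the old
JobConf API; the class and job names are placeholders:

    import org.apache.hadoop.mapred.JobConf;

    public class SubmitWithBiggerHeap {
      public static void main(String[] args) {
        JobConf conf = new JobConf(SubmitWithBiggerHeap.class);
        conf.setJobName("reduce-heavy-job");             // placeholder name
        // Give each map/reduce child JVM a 1 GB heap (applies to both phases).
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        // More, smaller reduce tasks is the other lever discussed in this thread.
        conf.setNumReduceTasks(36);
        // ... mapper/reducer classes, input and output paths, and then
        // JobClient.runJob(conf) would follow here ...
      }
    }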
