Posted to mapreduce-user@hadoop.apache.org by James Hammerton <ja...@mendeley.com> on 2011/04/26 19:55:06 UTC

Mappers crashing due to running out of heap space during initialisation

Hi,

I have a job that runs fine with a small data set in pseudo-distributed mode
on my desktop workstation but when I run it on our Hadoop cluster it falls
over, crashing during the initialisation of some of the mappers. The errors
look like this:

2011-04-26 14:34:04,494 FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:743)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:487)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)


The mapper itself buffers only the previous input, and the objects are small
(201 KB max, most under 50 KB), so I don't know why this is happening.
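
To give an idea of its shape, the mapper does no more buffering than the
sketch below (names, types and output here are made up for illustration; it
is not the real class):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative sketch only: holds exactly one previous record between calls.
public class PreviousInputMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text previous;   // the only buffered state

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (previous != null) {
            context.write(previous, value);   // emit previous record alongside current
        }
        previous = new Text(value);           // replace, never accumulate
    }
}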

What exactly is happening in the area of code referred to in the stack
trace?

Cheers,

James

-- 
James Hammerton | Senior Data Mining Engineer
www.mendeley.com/profiles/james-hammerton

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015

Re: Mappers crashing due to running out of heap space during initialisation

Posted by James Hammerton <ja...@mendeley.com>.
Thanks Joey.

I guess the puzzle then is why some of my mappers used over 312 MB, leaving
insufficient room in the 512 MB total we allocate, when the job is no more
complex than other jobs that run happily in that space. The memory usage is
independent of the size of my data set, and even the largest items are only
201 KB, with only a few of them in memory at once.
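(In other words, 512 MB of heap minus the 200 MB io.sort.mb buffer leaves
312 MB for everything else, and my own objects come nowhere near that.)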

James

On Wed, Apr 27, 2011 at 12:47 PM, Joey Echeverria <jo...@cloudera.com> wrote:

> It was initializing a 200MB buffer to sort the map output in.
> How much space did you allocate to the task JVMs (mapred.child.java.opts
> in mapred-site.xml)?
>
> If you didn't change the default, it's set to 200MB, which is why you
> would run out of memory trying to allocate a 200MB buffer.
>
> -Joey



-- 
James Hammerton | Senior Data Mining Engineer
www.mendeley.com/profiles/james-hammerton

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015

Re: Mappers crashing due to running out of heap space during initialisation

Posted by Joey Echeverria <jo...@cloudera.com>.
It was initializing a 200MB buffer to sort the map output in.
How much space did you allocate to the task JVMs (mapred.child.java.opts
in mapred-site.xml)?

If you didn't change the default, it's set to 200MB, which is why you
would run out of memory trying to allocate a 200MB buffer.
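
If the machines can spare it, raising the task heap in mapred-site.xml looks
something like this (-Xmx512m is only an example value, size it to your
nodes):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>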

-Joey

On Wed, Apr 27, 2011 at 6:02 AM, James Hammerton
<ja...@mendeley.com> wrote:
> Hi,
>
> I lowered io.sort.mb from 200 MB to 100 MB and that allowed my job to get
> through the mapping phase; thanks, Chris.
>
> However, what I don't understand is why the memory got used up in the first
> place, when the mapper only buffers the previous input and the maximum
> serialised size of the objects it's dealing with is 201 KB.
>
> This is why I asked about what Hadoop is doing in the area of code where the
> exception was occurring - as far as I can tell, my mapper code wasn't even
> getting run.
>
> Regards,
>
> James



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: Mappers crashing due to running out of heap space during initialisation

Posted by James Hammerton <ja...@mendeley.com>.
Hi,

I lowered io.sort.mb from 200 MB to 100 MB and that allowed my job to get
through the mapping phase; thanks, Chris.
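
For the record it was a one-line change where the job configuration gets
built, roughly like this (a sketch, not my actual driver; "my-job" is a
placeholder name):

    Configuration conf = new Configuration();   // org.apache.hadoop.conf.Configuration
    conf.setInt("io.sort.mb", 100);             // down from 200 so the buffer fits the heap
    Job job = new Job(conf, "my-job");          // org.apache.hadoop.mapreduce.Job (0.20 API)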

However, what I don't understand is why the memory got used up in the first
place, when the mapper only buffers the previous input and the maximum
serialised size of the objects it's dealing with is 201 KB.

This is why I asked about what Hadoop is doing in the area of code where the
exception was occurring - as far as I can tell, my mapper code wasn't even
getting run.

Regards,

James

On Tue, Apr 26, 2011 at 8:02 PM, Chris Douglas <cd...@apache.org> wrote:

> Lower io.sort.mb or raise the heap size for the task. -C



-- 
James Hammerton | Senior Data Mining Engineer
www.mendeley.com/profiles/james-hammerton

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015

Re: Mappers crashing due to running out of heap space during initialisation

Posted by Chris Douglas <cd...@apache.org>.
Lower io.sort.mb or raise the heap size for the task. -C
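
E.g. one of these (illustrative values), in the job conf or mapred-site.xml:

  <property>
    <name>io.sort.mb</name>
    <value>100</value>
  </property>

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>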
