Posted to common-user@hadoop.apache.org by Ken Krugler <kk...@transpac.com> on 2009/05/08 20:16:01 UTC

Setting thread stack size for child JVM

Hi there,

For a very specific type of reduce task, we currently need to use a 
large number of threads.

To avoid running out of memory, I'd like to constrain the Linux stack 
size via a "ulimit -s xxx" shell script command before starting up 
the JVM. I could do this for the entire system at boot time, but it 
would be better to have it for just the Hadoop JVM(s).

Any suggestions for how best to handle this?

Thanks,

-- Ken
-- 
Ken Krugler
+1 530-210-6378

Re: Setting thread stack size for child JVM

Posted by Philip Zeyliger <ph...@cloudera.com>.
On Fri, May 8, 2009 at 1:11 PM, Ken Krugler <kk...@transpac.com> wrote:

> That's part of it, but there's also native memory used when you start a
> thread with most JREs.


It doesn't look like Hadoop lets you run an arbitrary ulimit command, but if
you take a look at Shell.getUlimitMemoryCommand(conf) (called from
TaskRunner.java), you'll see that it lets you specify "ulimit -v N"
commands.  You could probably augment that for "ulimit -s" pretty easily.
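A minimal sketch of the kind of augmentation described above, written as standalone Java rather than as a patch to Hadoop. Everything here is hypothetical: the configuration key `mapred.child.ulimit.stack`, the class `StackUlimitSketch`, and both methods are illustrative names, and wrapping the child command in `bash -c` is an assumption, not a description of how TaskRunner actually builds its command line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, NOT part of Hadoop. It mirrors the idea behind
// Shell.getUlimitMemoryCommand(conf), but produces "ulimit -s" instead of "ulimit -v".
class StackUlimitSketch {

    // Hypothetical key; the value is the stack limit in kilobytes, as "ulimit -s" expects.
    static final String STACK_ULIMIT_KEY = "mapred.child.ulimit.stack";

    /** Returns {"ulimit", "-s", N}, or null if the key is unset or blank. */
    static String[] getUlimitStackCommand(Properties conf) {
        String value = conf.getProperty(STACK_ULIMIT_KEY);
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        long kb = Long.parseLong(value.trim()); // throws NumberFormatException on bad input
        if (kb <= 0) {
            throw new IllegalArgumentException(STACK_ULIMIT_KEY + " must be positive: " + kb);
        }
        return new String[] {"ulimit", "-s", Long.toString(kb)};
    }

    /**
     * Wraps the child JVM command in "bash -c" so the limit applies only to that
     * shell and the JVM it execs, not to the whole machine.
     * Assumes no argument contains characters that would need shell quoting.
     */
    static List<String> wrapWithUlimit(String[] ulimitCmd, List<String> javaCmd) {
        if (ulimitCmd == null) {
            return new ArrayList<>(javaCmd);
        }
        String script = String.join(" ", ulimitCmd) + " && exec " + String.join(" ", javaCmd);
        List<String> result = new ArrayList<>();
        result.add("bash");
        result.add("-c");
        result.add(script);
        return result;
    }
}
```

Using `exec` replaces the shell process with the JVM, so the stack limit set by `ulimit -s` is inherited by the JVM itself while the rest of the system keeps its boot-time defaults.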

Re: Setting thread stack size for child JVM

Posted by Ken Krugler <kk...@transpac.com>.
>You can set the mapred.child.java.opts on a per-job basis
>either via -D mapred.child.java.opts="java options" or via
>conf.set("mapred.child.java.opts", "java options").
>
>Note: the conf.set must be done before the job is submitted.
>
>On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger <ph...@cloudera.com> wrote:
>
>>  You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
>>  setting.  That's controlling the Java stack size, which I think is the
>>  relevant bit for you.

That's part of it, but there's also native memory used when you start 
a thread with most JREs.
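Besides the process-wide `-Xss`, Java lets code request a stack size per thread through the four-argument `Thread(ThreadGroup, Runnable, String, long)` constructor. Its Javadoc describes the value as a suggestion that some platforms ignore entirely, so this is a sketch of a mitigation to try, not a guarantee. The factory class, the thread names, and the 256 KB figure are illustrative assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative factory that asks the JVM for a smaller stack on each worker thread.
// The stackSize argument is only a hint: the JVM may round it or ignore it.
class SmallStackThreadFactory implements ThreadFactory {
    private final long stackSizeBytes;
    private final AtomicInteger counter = new AtomicInteger();

    SmallStackThreadFactory(long stackSizeBytes) {
        this.stackSizeBytes = stackSizeBytes;
    }

    @Override
    public Thread newThread(Runnable r) {
        // A null ThreadGroup means the new thread joins the creating thread's group.
        return new Thread(null, r, "reduce-worker-" + counter.incrementAndGet(), stackSizeBytes);
    }

    // Example: a fixed pool of many workers, each requesting a 256 KB stack.
    static ExecutorService newPool(int threads) {
        return Executors.newFixedThreadPool(threads, new SmallStackThreadFactory(256L * 1024));
    }
}
```

This only affects threads the task code creates itself; JVM-internal threads still use the JVM's default or the `-Xss` value.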

See the lengthy article at 
http://www.ibm.com/developerworks/java/library/j-nativememory-linux/index.html 
for more details than you probably ever wanted to know :) I haven't 
tried the sample code on my EC2 instances, but will try to do so next 
week and post results.

In the past, with Fedora Core 4 and (I think) Fedora Core 6, we
definitely needed to constrain the OS stack size to avoid running out
of native memory when spawning lots of Java threads.
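A back-of-the-envelope calculation shows why the stack size matters here: if every thread reserves its full stack as address space, the total is the thread count times the per-thread stack size. This is a rough upper bound that ignores guard pages and other per-thread native overhead; the class name and the figures in the example are illustrative.

```java
// Rough estimate of address space reserved for thread stacks, in kilobytes.
// Ignores guard pages and any other per-thread native allocations.
class StackBudget {
    static long reservedKb(int threads, long stackKbPerThread) {
        // Widen to long before multiplying to avoid int overflow for large counts.
        return (long) threads * stackKbPerThread;
    }
}
```

For example, 1000 threads at 8192 KB each (a common Linux default for `ulimit -s`) reserve 8,192,000 KB, or 8000 MB, while the same 1000 threads at 256 KB each reserve 256,000 KB, or 250 MB.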

-- Ken

Re: Setting thread stack size for child JVM

Posted by jason hadoop <ja...@gmail.com>.
You can set the mapred.child.java.opts on a per-job basis
either via -D mapred.child.java.opts="java options" or via
conf.set("mapred.child.java.opts", "java options").

Note: the conf.set must be done before the job is submitted.
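Setting `mapred.child.java.opts` replaces its whole value, and the property Philip quoted has a default of `-Xmx200m`, so setting the property to just an `-Xss` option would also drop that heap setting. A minimal sketch of a helper that sets or replaces the `-Xss` token while keeping every other option; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns java opts containing exactly one -Xss<size> token,
// keeping every other option (e.g. -Xmx, -verbose:gc) in its original order.
class ChildOpts {
    static String withStackSize(String existingOpts, String stackSize) {
        List<String> kept = new ArrayList<>();
        String trimmed = existingOpts == null ? "" : existingOpts.trim();
        if (!trimmed.isEmpty()) { // "".split(...) would yield one empty token
            for (String token : trimmed.split("\\s+")) {
                if (!token.startsWith("-Xss")) { // drop any previous stack-size setting
                    kept.add(token);
                }
            }
        }
        kept.add("-Xss" + stackSize);
        return String.join(" ", kept);
    }
}
```

In a job driver this might be used as `conf.set("mapred.child.java.opts", ChildOpts.withStackSize(conf.get("mapred.child.java.opts", "-Xmx200m"), "256k"))`, called before the job is submitted.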

On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger <ph...@cloudera.com> wrote:

> You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
> setting.  That's controlling the Java stack size, which I think is the
> relevant bit for you.



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals

Re: Setting thread stack size for child JVM

Posted by Philip Zeyliger <ph...@cloudera.com>.
You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
setting.  That's controlling the Java stack size, which I think is the
relevant bit for you.
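To confirm that an `-Xss` value set in `mapred.child.java.opts` actually reached a task's JVM, the task could inspect its own launch arguments via `ManagementFactory.getRuntimeMXBean().getInputArguments()`. A small sketch; the class name is hypothetical, and the "last occurrence wins" rule is an assumption based on HotSpot's usual behavior.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Reports the -Xss option (if any) that a JVM was started with.
class StackOptionCheck {
    /** Returns the last -Xss token in args, or null if none is present. */
    static String findStackOption(List<String> args) {
        String found = null;
        for (String arg : args) {
            if (arg.startsWith("-Xss")) {
                found = arg; // assumption: a later flag overrides an earlier one, as with HotSpot
            }
        }
        return found;
    }

    /** Checks the arguments of the JVM this code is running in. */
    static String currentStackOption() {
        return findStackOption(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Logging `StackOptionCheck.currentStackOption()` once from the reducer would show in the task logs whether the option was applied.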

Cheers,

-- Philip


<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m</value>
  <description>Java opts for the task tracker child processes.
  The following symbol, if present, will be interpolated: @taskid@ is replaced
  by current TaskID. Any other occurrences of '@' will go unchanged.
  For example, to enable verbose gc logging to a file named for the taskid in
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
        -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc

  The configuration variable mapred.child.ulimit can be used to control the
  maximum virtual memory of the child processes.
  </description>
</property>
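The description above notes that `mapred.child.ulimit` caps the child's virtual memory. Thread stacks and other native allocations live outside the Java heap, so that cap has to leave room above `-Xmx`. The sketch below parses a heap size such as `1024m` into kilobytes and checks a proposed budget against the limit. It assumes `mapred.child.ulimit` is expressed in kilobytes, and the "other native" figure is a caller-supplied guess, not a measured value; the class and method names are hypothetical.

```java
// Hypothetical sanity check: does heap + thread stacks + other native memory
// fit under a virtual-memory limit given in kilobytes?
class UlimitCheck {
    /** Parses a JVM size such as "1024m", "1g", "512k", or a bare byte count into KB. */
    static long parseSizeKb(String size) {
        String s = size.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        if (Character.isDigit(unit)) {
            return Long.parseLong(s) / 1024; // a bare number is in bytes
        }
        long n = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 'k': return n;
            case 'm': return n * 1024;
            case 'g': return n * 1024 * 1024;
            default: throw new IllegalArgumentException("Unknown size unit: " + size);
        }
    }

    static boolean fitsUnderUlimit(String heapSize, int threads, long stackKb,
                                   long otherNativeKb, long ulimitKb) {
        long heapKb = parseSizeKb(heapSize);
        long neededKb = heapKb + (long) threads * stackKb + otherNativeKb;
        return neededKb <= ulimitKb;
    }
}
```

For instance, a 200m heap (204,800 KB) plus 500 threads at 256 KB (128,000 KB) plus a guessed 100,000 KB of other native memory needs 432,800 KB, which fits under a limit of 450,000 KB; with 1024 KB stacks the same job would need 816,800 KB and would not.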
