Posted to common-user@hadoop.apache.org by Ken Krugler <kk...@transpac.com> on 2009/05/08 20:16:01 UTC
Setting thread stack size for child JVM
Hi there,
For a very specific type of reduce task, we currently need to use a
large number of threads.
To avoid running out of memory, I'd like to constrain the Linux stack
size via a "ulimit -s xxx" shell script command before starting up
the JVM. I could do this for the entire system at boot time, but it
would be better to have it for just the Hadoop JVM(s).
Any suggestions for how best to handle this?
Thanks,
-- Ken
--
Ken Krugler
+1 530-210-6378
Re: Setting thread stack size for child JVM
Posted by Philip Zeyliger <ph...@cloudera.com>.
On Fri, May 8, 2009 at 1:11 PM, Ken Krugler <kk...@transpac.com> wrote:

>> You can set the mapred.child.java.opts on a per job basis
>> either via -D mapred.child.java.opts="java options" or via
>> conf.set("mapred.child.java.opts", "java options").
>>
>> Note: the conf.set must be done before the job is submitted.
>>
>> On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger <philip@cloudera.com> wrote:
>>
>>> You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
>>> setting. That's controlling the Java stack size, which I think is the
>>> relevant bit for you.
>>
> That's part of it, but there's also native memory used when you start a
> thread with most JREs.
It doesn't look like Hadoop lets you run an arbitrary ulimit command, but if
you take a look at Shell.getUlimitMemoryCommand(conf) (called from
TaskRunner.java), you'll see that it lets you specify "ulimit -v N"
commands. You could probably augment that for "ulimit -s" pretty easily.
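Short of patching TaskRunner, a launcher wrapper is another way to get the same effect. This is only a sketch, under the assumption that you can point the task launcher (or hadoop-env.sh) at a wrapper script; the 2048 KB figure is just an example:

```shell
#!/bin/sh
# Sketch of a launcher wrapper: lower the soft stack limit, then hand
# off to the real JVM. The limit is inherited by the exec'd process
# and applies to every thread it creates.
ulimit -S -s 2048    # cap each thread stack at 2048 KB
echo "stack limit: $(ulimit -s) KB"
# exec "$JAVA_HOME/bin/java" "$@"   # uncomment for real use
```

Lowering the soft limit never requires privileges, so the wrapper works for whatever user the tasktracker runs as.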
Re: Setting thread stack size for child JVM
Posted by Ken Krugler <kk...@transpac.com>.
>You can set the mapred.child.java.opts on a per job basis
>either via -D mapred.child.java.opts="java options" or via
>conf.set("mapred.child.java.opts", "java options").
>
>Note: the conf.set must be done before the job is submitted.
>
>On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger <ph...@cloudera.com> wrote:
>
>> You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
>> setting. That's controlling the Java stack size, which I think is the
>> relevant bit for you.
That's part of it, but there's also native memory used when you start
a thread with most JREs.
See the lengthy article at
http://www.ibm.com/developerworks/java/library/j-nativememory-linux/index.html
for more details than you probably ever wanted to know :) I haven't
tried the sample code on my EC2 instances, but will try to do so next
week and post results.
In the past, with FC4 & (I think) FC6, we definitely needed to
constrain the OS stack size to avoid running out of native memory
when spawning lots of Java threads.
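The arithmetic is what bites: the native cost is roughly the per-thread stack reservation times the thread count. With made-up but typical numbers:

```shell
# Rough per-process math for thread stack reservations (illustrative
# numbers only; actual defaults vary by distro and JRE).
THREADS=1000
DEFAULT_KB=8192   # a common "ulimit -s" default on Linux
CAPPED_KB=256     # after "ulimit -s 256" (or -Xss256k)
echo "default: $((THREADS * DEFAULT_KB / 1024)) MB of stack reservations"
echo "capped:  $((THREADS * CAPPED_KB / 1024)) MB"
```

At the default that's gigabytes of address space for stacks alone, which is exactly where the out-of-native-memory failures come from.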
-- Ken
--
Ken Krugler
+1 530-210-6378
Re: Setting thread stack size for child JVM
Posted by jason hadoop <ja...@gmail.com>.
You can set the mapred.child.java.opts on a per job basis
either via -D mapred.child.java.opts="java options" or via
conf.set("mapred.child.java.opts", "java options").
Note: the conf.set must be done before the job is submitted.
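Spelled out (job name and paths here are invented, and the -D form assumes the job's main class goes through ToolRunner/GenericOptionsParser):

```shell
# Per-job override from the command line; generic -D options must
# come before the job's own arguments.
hadoop jar my-job.jar com.example.MyJob \
  -D mapred.child.java.opts="-Xmx512m -Xss256k" \
  input/ output/
```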
On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger <ph...@cloudera.com> wrote:
> You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
> setting. That's controlling the Java stack size, which I think is the
> relevant bit for you.
--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
Re: Setting thread stack size for child JVM
Posted by Philip Zeyliger <ph...@cloudera.com>.
You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
setting. That's controlling the Java stack size, which I think is the
relevant bit for you.
Cheers,
-- Philip
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx200m</value>
<description>Java opts for the task tracker child processes.
The following symbol, if present, will be interpolated: @taskid@ is
replaced
by current TaskID. Any other occurrences of '@' will go unchanged.
For example, to enable verbose gc logging to a file named for the taskid
in
/tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
The configuration variable mapred.child.ulimit can be used to control the
maximum virtual memory of the child processes.
</description>
</property>
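Putting the two suggestions together, a per-job setting with both a heap cap and a smaller thread stack might look like this (the 256k figure is only an example; pick it based on how deep your reduce threads' stacks actually get):

```
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m -Xss256k</value>
</property>
```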