Posted to common-user@hadoop.apache.org by Taeho Kang <tk...@gmail.com> on 2008/06/16 03:53:04 UTC

Question on HadoopStreaming and Memory Usage

Dear All,

I've got a question about Hadoop Streaming and its memory management.
Does Hadoop Streaming have a mechanism to prevent excessive memory usage
by its subprocesses (the map or reduce executables)?

Say a binary used in the reduce phase allocates so much memory that it
starves other critical processes, such as the DataNode or TaskTracker.
Does Hadoop Streaming prevent such cases?

Thank you in advance,

Taeho

RE: Question on HadoopStreaming and Memory Usage

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Hadoop does provide a ulimit-based way to control the memory consumed by
the tasks it spawns, via the config mapred.child.ulimit. See
http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#Task+Execution+%26+Environment
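
For a streaming job, the limit can be passed as a job config. A minimal
sketch (the streaming jar path, input/output paths, and reducer binary are
hypothetical; the value is in kilobytes, so 1048576 caps each child process
at roughly 1 GB of virtual memory):

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
      -input /user/me/input \
      -output /user/me/output \
      -mapper /bin/cat \
      -reducer ./my_reducer \
      -jobconf mapred.child.ulimit=1048576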
However, what is lacking is a way to account for the cumulative memory
consumption of all the processes spawned by a map/reduce task. For example,
a streaming task could spawn hundreds of processes, each within its own
ulimit, and collectively they can wreak havoc.
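
To illustrate the gap, here is a hypothetical reducer script (not from the
original thread): each background helper inherits the per-process ulimit,
so no single one exceeds the cap, but nothing bounds their total.

  #!/bin/sh
  # Each helper runs under its own per-process ulimit inherited
  # from the task, but the sum of their memory usage is unbounded.
  for i in $(seq 1 100); do
      ./memory_hungry_helper &   # hypothetical binary
  done
  wait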
