
Posted to user@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2010/01/21 22:35:18 UTC

JVM reuse (was: HBase bulk load)

This is a question about plain MapReduce, so I'm cross-posting the answer.

To get any parallelism, you have to start multiple JVMs in the
current Hadoop version. Let's say you have configured your servers
with mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum set to 10; then up to 20
JVMs will be running on each server when you launch a job. Without
reuse, a new JVM is started for every new map or reduce task. With
reuse, later tasks of the same job can run in an already started JVM
instead of a new one (how much depends on your exact configs vs. the job).
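To make the arithmetic concrete, here is a back-of-the-envelope sketch in Java. It is a simplified model, not Hadoop code: it assumes the concurrent JVM count on a node is bounded by map slots plus reduce slots, and that a job's tasks are spread evenly (round-robin) over the slots. The reuse setting is modeled after mapred.job.reuse.jvm.num.tasks (1 = no reuse, the default; -1 = no limit within a job); the class and method names are hypothetical.

```java
// Simplified model of JVM counts on one TaskTracker node (hypothetical names, assumptions noted).
public class JvmCountSketch {

    // Upper bound on task JVMs alive at the same time: one per map slot plus one per reduce slot.
    static int maxConcurrentJvms(int mapSlots, int reduceSlots) {
        return mapSlots + reduceSlots;
    }

    // Number of JVM launches needed for `tasks` tasks of ONE job on `slots` slots.
    // tasksPerJvm == 1 models no reuse; tasksPerJvm == -1 models unlimited reuse within the job.
    // Assumption: tasks are handed to the slots round-robin, so the split is as even as possible.
    static int jvmLaunches(int tasks, int slots, int tasksPerJvm) {
        if (slots <= 0 || tasks < 0 || tasksPerJvm == 0 || tasksPerJvm < -1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        int launches = 0;
        for (int s = 0; s < slots; s++) {
            // Tasks that land on slot s under the even split.
            int tasksForSlot = tasks / slots + (s < tasks % slots ? 1 : 0);
            if (tasksForSlot == 0) {
                continue; // idle slot: no JVM needed
            }
            if (tasksPerJvm == -1) {
                launches += 1; // one JVM runs every task this slot gets from the job
            } else {
                // Ceiling division: each JVM runs at most tasksPerJvm tasks before exiting.
                launches += (tasksForSlot + tasksPerJvm - 1) / tasksPerJvm;
            }
        }
        return launches;
    }

    public static void main(String[] args) {
        System.out.println("Max concurrent JVMs (10 map + 10 reduce slots): " + maxConcurrentJvms(10, 10));
        System.out.println("Launches for 100 map tasks, no reuse: " + jvmLaunches(100, 10, 1));
        System.out.println("Launches for 100 map tasks, reuse -1: " + jvmLaunches(100, 10, -1));
    }
}
```

Under these assumptions, reuse cuts 100 JVM launches down to 10 for the map side, but the number of JVMs alive at once is still capped by the slot settings, which is why you can see around 20 per box either way.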

J-D

On Thu, Jan 21, 2010 at 1:16 PM, Sriram Muthuswamy Chittathoor
<sr...@ivycomptech.com> wrote:
> I noticed one thing while running my sample MapReduce job: it creates a lot of Java processes on the slave nodes.  Even when I have the "reuse.tasks" property set, why does it not use a single JVM?  Sometimes I see almost 20 JVMs running on a single box.  What property can I use to stop it from spawning this huge number of JVMs?
>