Posted to mapreduce-user@hadoop.apache.org by Jyothish Soman <jy...@gmail.com> on 2010/11/09 10:25:26 UTC

How to increase data processed by one JVM instance

Hello,
I would like to know how to increase the amount of data processed by a
single JVM instance. Which options are needed for this, and where should
they be set?

Regards,
Jyothish Soman

Re: How to increase data processed by one JVM instance

Posted by Harsh J <qw...@gmail.com>.

Hi,

On Tue, Nov 9, 2010 at 2:55 PM, Jyothish Soman <jy...@gmail.com> wrote:
> Hello,
> I would like to know how to increase the amount of data processed by a
> single JVM instance. Which options are needed for this, and where should
> they be set?

What exactly do you mean by increasing the "data processed" part?

In case you're running into out-of-memory issues, look at the
"mapred.child.java.opts" property to increase the heap size allocated
to each Task JVM under a TaskTracker (for example, by passing a larger
-Xmx value).
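
One way to confirm that a changed heap setting actually reached the JVM
is to print the maximum heap from inside it. The following is only a
minimal sketch: the class name HeapCheck and the helper toMiB are
made up for illustration, and in a real job the same
Runtime.getRuntime().maxMemory() call would go into the task's setup
code rather than a main method.

```java
public class HeapCheck {
    // Hypothetical helper: converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap this JVM will try to use,
        // which is governed by the -Xmx option it was started with.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (MiB): " + toMiB(maxHeapBytes));
    }
}
```

Running it as "java -Xmx512m HeapCheck" should report a value close to
512; the exact number can differ slightly between JVMs.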

If you're looking to increase the size of the input split that each
mapper acts upon (which defaults to the block size, if I am right), the
property "mapred.min.split.size", set in bytes, can help you with that
(although certain InputFormats may override this). You can also re-copy
the data on HDFS with a larger block size set.
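
To illustrate why raising the minimum split size helps, here is a
rough sketch of the split-size arithmetic as I understand it from the
old-API FileInputFormat: max(minSize, min(goalSize, blockSize)). The
class name and the sample sizes are assumptions for illustration only;
this is not the Hadoop source, and InputFormats that compute splits
differently will not follow it.

```java
public class SplitSizeSketch {
    // Sketch of the split-size rule: the block size caps the goal size,
    // and the minimum split size then acts as a floor on the result.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long blockSize = 64L * mib;   // assumed HDFS block size
        long goalSize = 1024L * mib;  // assumed: 1 GiB input, one requested map

        // With a tiny minimum, the block size decides the split size.
        long small = computeSplitSize(goalSize, 1L, blockSize);
        // With the minimum raised to 256 MiB, each split spans several blocks.
        long raised = computeSplitSize(goalSize, 256L * mib, blockSize);

        System.out.println("Split size, min = 1 byte (MiB): " + small / mib);
        System.out.println("Split size, min = 256 MiB (MiB): " + raised / mib);
    }
}
```

Under these assumed sizes, the 1 GiB input would go from roughly 16
splits to roughly 4, so each map task (and thus each task JVM) handles
about four times as much data.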

HTH

-- 
Harsh J
www.harshj.com