Posted to common-user@hadoop.apache.org by KoRnE <ko...@hotmail.com> on 2010/11/01 21:21:36 UTC

hadoop map-reduce windows linux heap space

Hi everyone,
Here is my problem:
First of all, I'm working with a single-node configuration.
I'm developing an application with just one map function; inside this map
function I call about 10 other functions. The application reads a CSV file
and processes a certain column. I have already built the jar file, and when
I run the app on a CSV with 4000 rows on Windows 7 (using Cygwin) on a
4 GB RAM machine, it works fine. But when I run it on Linux (Ubuntu) on a
2 GB RAM machine, it processes some rows and then throws a "Java heap
space" error, or sometimes the thread is killed.

On the Linux machine:
I already tried changing the exported Hadoop heap size (HADOOP_HEAPSIZE)
as well as the -Xmx and -Xms parameters for the app. It made some
difference, but not much; the error is still happening...

Do you know why this is happening? Is it because of the difference between
the 4 GB and 2 GB of RAM on the two machines?

Thanks
-- 
View this message in context: http://old.nabble.com/hadoop-map-reduce-windows--linux-heap-space-tp30108246p30108246.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: hadoop map-reduce windows linux heap space

Posted by Harsh J <qw...@gmail.com>.
On Tue, Nov 2, 2010 at 1:51 AM, KoRnE <ko...@hotmail.com> wrote:
>
> Hi everyone,
> Here is my problem:
> First of all, I'm working with a single-node configuration.
> I'm developing an application with just one map function; inside this map
> function I call about 10 other functions. The application reads a CSV file
> and processes a certain column. I have already built the jar file, and when
> I run the app on a CSV with 4000 rows on Windows 7 (using Cygwin) on a
> 4 GB RAM machine, it works fine. But when I run it on Linux (Ubuntu) on a
> 2 GB RAM machine, it processes some rows and then throws a "Java heap
> space" error, or sometimes the thread is killed.
>
> On the Linux machine:
> I already tried changing the exported Hadoop heap size (HADOOP_HEAPSIZE)
> as well as the -Xmx and -Xms parameters for the app. It made some
> difference, but not much; the error is still happening...

Assuming you have already tuned mapred.child.java.opts (which defaults
to 200 MB per mapper) inside mapred-site.xml, you need to check your
mapper's memory utilization. Things like reducing the number of values
you cache inside the mapper (or removing unneeded ones promptly)
could help.
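
For reference, here is a rough sketch of what that tuning could look like
from the job driver (the same property can instead live in mapred-site.xml).
The class name, job name and the 512 MB figure below are made-up examples,
not values from this thread:

    // Sketch: raise the per-task child heap (the default is -Xmx200m).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CsvJobDriver {                        // hypothetical driver
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Every map/reduce task JVM launched for this job gets this heap.
            conf.set("mapred.child.java.opts", "-Xmx512m");
            Job job = new Job(conf, "csv-job");        // hypothetical job name
            // ... set the Mapper class, input/output formats and paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }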

Another solution would be to reduce the input split size handed to each
mapper for the CSV files (reduce the file's block size, or fiddle with
mapred.min.split.size). That way each mapper should 'naturally' consume
less memory if you're caching values over its run.
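
One way to shrink the splits from the driver (a rough sketch, assuming the
newer org.apache.hadoop.mapreduce API, where the knob for getting splits
smaller than a block is the maximum split size; the 8 MB cap and the class
name are made up for illustration):

    // Sketch: cap the split size so each mapper receives a smaller slice of the CSV.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {                    // hypothetical helper
        static void configureInput(Job job, String inputDir) throws Exception {
            FileInputFormat.addInputPath(job, new Path(inputDir));
            // Splits are capped at 8 MB, so a mapper caching per-split state holds less at once.
            FileInputFormat.setMaxInputSplitSize(job, 8L * 1024 * 1024);
        }
    }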

>
> Do you know why this is happening? Is it because of the difference between
> the 4 GB and 2 GB of RAM on the two machines?

A mapper would consume only "mapred.child.java.opts" worth of memory,
so it is definitely not the difference in RAM between the machines. I run
CDH for development purposes on an ArchLinux desktop with 2 GB RAM, and
stuff runs smoothly on it.
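
To put rough numbers on it (assuming the stock defaults of two map slots per
TaskTracker and -Xmx200m per child JVM): 2 slots x 200 MB is only about
400 MB of task heap, which fits comfortably in 2 GB alongside the daemons.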

>
> Thanks
> --
> View this message in context: http://old.nabble.com/hadoop-map-reduce-windows--linux-heap-space-tp30108246p30108246.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>



-- 
Harsh J
www.harshj.com