Posted to common-user@hadoop.apache.org by john smith <js...@gmail.com> on 2011/09/19 14:43:14 UTC

Out of heap space errors on TTs

Hey guys,

I am running hive and I am trying to join two tables (2.2GB and 136MB) on a
cluster of 9 nodes (replication = 3)

Hadoop version - 0.20.2
Each data node memory - 2GB
HADOOP_HEAPSIZE - 1000MB

Other heap settings are defaults. My Hive query launches 40 map tasks, and every
task fails with the same error:

2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 300
2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)


It looks like I need to tweak some of the heap settings on the TTs to handle
memory efficiently, but I can't work out which variables to
modify (there are too many related to heap sizes).

Any specific things I must look at?

Thanks,

jS

Re: Out of heap space errors on TTs

Posted by Mapred Learn <ma...@gmail.com>.
What is mapred.child.java.opts set to in your TaskTrackers' configuration?
You need to set it to a bigger value, 1 GB or so..
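A minimal sketch of that setting in mapred-site.xml (the 1 GB figure follows the suggestion above and is illustrative; with only 2GB of RAM per node it may be too much if several tasks run at once):

```xml
<!-- Sketch: heap for each task's child JVM. Must be larger than
     io.sort.mb (300 here) plus working space for the map itself. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```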

Sent from my iPhone

On Sep 19, 2011, at 5:43 AM, john smith <js...@gmail.com> wrote:

> Hey guys,
> 
> I am running hive and I am trying to join two tables (2.2GB and 136MB) on a
> cluster of 9 nodes (replication = 3)
> 
> Hadoop version - 0.20.2
> Each data node memory - 2GB
> HADOOP_HEAPSIZE - 1000MB
> 
> other heap settings are defaults. My hive launches 40 Maptasks and every
> task failed with the same error
> 
> 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 300
> 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
> Error running child : java.lang.OutOfMemoryError: Java heap space
>    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> 
> Looks like I need to tweak some of the heap settings for TTs to handle
> the memory efficiently. I am unable to understand which variables to
> modify (there are too many related to heap sizes).
> 
> Any specific things I must look at?
> 
> Thanks,
> 
> jS

Re: Out of heap space errors on TTs

Posted by Bejoy KS <be...@gmail.com>.
John,
       Did you try out a map join with Hive? It uses the Distributed Cache and
hash maps to achieve the goal:
set hive.auto.convert.join = true;
I have tried the same over joins involving huge tables and a few smaller
tables. My smaller tables were less than 25MB (configuration tables) and it
worked for me. In your case, since the smaller table is 137MB, I'm not sure
whether you should go for this or not; let us leave that part for the
experts to comment on.
 Also, map joins by default would work only if the size of the smaller table
is less than 25MB. You can try increasing the value to suit your
requirements:
set hive.smalltable.filesize = 150000000;
I'm really not sure whether it is advisable in your scenario. I'm leaving it
to the experts to comment on the same.
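Putting those settings together, a minimal sketch of a Hive session (property names as in the Hive releases of that era; the 150000000 threshold is illustrative, and whether a 137MB hash table fits in the child heap is a separate question):

```sql
-- Sketch: let Hive auto-convert the join to a map-side join, and raise
-- the small-table threshold so the 137MB supplier table qualifies.
set hive.auto.convert.join = true;
set hive.smalltable.filesize = 150000000;

select count(*) from customer
join supplier on (customer.c_nationkey = supplier.s_nationkey);
```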

All,
          A quick query from my end: what could be the maximum size of a
file that could be distributed on the cache in map reduce jobs? I'm looking
for an optimal value along with the maximum permissible one (not
impacting the execution of basic map reduce). Does that depend on your
cluster size or on your individual node hardware configuration?


On Mon, Sep 19, 2011 at 7:15 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

> Hello John
>
> You can use below properties
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
> By default that values will be 10.
>
> AFAIK, you can reduce io.sort.mb. But disk usage will be high.
>
> Since this is related to mapred, I have moved this discussion to Mapreduce.
> and cc'ed to common.
>
>
> Regards,
> Uma
>
>
> ----- Original Message -----
> From: john smith <js...@gmail.com>
> Date: Monday, September 19, 2011 7:02 pm
> Subject: Re: Out of heap space errors on TTs
> To: common-user@hadoop.apache.org
>
> > Hi all,
> >
> > Thanks for the inputs...
> >
> > Can I reduce the io.sort.mb ? (owing to the fact that I have less
> > ram size,
> > 2GB)
> >
> > My conf files doesn't have an entry mapred.child.java.opts .. So I
> > guess its
> > taking a default value of 200MB.
> >
> > Also how to decide the number of tasks per TT ? I have 4 cores per
> > node and
> > 2GB of total memory . So how many per node maximum tasks should I set?
> >
> > Thanks
> >
> > On Mon, Sep 19, 2011 at 6:28 PM, Uma Maheswara Rao G 72686 <
> > maheswara@huawei.com> wrote:
> >
> > > Hello,
> > >
> > > You need configure heap size for child tasks using below proprty.
> > > "mapred.child.java.opts" in mapred-site.xml
> > >
> > > by default it will be 200mb. But your io.sort.mb(300) is more
> > than that.
> > > So, configure more heap space for child tasks.
> > >
> > > ex:
> > >  -Xmx512m
> > >
> > > Regards,
> > > Uma
> > >
> > > ----- Original Message -----
> > > From: john smith <js...@gmail.com>
> > > Date: Monday, September 19, 2011 6:14 pm
> > > Subject: Out of heap space errors on TTs
> > > To: common-user@hadoop.apache.org
> > >
> > > > Hey guys,
> > > >
> > > > I am running hive and I am trying to join two tables (2.2GB and
> > > > 136MB) on a
> > > > cluster of 9 nodes (replication = 3)
> > > >
> > > > Hadoop version - 0.20.2
> > > > Each data node memory - 2GB
> > > > HADOOP_HEAPSIZE - 1000MB
> > > >
> > > > other heap settings are defaults. My hive launches 40 Maptasks and
> > > > everytask failed with the same error
> > > >
> > > > 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask:
> > > > io.sort.mb = 300
> > > > 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
> > > > Error running child : java.lang.OutOfMemoryError: Java heap space
> > > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
> > > >       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
> > > >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > >       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >
> > > >
> > > > Looks like I need to tweak some of the heap settings for TTs
> > to handle
> > > > the memory efficiently. I am unable to understand which
> > variables to
> > > > modify (there are too many related to heap sizes).
> > > >
> > > > Any specific things I must look at?
> > > >
> > > > Thanks,
> > > >
> > > > jS
> > > >
> > >
> >
>

Re: Out of heap space errors on TTs

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
Hello John

You can use the below properties:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
By default those values will be 10.

AFAIK, you can reduce io.sort.mb. But disk usage will be high.
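For illustration, that could look like the following in mapred-site.xml (100 is an assumed value, not a recommendation):

```xml
<!-- Sketch: shrink the map-side sort buffer so it fits inside the
     default 200MB child heap. A smaller buffer means more spill files,
     i.e. the higher disk usage mentioned above. -->
<property>
  <name>io.sort.mb</name>
  <value>100</value>
</property>
```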

Since this is related to mapred, I have moved this discussion to MapReduce and cc'ed common.


Regards,
Uma


----- Original Message -----
From: john smith <js...@gmail.com>
Date: Monday, September 19, 2011 7:02 pm
Subject: Re: Out of heap space errors on TTs
To: common-user@hadoop.apache.org

> Hi all,
> 
> Thanks for the inputs...
> 
> Can I reduce the io.sort.mb ? (owing to the fact that I have less
> ram size,
> 2GB)
> 
> My conf files doesn't have an entry mapred.child.java.opts .. So I 
> guess its
> taking a default value of 200MB.
> 
> Also how to decide the number of tasks per TT ? I have 4 cores per 
> node and
> 2GB of total memory . So how many per node maximum tasks should I set?
> 
> Thanks
> 
> On Mon, Sep 19, 2011 at 6:28 PM, Uma Maheswara Rao G 72686 <
> maheswara@huawei.com> wrote:
> 
> > Hello,
> >
> > You need configure heap size for child tasks using below proprty.
> > "mapred.child.java.opts" in mapred-site.xml
> >
> > by default it will be 200mb. But your io.sort.mb(300) is more 
> than that.
> > So, configure more heap space for child tasks.
> >
> > ex:
> >  -Xmx512m
> >
> > Regards,
> > Uma
> >
> > ----- Original Message -----
> > From: john smith <js...@gmail.com>
> > Date: Monday, September 19, 2011 6:14 pm
> > Subject: Out of heap space errors on TTs
> > To: common-user@hadoop.apache.org
> >
> > > Hey guys,
> > >
> > > I am running hive and I am trying to join two tables (2.2GB and
> > > 136MB) on a
> > > cluster of 9 nodes (replication = 3)
> > >
> > > Hadoop version - 0.20.2
> > > Each data node memory - 2GB
> > > HADOOP_HEAPSIZE - 1000MB
> > >
> > > other heap settings are defaults. My hive launches 40 Maptasks and
> > > everytask failed with the same error
> > >
> > > 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask:
> > > io.sort.mb = 300
> > > 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
> > > Error running child : java.lang.OutOfMemoryError: Java heap space
> > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
> > >       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
> > >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > >       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > >
> > >
> > > Looks like I need to tweak some of the heap settings for TTs 
> to handle
> > > the memory efficiently. I am unable to understand which 
> variables to
> > > modify (there are too many related to heap sizes).
> > >
> > > Any specific things I must look at?
> > >
> > > Thanks,
> > >
> > > jS
> > >
> >
> 

Re: Out of heap space errors on TTs

Posted by john smith <js...@gmail.com>.
Hi,

It's a simple join:

select count(*) from customer JOIN supplier ON (customer.c_nationkey =
supplier.s_nationkey);

customer (2.2GB) and supplier (137MB) are generated TPC-H tables.

A total of 40 map tasks are generated for this query.
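As a sanity check (my own back-of-envelope sketch, assuming the default 64 MB HDFS block size and roughly one map task per input block):

```python
import math

BLOCK_MB = 64                # assumed default HDFS block size
customer_mb = 2.2 * 1024     # ~2.2 GB customer table
supplier_mb = 137            # supplier table

# Roughly one map task per block of each input table.
maps = math.ceil(customer_mb / BLOCK_MB) + math.ceil(supplier_mb / BLOCK_MB)
print(maps)  # 39 -- in line with the 40 tasks observed
```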

Thanks

On Mon, Sep 19, 2011 at 7:08 PM, <be...@gmail.com> wrote:

> John
>    Can you share the hive QL you are using for joins?
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: john smith <js...@gmail.com>
> Date: Mon, 19 Sep 2011 19:02:02
> To: <co...@hadoop.apache.org>
> Reply-To: common-user@hadoop.apache.org
> Subject: Re: Out of heap space errors on TTs
>
> Hi all,
>
> Thanks for the inputs...
>
> Can I reduce the io.sort.mb ? (owing to the fact that I have less ram size
> ,
> 2GB)
>
> My conf files doesn't have an entry mapred.child.java.opts .. So I guess
> its
> taking a default value of 200MB.
>
> Also how to decide the number of tasks per TT ? I have 4 cores per node and
> 2GB of total memory . So how many per node maximum tasks should I set?
>
> Thanks
>
> On Mon, Sep 19, 2011 at 6:28 PM, Uma Maheswara Rao G 72686 <
> maheswara@huawei.com> wrote:
>
> > Hello,
> >
> > You need configure heap size for child tasks using below proprty.
> > "mapred.child.java.opts" in mapred-site.xml
> >
> > by default it will be 200mb. But your io.sort.mb(300) is more than that.
> > So, configure more heap space for child tasks.
> >
> > ex:
> >  -Xmx512m
> >
> > Regards,
> > Uma
> >
> > ----- Original Message -----
> > From: john smith <js...@gmail.com>
> > Date: Monday, September 19, 2011 6:14 pm
> > Subject: Out of heap space errors on TTs
> > To: common-user@hadoop.apache.org
> >
> > > Hey guys,
> > >
> > > I am running hive and I am trying to join two tables (2.2GB and
> > > 136MB) on a
> > > cluster of 9 nodes (replication = 3)
> > >
> > > Hadoop version - 0.20.2
> > > Each data node memory - 2GB
> > > HADOOP_HEAPSIZE - 1000MB
> > >
> > > other heap settings are defaults. My hive launches 40 Maptasks and
> > > everytask failed with the same error
> > >
> > > 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask:
> > > io.sort.mb = 300
> > > 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
> > > Error running child : java.lang.OutOfMemoryError: Java heap space
> > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
> > >       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
> > >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > >       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > >
> > >
> > > Looks like I need to tweak some of the heap settings for TTs to handle
> > > the memory efficiently. I am unable to understand which variables to
> > > modify (there are too many related to heap sizes).
> > >
> > > Any specific things I must look at?
> > >
> > > Thanks,
> > >
> > > jS
> > >
> >
>
>

Re: Out of heap space errors on TTs

Posted by be...@gmail.com.
John
    Can you share the hive QL you are using for joins?

Regards
Bejoy K S

-----Original Message-----
From: john smith <js...@gmail.com>
Date: Mon, 19 Sep 2011 19:02:02 
To: <co...@hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Subject: Re: Out of heap space errors on TTs

Hi all,

Thanks for the inputs...

Can I reduce the io.sort.mb ? (owing to the fact that I have less ram size ,
2GB)

My conf files doesn't have an entry mapred.child.java.opts .. So I guess its
taking a default value of 200MB.

Also how to decide the number of tasks per TT ? I have 4 cores per node and
2GB of total memory . So how many per node maximum tasks should I set?

Thanks

On Mon, Sep 19, 2011 at 6:28 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

> Hello,
>
> You need configure heap size for child tasks using below proprty.
> "mapred.child.java.opts" in mapred-site.xml
>
> by default it will be 200mb. But your io.sort.mb(300) is more than that.
> So, configure more heap space for child tasks.
>
> ex:
>  -Xmx512m
>
> Regards,
> Uma
>
> ----- Original Message -----
> From: john smith <js...@gmail.com>
> Date: Monday, September 19, 2011 6:14 pm
> Subject: Out of heap space errors on TTs
> To: common-user@hadoop.apache.org
>
> > Hey guys,
> >
> > I am running hive and I am trying to join two tables (2.2GB and
> > 136MB) on a
> > cluster of 9 nodes (replication = 3)
> >
> > Hadoop version - 0.20.2
> > Each data node memory - 2GB
> > HADOOP_HEAPSIZE - 1000MB
> >
> > other heap settings are defaults. My hive launches 40 Maptasks and
> > everytask failed with the same error
> >
> > 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask:
> > io.sort.mb = 300
> > 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
> > Error running child : java.lang.OutOfMemoryError: Java heap space
> >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
> >       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
> >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> >
> > Looks like I need to tweak some of the heap settings for TTs to handle
> > the memory efficiently. I am unable to understand which variables to
> > modify (there are too many related to heap sizes).
> >
> > Any specific things I must look at?
> >
> > Thanks,
> >
> > jS
> >
>


Re: Out of heap space errors on TTs

Posted by john smith <js...@gmail.com>.
Hi all,

Thanks for the inputs...

Can I reduce io.sort.mb? (owing to the fact that I have less RAM,
2GB)

My conf files don't have an entry for mapred.child.java.opts, so I guess it's
taking the default value of 200MB.

Also, how do I decide the number of tasks per TT? I have 4 cores per node and
2GB of total memory, so what maximum number of tasks per node should I set?

Thanks
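Not an authoritative answer, but one rough way to budget it (all figures are illustrative assumptions): take the RAM left after the OS and the DataNode/TaskTracker daemons, and divide by the child heap.

```python
# Back-of-envelope slot sizing for a 2GB node.
node_ram_mb = 2048
os_and_daemons_mb = 1024   # assumed: OS + DataNode + TaskTracker JVMs
child_heap_mb = 512        # e.g. mapred.child.java.opts set to -Xmx512m

max_concurrent_tasks = (node_ram_mb - os_and_daemons_mb) // child_heap_mb
print(max_concurrent_tasks)  # 2 -- e.g. 2 map slots, or 1 map + 1 reduce
```

With 4 cores the node would be CPU-underused at 2 slots, but RAM is the binding constraint here.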

On Mon, Sep 19, 2011 at 6:28 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

> Hello,
>
> You need configure heap size for child tasks using below proprty.
> "mapred.child.java.opts" in mapred-site.xml
>
> by default it will be 200mb. But your io.sort.mb(300) is more than that.
> So, configure more heap space for child tasks.
>
> ex:
>  -Xmx512m
>
> Regards,
> Uma
>
> ----- Original Message -----
> From: john smith <js...@gmail.com>
> Date: Monday, September 19, 2011 6:14 pm
> Subject: Out of heap space errors on TTs
> To: common-user@hadoop.apache.org
>
> > Hey guys,
> >
> > I am running hive and I am trying to join two tables (2.2GB and
> > 136MB) on a
> > cluster of 9 nodes (replication = 3)
> >
> > Hadoop version - 0.20.2
> > Each data node memory - 2GB
> > HADOOP_HEAPSIZE - 1000MB
> >
> > other heap settings are defaults. My hive launches 40 Maptasks and
> > everytask failed with the same error
> >
> > 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask:
> > io.sort.mb = 300
> > 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
> > Error running child : java.lang.OutOfMemoryError: Java heap space
> >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
> >       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
> >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> >
> > Looks like I need to tweak some of the heap settings for TTs to handle
> > the memory efficiently. I am unable to understand which variables to
> > modify (there are too many related to heap sizes).
> >
> > Any specific things I must look at?
> >
> > Thanks,
> >
> > jS
> >
>

Re: Out of heap space errors on TTs

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
Hello,

You need to configure the heap size for child tasks using the below property:
"mapred.child.java.opts" in mapred-site.xml

By default it will be 200MB, but your io.sort.mb (300) is more than that.
So, configure more heap space for child tasks.

ex:
  -Xmx512m

Regards,
Uma

----- Original Message -----
From: john smith <js...@gmail.com>
Date: Monday, September 19, 2011 6:14 pm
Subject: Out of heap space errors on TTs
To: common-user@hadoop.apache.org

> Hey guys,
> 
> I am running hive and I am trying to join two tables (2.2GB and 
> 136MB) on a
> cluster of 9 nodes (replication = 3)
> 
> Hadoop version - 0.20.2
> Each data node memory - 2GB
> HADOOP_HEAPSIZE - 1000MB
> 
> other heap settings are defaults. My hive launches 40 Maptasks and 
> everytask failed with the same error
> 
> 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask: 
> io.sort.mb = 300
> 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
> Error running child : java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> 
> Looks like I need to tweak some of the heap settings for TTs to handle
> the memory efficiently. I am unable to understand which variables to
> modify (there are too many related to heap sizes).
> 
> Any specific things I must look at?
> 
> Thanks,
> 
> jS
>