Posted to mapreduce-user@hadoop.apache.org by Bejoy KS <be...@gmail.com> on 2011/09/19 16:39:59 UTC

Re: Out of heap space errors on TTs

John,
       Did you try out map join with Hive? It uses the Distributed Cache and
hash maps to achieve the goal.
set hive.auto.convert.join = true;
I have tried the same over joins involving huge tables and a few smaller
tables. My smaller tables were less than 25MB (configuration tables) and it
worked for me. In your case, since the smaller table is 137MB, I'm not sure
whether you should go in for this or not. Let us leave that part for the
experts to comment on.
 Also, map joins by default would work only if the size of the smaller table
is less than 25MB. You can try increasing that value to suit your
requirements:
set hive.smalltable.filesize = 150000000;
I'm really not sure whether it is advisable in your scenario. I'm leaving it
to the experts to comment on the same.
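
Just to illustrate, a full map join session would look something like the
below (the table names 'sales' and 'store' are made up for illustration):

set hive.auto.convert.join = true;
-- raise the small table threshold from the default 25MB to ~150MB
set hive.smalltable.filesize = 150000000;
-- with no aggregation, the join below should run as a map-only job; the
-- smaller table is hashed in memory and shipped via the Distributed Cache
SELECT s.store_id, st.store_name, s.amount
FROM sales s JOIN store st ON (s.store_id = st.store_id);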

All,
          A quick query from my end. What could be the maximum size of a
file that could be distributed via the cache in map reduce jobs? I'm looking
for an optimal value along with the maximum permissible one (one that does
not impact the execution of basic map reduce jobs). Does that depend on your
cluster size or on your individual node hardware configuration?
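
(For reference, the only related knob I'm aware of is local.cache.size in
mapred-site.xml, which caps the total space the Distributed Cache may use
on each TaskTracker node; if I'm not mistaken it defaults to 10GB, i.e.

<property>
  <name>local.cache.size</name>
  <value>10737418240</value>
</property>

That is a per-node total though, not a per-file limit.)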


On Mon, Sep 19, 2011 at 7:15 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

> Hello John
>
> You can use the below properties:
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
> By default those values are 2.
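>
> For example, on a small node these could go into mapred-site.xml along
> these lines (the values here are only an illustration):
>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>2</value>
> </property>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>2</value>
> </property>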
>
> AFAIK, you can reduce io.sort.mb, but the maps will then spill to disk more often.
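>
> For example, something like this in mapred-site.xml would bring the sort
> buffer back under a 200MB child heap (100 is just an illustrative value):
>
> <property>
>   <name>io.sort.mb</name>
>   <value>100</value>
> </property>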
>
> Since this is related to mapred, I have moved this discussion to mapreduce-user
> and cc'ed common-user.
>
>
> Regards,
> Uma
>
>
> ----- Original Message -----
> From: john smith <js...@gmail.com>
> Date: Monday, September 19, 2011 7:02 pm
> Subject: Re: Out of heap space errors on TTs
> To: common-user@hadoop.apache.org
>
> > Hi all,
> >
> > Thanks for the inputs...
> >
> > Can I reduce the io.sort.mb? (owing to the fact that I have a small
> > RAM size,
> > 2GB)
> >
> > My conf files don't have an entry for mapred.child.java.opts .. so I
> > guess it's
> > taking the default value of 200MB.
> >
> > Also, how do I decide the number of tasks per TT? I have 4 cores per
> > node and
> > 2GB of total memory. So what maximum tasks per node should I set?
> >
> > Thanks
> >
> > On Mon, Sep 19, 2011 at 6:28 PM, Uma Maheswara Rao G 72686 <
> > maheswara@huawei.com> wrote:
> >
> > > Hello,
> > >
> > > You need to configure the heap size for child tasks using the below property:
> > > "mapred.child.java.opts" in mapred-site.xml
> > >
> > > By default it will be 200MB. But your io.sort.mb (300) is more
> > > than that.
> > > So, configure more heap space for the child tasks.
> > >
> > > ex:
> > >  -Xmx512m
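> > >
> > > In mapred-site.xml that would look something like:
> > >
> > > <property>
> > >   <name>mapred.child.java.opts</name>
> > >   <value>-Xmx512m</value>
> > > </property>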
> > >
> > > Regards,
> > > Uma
> > >
> > > ----- Original Message -----
> > > From: john smith <js...@gmail.com>
> > > Date: Monday, September 19, 2011 6:14 pm
> > > Subject: Out of heap space errors on TTs
> > > To: common-user@hadoop.apache.org
> > >
> > > > Hey guys,
> > > >
> > > > I am running hive and I am trying to join two tables (2.2GB and
> > > > 136MB) on a
> > > > cluster of 9 nodes (replication = 3)
> > > >
> > > > Hadoop version - 0.20.2
> > > > Each data node memory - 2GB
> > > > HADOOP_HEAPSIZE - 1000MB
> > > >
> > > > Other heap settings are defaults. My Hive query launches 40 map tasks, and
> > > > every task failed with the same error:
> > > >
> > > > 2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 300
> > > > 2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
> > > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
> > > >       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
> > > >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > >       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >
> > > >
> > > > Looks like I need to tweak some of the heap settings for TTs
> > to handle
> > > > the memory efficiently. I am unable to understand which
> > variables to
> > > > modify (there are too many related to heap sizes).
> > > >
> > > > Any specific things I must look at?
> > > >
> > > > Thanks,
> > > >
> > > > jS
> > > >
> > >
> >
>