Posted to user@hive.apache.org by Matt Vonkip <ma...@yahoo.com> on 2012/01/05 00:07:08 UTC

drop table -> java.lang.OutOfMemoryError: Java heap space

Hi folks,

I am using elastic-mapreduce in the Amazon EC2 ecosystem and would like to upgrade from Hive 0.5 on Hadoop 0.20 to Hive 0.7.1 on Hadoop 0.20.205.  I created a new metastore (on S3) to support testing the latter and have run into some problems.  I have about 15,000 partitions in S3, and in the old version of Hive/Hadoop I have no problem creating a table, recovering the partitions, and then dropping the table.  In the new version of Hive/Hadoop, the first two steps are successful, but I run into a "java.lang.OutOfMemoryError: Java heap space" error when I try to drop the table.
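
For context, the sequence that works in 0.5/0.20 but fails at the last step in 0.7.1/0.20.205 is essentially the following.  The table name, columns, and S3 location below are placeholders, and RECOVER PARTITIONS is the EMR-specific statement I use to register the partitions:

  -- placeholder schema and location; the real table differs
  CREATE EXTERNAL TABLE events (id STRING, payload STRING)
  PARTITIONED BY (dt STRING, hr STRING)
  LOCATION 's3://my-bucket/events/';

  -- EMR Hive extension that scans S3 and registers the ~15,000 partitions
  ALTER TABLE events RECOVER PARTITIONS;

  -- this is the statement that now dies with the heap space error
  DROP TABLE events;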

When I look at the output of "set;" from the hive prompt, I see several environment variables related to heap size.  I was able to augment HADOOP_DATANODE_HEAPSIZE and HADOOP_NAMENODE_HEAPSIZE each to 4096 (2048 is sufficient in 0.5/0.20), but I see other parameters including HADOOP_HEAPSIZE that I cannot seem to change.  To be fair, I'm just shooting in the dark here and unable to decipher from the error message *which* heap is too small.
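
For reference, this is roughly what I have been trying to set.  The file path assumes a stock Hadoop 0.20.205 layout; EMR may well generate these files itself, which could be why HADOOP_HEAPSIZE is not sticking for me:

  # conf/hadoop-env.sh -- bin/hadoop turns HADOOP_HEAPSIZE (in MB) into -Xmx
  # for every command it launches, and the Hive CLI is started through
  # bin/hadoop, so as far as I can tell this is the heap that matters for the
  # metastore-heavy DROP TABLE.
  export HADOOP_HEAPSIZE=4096

  # The EMR daemon settings I was already able to raise to 4096:
  export HADOOP_NAMENODE_HEAPSIZE=4096
  export HADOOP_DATANODE_HEAPSIZE=4096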

If this is already documented somewhere (neither basic tutorials nor google searches helped), I would be grateful for a reference and happy to summarize what I learn here.  Or, if you simply have an answer ... well, any help would be most appreciated!

Sincerely,
Matt Vonkip

Re: drop table -> java.lang.OutOfMemoryError: Java heap space

Posted by Sam Wilson <sw...@monetate.com>.
I recommend trying a daily partitioning scheme over an hourly one.  We had a similar setup, ran into the same problem, and ultimately found that daily partitions work fine for us, even with the larger file sizes.

At the very least it is worth evaluating. 
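
Roughly what I mean, with made-up column names and locations -- the only real difference is how coarse the partition key is:

  -- hourly layout: ~24 partitions per day, ~15,000 over twenty months
  CREATE EXTERNAL TABLE events_hourly (id STRING, payload STRING)
  PARTITIONED BY (dt STRING, hr STRING)
  LOCATION 's3://my-bucket/events_hourly/';

  -- daily layout: ~600 partitions over the same period, each one ~24x larger
  CREATE EXTERNAL TABLE events_daily (id STRING, payload STRING)
  PARTITIONED BY (dt STRING)
  LOCATION 's3://my-bucket/events_daily/';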

Sent from my iPhone

On Jan 5, 2012, at 2:23 PM, Matt Vonkip <ma...@yahoo.com> wrote:

> Shoot, I meant to reply to the group, not respond to Mark directly.  (Mark replied offline to me; not sure the etiquette in pasting that response in here as well!)
> 
> Hi Mark, thanks for the response!  I tried using the memory-intensive bootstrap action and got a different error; however, I'm not sure whether it represents progress in the right direction or a regression.  (I thought the memory-intensive script was for memory-intensive map-reduce jobs -- not table DDL -- so I am wondering if it made things even worse.)
> 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 
> As for the other suggestion, I agree that 15k partitions (and growing) is unruly; but, the files are not small!  Each is over one gigabyte and represents one hour from the past twenty months.  I would imagine others must have similar setups and have some way around my issue.  Also, since it worked in the older hadoop/hive stack, I'm suspicious that there is some configuration item I should be able to tweak.
> 
> In the meantime, I am tempted to drop the entire database and recreate from scratch (since all tables are external anyway).  If no solution is found, we will probably look into some kind of hybrid system where older data is archived in other tables and a union is used in queries.
> 
> Sincerely,
> Matt
> 
> 

Re: drop table -> java.lang.OutOfMemoryError: Java heap space

Posted by Matt Vonkip <ma...@yahoo.com>.
Shoot, I meant to reply to the group, not respond to Mark directly.  (Mark replied offline to me; not sure the etiquette in pasting that response in here as well!)


Hi Mark, thanks for the response!  I tried using the memory-intensive bootstrap action and got a different error; however, I'm not sure whether it represents progress in the right direction or a regression.  (I thought the memory-intensive script was for memory-intensive map-reduce jobs -- not table DDL -- so I am wondering if it made things even worse.)


java.lang.OutOfMemoryError: GC overhead limit exceeded

As for the other suggestion, I agree that 15k partitions (and growing) is unruly, but the files are not small!  Each is over one gigabyte and represents one hour from the past twenty months.  I would imagine others must have similar setups and have some way around my issue.  Also, since it worked in the older Hadoop/Hive stack, I suspect there is some configuration item I should be able to tweak.
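
(The arithmetic is roughly twenty months x 30 days x 24 hours, or about 14,400 hourly partitions, which is where the ~15,000 figure comes from.)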


In the meantime, I am tempted to drop the entire database and recreate it from scratch (since all tables are external anyway).  If no solution is found, we will probably look into some kind of hybrid system where older data is archived in other tables and a union is used in queries.
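
Something along these lines is what I have in mind -- the table and view names here are hypothetical, and it assumes the archived and recent tables share a schema:

  -- hypothetical layout: recent hours stay partitioned, older data is parked
  -- in a separate archive table; queries go through the view
  CREATE VIEW events_all AS
  SELECT * FROM (
    SELECT * FROM events_recent
    UNION ALL
    SELECT * FROM events_archive
  ) unioned;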


Sincerely,
Matt

Re: drop table -> java.lang.OutOfMemoryError: Java heap space

Posted by Mark Grover <mg...@oanda.com>.
Hi Matt,
You might want to try using s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive in your bootstrap action (rough invocation below) and see if that helps.
I would also suggest that you reconsider whether having 15,000 partitions is the right thing to do and make sure you are not suffering from the small files problem in the long run :-)
http://www.cloudera.com/blog/2009/02/the-small-files-problem/
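
For reference, with the old Ruby elastic-mapreduce CLI the bootstrap action goes on the job flow at launch time, something like this (the cluster-sizing and Hive flags are just placeholders for whatever you normally use):

  # placeholder cluster options; the relevant part is --bootstrap-action
  elastic-mapreduce --create --alive \
    --name "hive-0.7.1-heap-test" \
    --num-instances 3 --instance-type m1.large \
    --hive-interactive --hive-versions 0.7.1 \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive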

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 



----- Original Message -----
From: "Matt Vonkip" <ma...@yahoo.com>
To: user@hive.apache.org
Sent: Wednesday, January 4, 2012 6:07:08 PM
Subject: drop table -> java.lang.OutOfMemoryError: Java heap space



Hi folks, 


I am using elastic-mapreduce in the Amazon EC2 eco-system and would like to upgrade from Hive 0.5 on Hadoop 0.20 to Hive 0.7.1 on Hadoop 0.20.205. I created a new metastore (on S3) to support testing the latter and have run into some problems. I have about 15000 partitions in S3 and in the old version of Hive/Hadoop, I have no problem creating a table, recovering the partitions, and then dropping the table. In the new version of Hive/Hadoop, the first two steps are successful, but I run into a "java.lang.OutOfMemoryError: Java heap space" error when I try to drop the table. 


When I look at the output of "set;" from the hive prompt, I see several environment variables related to heap size. I was able to augment HADOOP_DATANODE_HEAPSIZE and HADOOP_NAMENODE_HEAPSIZE each to 4096 (2048 is sufficient in 0.5/0.20), but I see other parameters including HADOOP_HEAPSIZE that I cannot seem to change. To be fair, I'm just shooting in the dark here and unable to decipher from the error message *which* heap is too small. 


If this is already documented somewhere (neither basic tutorials nor google searches helped), I would be grateful for a reference and happy to summarize what I learn here. Or, if you simply have an answer ... well, any help would be most appreciated! 


Sincerely, 
Matt Vonkip