Posted to user@hbase.apache.org by Marcus Schlüter <ma...@mac.com> on 2008/07/10 11:32:35 UTC

Hbase regionserver heap space problem

Hi everyone,

We would like to use Hbase and Hadoop.
But when we tried to use real data with our test setup, we saw a lot
of crashes and did not succeed in inserting the amount of data we are
aiming for into an Hbase table.
Our goal is to have about 100 million rows in one table, with each
row holding about 100 bytes of raw data.
Our test setup consists of the following servers:

3 x HP DL385 with 4GB RAM, 2x 2.8GHz Opterons and Smartarray RAID5 with
a capacity of 400GB (all used as datanodes, and one of them also as
the namenode)
1 x HP DL380 with 3GB RAM, 2x 3.4GHz dual-core Xeons and Smartarray
RAID5 with a capacity of 320GB for hbase (master and regionserver).

We used hadoop 0.16.4 with a replication level of 2 and hbase 0.1.3.
Hbase is configured to use 2GB of heap space.
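(For reference, a 2GB heap would be set in conf/hbase-env.sh; this sketch assumes the standard HBASE_HEAPSIZE knob of hbase's launch scripts, with the value given in megabytes:)

```
# conf/hbase-env.sh -- JVM heap for the hbase daemons, in MB
export HBASE_HEAPSIZE=2000
```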
The table was created with the following query:

create table logdata (
  logtype MAX_VERSIONS=1 COMPRESSION=BLOCK,
  banner_id MAX_VERSIONS=1,
  contentunit_id MAX_VERSIONS=1,
  campaign_id MAX_VERSIONS=1,
  network MAX_VERSIONS=1,
  geodata MAX_VERSIONS=1 COMPRESSION=BLOCK,
  client_data MAX_VERSIONS=1 COMPRESSION=BLOCK,
  profile_data MAX_VERSIONS=1 COMPRESSION=BLOCK,
  keyword MAX_VERSIONS=1 COMPRESSION=BLOCK,
  tstamp MAX_VERSIONS=1,
  time MAX_VERSIONS=1
);


The problem is that the regionserver runs out of heap space and
throws the following exception after inserting a few million rows (not
always the same number of rows, ranging from 3 to about 10 million):

Exception in thread "org.apache.hadoop.dfs.DFSClient$LeaseChecker@69e328e0" java.lang.OutOfMemoryError: Java heap space
        at java.io.DataInputStream.<init>(DataInputStream.java:42)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:186)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:578)
        at org.apache.hadoop.ipc.Client.call(Client.java:501)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
        at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:596)
        at java.lang.Thread.run(Thread.java:619)
Exception in thread "ResponseProcessor for block blk_7988192980299756280" java.lang.OutOfMemoryError: Java heap space
Exception in thread "IPC Server Responder" Exception in thread "org.apache.hadoop.io.ObjectWritable Connection Culler" Exception in thread "IPC Client connection to /192.168.1.117:54310" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space

Any ideas why we always see these crashes, and whether hbase should be
able to handle this amount of data with the setup we use?

On a side note, we also observe that hbase seems to have a large
storage overhead.
When we insert about 1GB of raw data into hbase, it uses about 8GB of
HDFS space (taking the replication into account).
Is this large overhead expected?

/Marcus


Re: Hbase regionserver heap space problem

Posted by stack <st...@duboce.net>.
Here are a few more comments on top of Jean-Daniel's suggestion:

Marcus Schlüter wrote:
> Hi everyone
>
> We used hadoop 0.16.4 with a replication level of 2 and hbase 0.1.3.
Make sure you tell hbase that you only want a replication of 2: see
http://wiki.apache.org/hadoop/Hbase/FAQ#12.
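(Concretely, the FAQ entry amounts to making sure hbase's own DFS client sees your replication setting; a sketch, assuming you add the property to conf/hbase-site.xml to match your hadoop-site.xml:)

```
<!-- conf/hbase-site.xml: make hbase's DFSClient write with replication 2 -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```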
>
> On a side note, we also observe that hbase seems to have a large 
> storage overhead.
> When we insert about 1GB of rawdata into hbase, it uses about 8GB of 
> HDFS space (when taking into account the replication).
> Is this large overhead expected? 
Your values are small, 100 bytes per row, while each stored cell also
carries a key of the form rowid/columnname/timestamp.
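(A back-of-envelope sketch of what that per-cell key repetition costs; all the byte sizes below are assumptions for illustration, not measured from the 0.1.x file format:)

```python
# Rough estimate of hbase storage overhead when the per-cell key
# (row id + column name + timestamp) dwarfs the value itself.

N_COLS = 11                    # columns in the logdata schema
VALUE_BYTES = 100 // N_COLS    # ~100 raw bytes spread over the columns

ROW_KEY_BYTES = 16             # assumed row id length
COL_NAME_BYTES = 12            # assumed average column-name length
TIMESTAMP_BYTES = 8            # one long timestamp per cell

# Every cell stores row key + column name + timestamp alongside its value.
per_cell = ROW_KEY_BYTES + COL_NAME_BYTES + TIMESTAMP_BYTES + VALUE_BYTES
per_row = N_COLS * per_cell

overhead_factor = per_row / 100  # stored bytes per raw byte, before replication
print(f"~{per_row} bytes stored per 100 raw bytes (x{overhead_factor:.1f})")
```

With these guesses the factor is about 5x before replication, so doubling it for two replicas lands in the same ballpark as the observed 8GB for 1GB of raw data.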

Can you slice and dice using hadoop fs -dus and figure out where the
bulk of the 8G is under your hbase.rootdir? (You may have an extra
replica that you did not expect, given #12 from the FAQ above.)

St.Ack

Re: Hbase regionserver heap space problem

Posted by Marcus Schlüter <ma...@mac.com>.
Hi J-D,

thanks for your reply.
I'll try it out with your proposal.

Marcus


On 10.07.2008 at 12:53, Jean-Daniel Cryans wrote:

> Hi Marcus,
>
> I don't know if it's related to your problem but in your machine  
> setup you
> seem to imply that you have one region server and 3 datanodes on four
> different machines. If it's really the case, I recommend that you  
> instead
> have 1 machine for the Namenode and Master and three other machines as
> Datanodes and RegionServers.
>
> J-D


Re: Hbase regionserver heap space problem

Posted by Jean-Daniel Cryans <jd...@gmail.com>.
Hi Marcus,

I don't know if it's related to your problem, but your machine setup
seems to imply that you have one regionserver and 3 datanodes on four
different machines. If that is really the case, I recommend that you
instead have one machine for the Namenode and Master, and the three
other machines as Datanodes and RegionServers.

J-D
