Posted to user@hbase.apache.org by "Ignatescu, Ionut" <io...@amazon.com> on 2012/04/11 20:37:56 UTC

Question about configuring HBase file size

My use case: I have several HBase tables into which entries are pushed constantly (300k entries every 5 minutes, 6-7 GB/day). I started with the tables pre-split into 32 regions.
The maximum file size is currently set to 1 GB. I'm using these tables to perform real-time scans via a web app.
Problem: Sometimes, but fairly frequently, the tables freeze and I get a scanner timeout exception. I found in the logs that many splits and compactions are running, and I think these are the cause. If I stop pushing data into the tables, the scans run fine.
Could you give me some advice on how to avoid this problem?

Thanks!



Amazon Development Center (Romania) S.R.L. registered office: 3E Palat Street, floor 2, Iasi, Iasi County, Iasi 700032, Romania. Registered in Romania. Registration number J22/2621/2005.

Re: Question about configuring HBase file size

Posted by Joey Echeverria <jo...@cloudera.com>.
Are your keys well distributed? If so, you could simply split into
more regions initially to delay early splits. You can also turn off
automatic splitting (effectively) by setting a large max file size,
say 100 GB. This will mean you'll need to split by hand if necessary.
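As a rough sketch of what that looks like from the hbase shell (the table name 'mytable', the column family 'cf', and the split points below are placeholders for your own schema, and depending on your HBase version you may need to disable the table before altering it):

  # pre-split the table at creation time instead of letting it start
  # as a single region; pick split points that match your key space
  hbase> create 'mytable', 'cf', SPLITS => ['row-02', 'row-04', 'row-06']

  # effectively turn off automatic splitting by raising the per-table
  # max file size to 100 GB (value is in bytes)
  hbase> alter 'mytable', METHOD => 'table_att', MAX_FILESIZE => '107374182400'

  # later, split a hot table (or a specific region) by hand when needed
  hbase> split 'mytable'

The cluster-wide default for new tables is hbase.hregion.max.filesize in hbase-site.xml; setting MAX_FILESIZE on the table overrides it for just that table.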

You can also play around with compaction settings to avoid those
causing problems. Do you happen to know if your issues are with just
major compactions, or are minor compactions also causing problems?
Lastly, depending on your SLA for making new data visible, you could
switch to bulk loading the data once a day or once an hour so that you
start off with a smaller number of HFiles, and thus need to compact
less. If you have a healthy ratio of deletes or have TTLs turned on,
you still want to major compact once a day or once a week for data to
really get deleted.
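For the scheduled major compaction, a minimal sketch (table name is again a placeholder; double-check the property name against your HBase version): set hbase.hregion.majorcompaction to 0 in hbase-site.xml to disable the time-based major compactions, then kick them off yourself during a quiet window, e.g. from cron:

  # run a major compaction of the whole table from a script or cron job
  echo "major_compact 'mytable'" | hbase shell

  # example crontab entry: once a day at 03:00, assuming the hbase
  # binary is on the PATH of the user running the job
  0 3 * * *  echo "major_compact 'mytable'" | hbase shell

That way the expensive compactions happen when you choose, instead of in the middle of your scan traffic.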

A lot of this is discussed in the HBase Reference Guide
(http://hbase.apache.org/book.html) which I highly recommend if you
haven't read it yet.

-Joey




-- 
Joey Echeverria
Senior Solutions Architect
Cloudera, Inc.