Posted to user@hbase.apache.org by Sateesh Lakkarsu <la...@gmail.com> on 2012/07/22 20:36:35 UTC

hbase HFile v1 size limit

"For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with
a default of 256Mb. For 0.92.x codebase, due to the HFile v2 change much
larger regionsizes can be supported (e.g., 20Gb)." ... from
http://hbase.apache.org/book.html ...

Unfortunately, I cannot upgrade to 0.92.x/CDH4 right away and will have limited
hardware for some time, so I want to understand the reasoning behind the
HFile limit in v1 so that I can weigh my options (use a bigger region
size, increase the number of regions per regionserver, or wait for more nodes
and spread the load).
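
For the bigger-region-size option, a minimal sketch of what I have in mind
(0.90 client API; the table and family names are made up). Cluster-wide, the
equivalent knob is hbase.hregion.max.filesize in hbase-site.xml, which the
book says defaults to 256MB on 0.90.x:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class BigRegionTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("mytable");   // placeholder name
    desc.addFamily(new HColumnDescriptor("d"));                // placeholder family
    // allow regions to grow to ~4GB before a split is attempted
    desc.setMaxFileSize(4L * 1024L * 1024L * 1024L);
    admin.createTable(desc);
  }
}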

- why is the limit 4GB in 0.90.x?

- if it is a hard limit... I would like to hear from people about their experiences:

-- what is the normal region size?

-- has anyone been running with 3-4GB region sizes?

I do understand that compactions will take longer, the index size can be big
depending on key size, read performance can be impacted, and maybe region
splitting will be a problem... what else should I be worried about?
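
On the index-size point, here is the back-of-the-envelope estimate I am using
(all numbers are assumptions: 64KB default HFile block size, ~60 byte keys,
~20 bytes of per-entry bookkeeping); even a 4GB store file should only need a
few MB of block index:

public class BlockIndexEstimate {
  public static void main(String[] args) {
    long fileSize   = 4L * 1024L * 1024L * 1024L; // one big store file after a major compaction
    long blockSize  = 64L * 1024L;                // default HFile block size
    long avgKeySize = 60L;                        // depends entirely on the schema
    long perEntry   = avgKeySize + 20L;           // key bytes + offset/size bookkeeping (rough guess)
    long blocks     = fileSize / blockSize;       // 65536 blocks
    System.out.println("approx block index size: " + (blocks * perEntry) / (1024 * 1024) + " MB");
  }
}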

I have pre-split regions with known bounds, lots of RAM on each node (96G),
no swap, controlled compactions, no MR on this cluster, etc., so I believe I
have an optimal set-up.
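
In case it helps, the pre-splitting and compaction triggering look roughly
like this (table name, family, and split points are just examples; this
assumes automatic major compactions are disabled via
hbase.hregion.majorcompaction=0 in hbase-site.xml so majors only run when
kicked off explicitly):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitAndCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("mytable");   // placeholder name
    desc.addFamily(new HColumnDescriptor("d"));                // placeholder family
    desc.setMaxFileSize(4L * 1024L * 1024L * 1024L);           // same ~4GB limit as above

    // split points come from the known key bounds; these values are only examples
    byte[][] splits = new byte[][] {
        Bytes.toBytes("1000"), Bytes.toBytes("2000"), Bytes.toBytes("3000")
    };
    admin.createTable(desc, splits);

    // with automatic majors off, a major compaction is kicked off explicitly, e.g.
    admin.majorCompact("mytable");
  }
}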

Thanks.

Re: hbase HFile v1 size limit

Posted by Stack <st...@duboce.net>.
On Sun, Jul 22, 2012 at 8:36 PM, Sateesh Lakkarsu <la...@gmail.com> wrote:
> Unfortunately, I cannot upgrade to 0.92.x/CDH4 right away and will have limited
> hardware for some time, so I want to understand the reasoning behind the
> HFile limit in v1 so that I can weigh my options (use a bigger region
> size, increase the number of regions per regionserver, or wait for more nodes
> and spread the load).
>
> - why is the limit 4GB in 0.90.x?
>
> - if it is a hard limit... I would like to hear from people about their experiences:
>

It's not a hard limit.

You've seen the note here: http://hbase.apache.org/book.html#d1952e10888 ?

> I have pre-split regions with known bounds, lots of RAM on each node (96G),
> no swap, controlled compactions, no MR on this cluster, etc., so I believe I
> have an optimal set-up.
>

Can you try your HFile v1s with big files and see what you get? If your cell
size is large and your keys are small, you might be able to go relatively big
with HFile v1 regions.
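
Something simple along these lines (table, family, and key layout are
placeholders; the sizing ignores compression and HFile overhead) would push a
single pre-split region toward 3-4GB so you can then look at compaction times
and read latency:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FillOneRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");     // placeholder name
    table.setAutoFlush(false);                      // batch the puts client-side
    byte[] family = Bytes.toBytes("d");             // placeholder family
    byte[] value  = new byte[10 * 1024];            // ~10KB cells; match your real cell size
    // ~400k rows of ~10KB is roughly 4GB of raw data
    for (long i = 0; i < 400000; i++) {
      // keys share a common prefix so they all land in one pre-split region
      Put p = new Put(Bytes.toBytes(String.format("1000%010d", i)));
      p.add(family, Bytes.toBytes("q"), value);
      table.put(p);
    }
    table.flushCommits();
    table.close();
  }
}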

St.Ack