Posted to user@hbase.apache.org by Gaurav Agarwal <ga...@gmail.com> on 2014/12/28 18:48:52 UTC

HBase overhead for completely inactive regions

Hi All,

I have timeseries data in which most of the regions are completely
inactive. With my current set of resources and estimates, I would end up
with close to 15TB of data per RegionServer and, with a region size of about
15GB, this would mean 1000 regions per region server. On the whole I expect
close to 150TB of data, which would lead to close to 10,000 total regions,
and I was thinking of handling it all with around 10-15 nodes.
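
The sizing above can be checked with a few lines of arithmetic. This is only
a rough sketch: it assumes decimal units (1TB = 1000GB), and the helper names
are made up for illustration.

```python
GB_PER_TB = 1000  # assumption: decimal units


def region_count(data_gb: int, region_size_gb: int) -> int:
    """Regions needed to hold data_gb at region_size_gb each, rounded up."""
    return -(-data_gb // region_size_gb)  # integer ceiling division


def regions_per_server(total_gb: int, region_size_gb: int, servers: int) -> int:
    """Regions each server carries if regions are spread evenly, rounded up."""
    return -(-region_count(total_gb, region_size_gb) // servers)


if __name__ == "__main__":
    total_gb = 150 * GB_PER_TB
    print("total regions:", region_count(total_gb, 15))
    for servers in (10, 15):  # the two ends of the 10-15 node range
        print(servers, "nodes:", regions_per_server(total_gb, 15, servers), "regions each")
```

At the upper end of the node range the per-server region count drops below
1000, so the 1000 figure is the worst case for this estimate.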

This is a write-intensive process and read QPS will be fairly low. Even at
write time I expect only 1-3 regions per region server to be actively
written to.

I wanted to know more about the memory overhead associated with completely
inactive regions. Can someone please help me out with the details of the
typical minimum memory overheads (memstore, blockcache, indexes and
bloom filters) for such inactive (cold) regions?

If the overhead is nil or minuscule, then I should be able to comfortably
run these region servers with ~10GB RAM. Any other gotchas I need to be
careful about here?

--cheers, gaurav

Re: HBase overhead for completely inactive regions

Posted by Stack <st...@duboce.net>.
On Sun, Dec 28, 2014 at 3:03 PM, Ted Yu <yu...@gmail.com> wrote:

> Since read QPS is fairly low, inactive regions wouldn't take much space in
> blockcache.
>

Blocks from inactive regions, whether data, index or bloom blocks, take up
no space in the blockcache. For a block to get into the blockcache, it needs
to be read. If it is read and then goes cold in the blockcache, it will be
evicted.

In later versions of 0.98 -- post 0.98.6 IIRC -- you can get a pretty
detailed report on blockcache content via the UI. You can ask for a dump of
the metadata on blocks in the blockcache and the files they belong to, from
which you can see which regions have content in the blockcache.

Blockcache by default is allocated 40% of heap. If the read rate is low, you
could tune this down.
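
To put numbers on that, here is a rough sketch of how much heap the
blockcache claims at the default fraction versus a reduced one. The fraction
corresponds to the hfile.block.cache.size setting in hbase-site.xml (0.4 by
default in this era of HBase). The 10GB heap and the 0.1 value are
illustrative assumptions, not recommendations, and the function name is
made up.

```python
DEFAULT_BLOCKCACHE_FRACTION = 0.4  # the 40% default mentioned above


def blockcache_mb(heap_mb: int, fraction: float = DEFAULT_BLOCKCACHE_FRACTION) -> int:
    """Heap (in MB) reserved for the blockcache at the given fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return round(heap_mb * fraction)


if __name__ == "__main__":
    heap_mb = 10 * 1024  # assumption: a ~10GB heap, as in the original question
    for fraction in (DEFAULT_BLOCKCACHE_FRACTION, 0.1):  # 0.1 is a hypothetical tuned value
        print(f"fraction {fraction}: {blockcache_mb(heap_mb, fraction)} MB for blockcache")
```

Whatever is taken from the blockcache becomes available to the rest of the
heap, which matters for a write-heavy, read-light workload like this one.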



> Inactive regions wouldn't consume much memstore either since they're cold.
>
>
Each inactive region will consume, per column family, at
least "hbase.hregion.memstore.mslab.chunksize", which defaults to 2MB.
See the HeapMemStoreLAB.java class comment for more.
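
Taking that statement at face value, the floor for 1000 cold regions can be
estimated as below. This is a sketch under stated assumptions: exactly one
2MB chunk is held per region per column family, and the column family
counts and the 10GB heap are hypothetical inputs, since the thread does not
give them.

```python
MSLAB_CHUNK_MB = 2  # default of hbase.hregion.memstore.mslab.chunksize, per the text above


def mslab_floor_mb(regions: int, families_per_region: int,
                   chunk_mb: int = MSLAB_CHUNK_MB) -> int:
    """Minimum memstore footprint in MB, assuming one chunk per region per family."""
    return regions * families_per_region * chunk_mb


if __name__ == "__main__":
    heap_mb = 10 * 1024  # assumption: ~10GB heap from the original question
    for families in (1, 3):  # hypothetical column family counts
        floor = mslab_floor_mb(1000, families)
        print(f"{families} family(ies): {floor} MB, {floor / heap_mb:.0%} of heap")
```

Even with a single column family, the floor is a noticeable share of a 10GB
heap, so the number of column families is worth keeping small in this
layout.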

St.Ack


> For 1-3 regions per region server, MTTR would be kept low if the hot
> regions are evenly distributed.



> Cheers
>

Re: HBase overhead for completely inactive regions

Posted by Ted Yu <yu...@gmail.com>.
Since read QPS is fairly low, inactive regions wouldn't take much space in
blockcache.
Inactive regions wouldn't consume much memstore either since they're cold.

For 1-3 regions per region server, MTTR would be kept low if the hot
regions are evenly distributed.

Cheers
