Posted to user@hbase.apache.org by Christian Pfarr <z0...@pm.me.INVALID> on 2021/07/12 09:15:32 UTC

Number of Regions with small Tables

Hello @all,

I have a question about controlling the number of regions for small tables in HBase.
But first, some background on our use case.

We've built a lambda architecture with HDFS (batch), HBase (speed) and Drill as the serving layer, where we combine Parquet files from HDFS with HBase rows that are newer than the most recent row in HDFS.
The HBase table is filled in real time via NiFi and cleaned up after every (nightly) batch run, so that Drill can put most of the workload on HDFS.
Unfortunately the HBase table is very small, so it has only one region. Because of that, Drill cannot parallelize the query, which leads to long query times.

If I pre-split the HBase table, everything is fine until the balancer comes along and merges the small regions. So after a few hours everything is slow again :-/
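Pre-splitting needs a set of split keys chosen up front. The following is a minimal sketch that assumes the row keys start with a uniformly distributed 8-character lowercase hex prefix, for example a hash-based salt. That key design is an assumption and is not stated in the thread. The sketch divides that key space into equal ranges. HBase's own `RegionSplitter.HexStringSplit` follows a similar idea. The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: evenly spaced split points over an 8-hex-digit row key prefix. */
public final class PreSplitKeys {
    private PreSplitKeys() {}

    /** Returns numRegions - 1 boundaries dividing [00000000, ffffffff] into equal ranges. */
    public static List<String> hexSplitPoints(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("numRegions must be at least 2");
        }
        long keySpace = 1L << 32; // number of distinct 8-hex-digit prefixes
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Multiply before dividing so the boundaries stay evenly spaced.
            points.add(String.format("%08x", keySpace * i / numRegions));
        }
        return points;
    }

    /** Encodes the boundaries as the byte[][] split keys used when creating a table. */
    public static byte[][] toSplitKeys(List<String> points) {
        byte[][] keys = new byte[points.size()][];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = points.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(hexSplitPoints(4)); // [40000000, 80000000, c0000000]
    }
}
```

The resulting `byte[][]` has the shape that `Admin.createTable(HTableDescriptor, byte[][])` accepts in the HBase 1.x client. These split points only produce balanced regions if the assumed key prefix really is uniform.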

So... my question is: what's the best way to handle this parallelization issue?
I thought about setting hbase.hregion.max.filesize to a very small value, for example the HDFS block size (128 MB), but I'm not sure whether this leads to new problems.
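Lowering `hbase.hregion.max.filesize` only causes a split once a region actually grows past the resulting threshold. The following is a minimal sketch of that arithmetic. It assumes the HBase 1.x default split policy, `IncreasingToUpperBoundRegionSplitPolicy`, and its usual formula: a region splits once a store exceeds min(max file size, initial size × R³), where R is the number of the table's regions on that region server and the initial size defaults to twice the memstore flush size. Verify this formula against the source of the HBase version in use. The class name is hypothetical.

```java
/** Hypothetical model of the split threshold of IncreasingToUpperBoundRegionSplitPolicy (assumed 1.x default). */
public final class SplitThreshold {
    static final long MB = 1024L * 1024L;

    private SplitThreshold() {}

    /**
     * Size a store must exceed before its region is split (assumed formula).
     * regionsOnServer is the number of this table's regions on the region server.
     */
    public static long sizeToCheck(long maxFileSize, long initialSize, int regionsOnServer) {
        if (regionsOnServer <= 0 || regionsOnServer > 100) {
            return maxFileSize; // assumed fallback outside the ramp-up range
        }
        long r = regionsOnServer;
        return Math.min(maxFileSize, initialSize * r * r * r);
    }

    public static void main(String[] args) {
        long initialSize = 2 * 128 * MB; // assumed: 2 x default memstore flush size (128 MB)
        long[] maxFileSizes = {10240 * MB, 128 * MB}; // 10 GB default vs. the proposed 128 MB
        for (long max : maxFileSizes) {
            for (int r = 1; r <= 3; r++) {
                System.out.println("max=" + max / MB + " MB, regions=" + r
                        + " -> split above " + sizeToCheck(max, initialSize, r) / MB + " MB");
            }
        }
    }
}
```

Under these assumptions, a single region splits at 256 MB with the 10 GB default and at 128 MB with the proposed value. A table holding only a few megabytes reaches neither threshold, so the setting alone would not create more regions for it.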

What do you think? Is there a better way to handle this?

Regards,
z0ltrix

Re: Number of Regions with small Tables

Posted by Christian Pfarr <z0...@pm.me.INVALID>.

Any hints on that?

Regards,
z0ltrix

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, 12 July 2021 at 13:45, Christian Pfarr <z0...@pm.me.INVALID> wrote:


Re: Number of Regions with small Tables

Posted by Christian Pfarr <z0...@pm.me.INVALID>.

Ah, OK... I thought this was done by the balancer.

The normalizer is enabled (checked via the HBase shell), but with no configuration beyond the defaults in hbase-default.xml.

We're running HBase 1.5.0 at the moment.
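The following minimal sketch shows why pre-split regions that hold little data tend to be merged once the normalizer is enabled. It is a simplified model and not the actual `SimpleRegionNormalizer` code. It assumes three rules: a region larger than twice the table's average region size is split, two adjacent regions whose combined size is below that average are merged, and tables with fewer than three regions are skipped. The exact rules and thresholds differ between HBase versions, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a size-based region normalizer. */
public final class NormalizerModel {
    static final int MIN_REGION_COUNT = 3; // assumed minimum before a table is considered

    private NormalizerModel() {}

    /** Returns plans for region sizes given in MB and ordered by start key. */
    public static List<String> plan(long[] sizesMb) {
        List<String> plans = new ArrayList<>();
        if (sizesMb.length < MIN_REGION_COUNT) {
            return plans; // too few regions: leave the table alone
        }
        long total = 0;
        for (long size : sizesMb) {
            total += size;
        }
        double avg = (double) total / sizesMb.length;
        int i = 0;
        while (i < sizesMb.length) {
            if (sizesMb[i] > 2 * avg) {
                plans.add("split " + i);
            } else if (i + 1 < sizesMb.length && sizesMb[i] + sizesMb[i + 1] < avg) {
                plans.add("merge " + i + "+" + (i + 1));
                i++; // the neighbour is consumed by this merge
            }
            i++;
        }
        return plans;
    }

    public static void main(String[] args) {
        // Four pre-split regions; the two in the middle have not received data yet.
        System.out.println(plan(new long[] {2, 0, 0, 2})); // [merge 1+2]
        // All regions still empty: the average is 0, so nothing is planned.
        System.out.println(plan(new long[] {0, 0, 0, 0})); // []
    }
}
```

In this model, the empty or nearly empty regions are the ones that end up in merge plans. This is consistent with the small pre-split regions disappearing after a few hours, as described in the original post.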

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, 12 July 2021 at 13:16, Mallikarjun <ma...@gmail.com> wrote:


Re: Number of Regions with small Tables

Posted by Mallikarjun <ma...@gmail.com>.

Do you have any configuration for the Region Normalizer
(https://hbase.apache.org/book.html#normalizer) or something?

The balancer does not split or merge regions. AFAIK, splitting is handled by the
split policy controlled by `hbase.regionserver.region.split.policy`, and there is
nothing similar for merges.

---
Mallikarjun


On Mon, Jul 12, 2021 at 2:48 PM Christian Pfarr <z0...@pm.me.invalid>
wrote:
