Posted to user@hbase.apache.org by Michał Podsiadłowski <po...@gmail.com> on 2010/03/31 12:28:46 UTC

Region splitting, compaction and merging

Hi HBase fans,

We started our cluster (HBase trunk + CDH3 with HBase-specific
patches) in our production environment and have left it running for 2
days now. Everything is working nicely, but we haven't tried to break
it yet as we did previously ;)
Still, there are a few things that concern me.
We have one table with only a few rows - around 200, each a few tens
of KB - which are updated quite frequently: all records a few times an
hour. It sounds trivial, but the table keeps growing and splitting.
Currently, after 2 days, there are 177 records spread across 4 regions,
which IMHO is not good. I had to run a major compaction manually to get
rid of the invalidated data (from around 500MB to 0MB and a few in the
memStore according to the UI).
As far as I can see in the logs, there have been no major compactions
since we started 2 days ago. The question is: is it normal that tables
grow so quickly and get split because they are stuffed with garbage?
Secondly, is there a way to force HBase to perform a major compaction
at a particular time, e.g. 5 a.m., so it doesn't generate unnecessary
load during hot periods such as the evening, when there is strong
demand for performance? Or maybe I am exaggerating the problem and
the influence on the whole system is negligible?

Thirdly, is there a way to merge split regions? As far as I can see,
there is https://issues.apache.org/jira/browse/HBASE-420, which is
marked as a minor issue.

Cheers,
Michal

Re: Region splitting, compaction and merging

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hey Michal!

Currently there's no tool you can use except cron. You can request a
major compaction on a table by doing something like:
echo "major_compact 'some_table'" | /path/to/hbase/bin/hbase shell
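
As a rough sketch of the cron approach, the command could be wrapped in a small script and scheduled for a quiet hour such as 5 a.m. The script path, the HBASE_HOME default and the table name below are placeholders and assumptions, not something taken from a real installation.

```shell
#!/bin/sh
# Sketch: request a major compaction of one table, intended to be run from cron.
# HBASE_HOME is a placeholder default (assumption); point it at your install.
HBASE_HOME="${HBASE_HOME:-/path/to/hbase}"

major_compact_table() {
    # Pipe the command into a non-interactive HBase shell session.
    echo "major_compact '$1'" | "$HBASE_HOME/bin/hbase" shell
}

# Example crontab line (daily at 05:00), assuming this script is saved as
# /usr/local/bin/hbase-major-compact.sh (hypothetical path):
#   0 5 * * * /usr/local/bin/hbase-major-compact.sh some_table
if [ -n "${1:-}" ]; then
    major_compact_table "$1"
fi
```

The request only asks HBase to start the compaction, so the actual load may continue for a while after the cron job itself has exited.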

You can merge regions using the merge tool, but it must be run while
HBase is down. You can run it like this:
bin/hbase org.apache.hadoop.hbase.util.Merge

Enabling compression on that table will help it stay small; use
LZO (see the wiki).
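
A minimal sketch of how compression might be switched on from the HBase shell, assuming the LZO native libraries are already installed as described on the wiki. The table and column family names are placeholders, and the disable/enable steps assume a version that requires the table to be offline for schema changes.

```shell
#!/bin/sh
# Sketch: print the HBase shell commands that enable LZO on one column family.
# HBASE_HOME, the table name and the family name are placeholders (assumptions).
HBASE_HOME="${HBASE_HOME:-/path/to/hbase}"

lzo_commands() {
    # $1 = table name, $2 = column family name
    cat <<EOF
disable '$1'
alter '$1', {NAME => '$2', COMPRESSION => 'LZO'}
enable '$1'
major_compact '$1'
EOF
}

# Usage (not run here):
#   lzo_commands some_table some_family | "$HBASE_HOME/bin/hbase" shell
```

The trailing major_compact is there because existing store files are only rewritten with the new compression setting when they are compacted.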

J-D
