Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2012/12/12 21:19:39 UTC

How to pick which region(s) to major compact?

Hi,

If you want to do major compaction on a single region at a time (to
minimize the impact on the cluster), how do you pick which region to
compact?

What should one look for in order to get the best ROI out of major
compaction (the best ratio of positive benefit to negative impact),
and is there a programmatic way to get this information so that region
selection and compaction can be automated?

Thanks,
Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html

Re: How to pick which region(s) to major compact?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
I've looked into this in the past but haven't implemented anything yet.
I do have a couple of notes:

1) From what I can tell, HBase doesn't currently provide an API you
could use to figure this out smartly. (I was looking at 0.90.x; it may
have changed in later versions.)

2) What seemed like a good approach to me was to choose regions based on a
combination of oldest modification time and number of store files. I was
going to write a script that iterates over all the regions in HDFS, picks
the region (or up to N regions) with either the most store files or the
oldest store-file modification timestamps, and runs major_compact on those.

3) In the end, our servers were not really utilizing 100% of disk and CPU,
so we decided to just major compact everything each night. We staggered
the compactions over a couple of hours so as not to overwhelm the cluster,
but I'm not sure that has much effect, since compactions run serially in a
single thread anyway.

