Posted to user@hbase.apache.org by Monish r <mo...@gmail.com> on 2012/09/23 15:59:21 UTC

Clarification regarding major compaction logic

Hi guys,

I would like to clarify the following regarding major compaction:

1) When TTL is set for a column family and major compaction is triggered by
the user:

- Does it act on the region only when the time since the last major
compaction is greater than the TTL?

2) Does major compaction go through the index of a region to find out
whether there is data to be acted upon and only then start rewriting, or
does it rewrite without any pre-checks on the data inside the region?

3) If major compaction for a region results in an empty region, is the
empty region deleted or left as it is?

Regards,
R.Monish

RE: Clarification regarding major compaction logic

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
Replies inline.

> -----Original Message-----
> From: Monish r [mailto:monishsvce@gmail.com]
> Sent: Sunday, September 23, 2012 7:29 PM
> To: user@hbase.apache.org
> Subject: Clarification regarding major compaction logic
> 
> Hi guys,
> 
> i would like to clarify the following regarding Major Compaction
> 
> 1) When TTL is set for a column family and major compaction is
> triggered by
> user
> 
> - Does it act on the region only when the time since the last major
> compaction is greater than the TTL?
[Ram] Major compaction can be triggered either by configuration or manually.
By default, a major compaction is triggered every 24 hours.
During a compaction (minor or major), if the compaction algorithm finds
HFiles whose TTL has expired, it simply deletes those files.
Similarly, during a compaction (minor or major) every KV in each HFile is
scanned, and any KV whose TTL has expired is not written to the new
compacted file.
In other words, expiry is judged from the data's own timestamps, not from
the time since the last major compaction.
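
A minimal sketch of the two TTL behaviours described above, written as a toy
Python model and not taken from the HBase source. The names KV, is_expired
and compact are made up for illustration, and an "HFile" is modelled as a
plain list of KVs. It assumes a file counts as fully expired when even its
newest KV is older than the TTL.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KV:
    """Hypothetical stand-in for an HBase KeyValue."""
    row: str
    value: str
    timestamp_ms: int


def is_expired(timestamp_ms: int, ttl_ms: int, now_ms: int) -> bool:
    """A cell is expired once its age exceeds the column family TTL."""
    return now_ms - timestamp_ms > ttl_ms


def compact(files: List[List[KV]], ttl_ms: int, now_ms: int) -> List[KV]:
    """Toy compaction: merge the input files, dropping TTL-expired data."""
    survivors = []
    for hfile in files:
        if not hfile:
            continue
        # Whole-file shortcut: if the newest KV is expired, every KV is.
        newest = max(kv.timestamp_ms for kv in hfile)
        if is_expired(newest, ttl_ms, now_ms):
            continue
        # Otherwise scan the file and keep only the KVs still within TTL.
        for kv in hfile:
            if not is_expired(kv.timestamp_ms, ttl_ms, now_ms):
                survivors.append(kv)
    # The compacted output is sorted by row, newest version first.
    return sorted(survivors, key=lambda kv: (kv.row, -kv.timestamp_ms))


if __name__ == "__main__":
    old_file = [KV("r1", "old", 100), KV("r2", "old", 200)]
    mixed_file = [KV("r1", "new", 9_500), KV("r3", "stale", 8_000)]
    print(compact([old_file, mixed_file], ttl_ms=1_000, now_ms=10_000))
```

Note that nothing in this model looks at when the previous compaction ran;
only the cell timestamps and the TTL matter.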

> 
> 2) Does major compaction go through the index of a region to find out
> that
> there is data to be acted upon and then start the rewriting  ( or )
> does it
> rewrite without any pre checks about the data  inside the region ?
[Ram] Major compaction differs from minor compaction in that delete
markers are removed. Once a major compaction is triggered, the algorithm
finds the files that can be major compacted and simply runs over those
files.
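
The difference in delete-marker handling can be illustrated with a toy
Python model. This is a simplified sketch, not HBase code: Cell and compact
are hypothetical names, a delete marker is assumed to mask every put on the
same row with an equal or older timestamp, and column families, qualifiers
and versions are ignored.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    """Hypothetical cell: kind is either "put" or "delete"."""
    row: str
    timestamp: int
    kind: str
    value: str = ""


def compact(cells: List[Cell], major: bool) -> List[Cell]:
    """Toy compaction over the cells of the selected files."""
    # Newest delete marker per row.
    delete_ts = {}
    for c in cells:
        if c.kind == "delete":
            delete_ts[c.row] = max(c.timestamp, delete_ts.get(c.row, c.timestamp))

    out = []
    for c in sorted(cells, key=lambda c: (c.row, -c.timestamp)):
        if c.kind == "delete":
            # Assumption: only a major compaction drops the marker itself.
            if not major:
                out.append(c)
            continue
        # Puts masked by a delete marker are not rewritten.
        if c.row in delete_ts and c.timestamp <= delete_ts[c.row]:
            continue
        out.append(c)
    return out


if __name__ == "__main__":
    cells = [
        Cell("r1", 1, "put", "a"),
        Cell("r1", 2, "delete"),
        Cell("r2", 1, "put", "b"),
    ]
    print("minor:", compact(cells, major=False))
    print("major:", compact(cells, major=True))
```

In this model the minor compaction keeps the marker because files outside
the selection might still hold cells it has to mask, while the major
compaction sees all files and can drop it.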
> 
> 3) If major compaction for a region results in a empty region , does
> the
> empty region get deleted or left as such ?
> 
[Ram] If I remember correctly, if a major compaction (or even a minor
compaction) results in no data, an empty file is still flushed. So the
region remains intact and is never deleted.

Hope this helps.
> Regards,
> R.Monish


Re: Clarification regarding major compaction logic

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there, for background on the file selection algorithm for compactions,
see section 9.7.5.5, "Compaction", in the HBase book:

http://hbase.apache.org/book.html#regions.arch
