Posted to dev@hbase.apache.org by "Rong-En Fan (JIRA)" <ji...@apache.org> on 2008/10/20 04:26:44 UTC

[jira] Created: (HBASE-938) major compaction period is not checked periodically

major compaction period is not checked periodically
---------------------------------------------------

                 Key: HBASE-938
                 URL: https://issues.apache.org/jira/browse/HBASE-938
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.18.0, 0.18.1
         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
            Reporter: Rong-En Fan
            Priority: Minor


The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request a major compaction when a region is opened or split, at which point we check whether the major compaction period has elapsed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-938:
------------------------

    Attachment: 938-v7.patch

Now you'll see messages like this if we try to do major compaction on a file that has already been major compacted:

{code}
2008-11-15 00:19:30,441 [regionserver/0:0:0:0:0:0:0:0:60020.compactor] DEBUG org.apache.hadoop.hbase.regionserver.HStore: Skipping major compaction because one major compacted file only and elapsedTime 1250119 is < ttl -1
{code}

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938-v4.patch, 938-v6.patch, 938-v7.patch, 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "Rong-En Fan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640928#action_12640928 ] 

Rong-En Fan commented on HBASE-938:
-----------------------------------

After a few hours, all regions got a major compaction.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Priority: Minor
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Updated: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-938:
------------------------

    Attachment: 938-v6.patch

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938-v4.patch, 938-v6.patch, 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "Rong-En Fan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640901#action_12640901 ] 

Rong-En Fan commented on HBASE-938:
-----------------------------------

In a cluster of ~400 regions, I see 140 major compactions within 40 minutes of starting the cluster, and this is still ongoing.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Priority: Minor
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646809#action_12646809 ] 

Billy Pearson commented on HBASE-938:
-------------------------------------

By default I set my major compaction to once a week. But on topic: we need major compaction to run sometimes even if there are no updates, so that we are able to remove expired TTL data in a timely fashion. Not sure if timely = daily; I guess that depends on your setup and data.

I guess we need to decide what's an acceptable lifetime for expired TTL data.


> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Updated: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-938:
------------------------

    Fix Version/s: 0.19.0
         Priority: Critical  (was: Minor)

Marking critical fix for 0.19.  Major compactions are expensive, especially in clusters that tend toward the large.  Seems an easy fix: persist the last major compaction time so it doesn't happen on every restart.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Priority: Critical
>             Fix For: 0.19.0
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644483#action_12644483 ] 

Billy Pearson commented on HBASE-938:
-------------------------------------

Looking at trunk, my idea above will not work because we do not have the optional flush any more.

But yes, the major compaction is based off the oldest timestamp of the mapfiles at the store level.
So restarts do not interfere with major compactions, but overdue major compactions will queue up because of the open.


> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Priority: Critical
>             Fix For: 0.19.0
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Updated: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-938:
------------------------

    Attachment: 938-v4.patch

This patch records in the info file whether or not a file is the product of a major compaction.  Then, when compacting, if a major compaction is requested, we will not do another one if the last compaction was a major one (or if the time since the last major compaction is < ttl).

Testing now.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938-v4.patch, 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Resolved: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-938.
-------------------------

    Resolution: Fixed

Committed.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938-v4.patch, 938-v6.patch, 938-v7.patch, 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646940#action_12646940 ] 

Billy Pearson commented on HBASE-938:
-------------------------------------

I like that idea; sounds good to me.
If we can write it into the metadata, then if there is only one file and it came from a major compaction, do as you said above.
That would make the major compaction much smarter and save on CPU and bandwidth.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Updated: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-938:
------------------------

    Attachment: major.patch

Patch to add a thread that checks for major compactions.   Still in need of startup.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647141#action_12647141 ] 

stack commented on HBASE-938:
-----------------------------

Looking at this, best for now is writing the fact that the HStoreFile is the result of a major compaction into the HSF info file.   When we change file formats, we'll clean all this up, but this should work for now.  Then, yeah, when compacting, if there is only one file, we're doing a major compaction, and less than the column family TTL has passed, don't do a new major compaction.  Should save a bunch of CPU/nio.
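
The skip rule described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from any of the attached patches: the class name, method name, parameters, and the FOREVER constant are all hypothetical. It assumes elapsed time and TTL are given in the same unit, and that -1 stands for a TTL of forever (the "ttl -1" in the log message earlier in the thread hints at that, but the sketch does not depend on it being the real convention).

```java
class MajorCompactionSkipSketch {
    /** Hypothetical marker meaning "cells never expire". */
    static final long FOREVER = -1L;

    /**
     * Decides whether a requested major compaction can be skipped.
     * All parameters are illustrative stand-ins for store state.
     */
    static boolean shouldSkipMajorCompaction(int storeFileCount,
            boolean onlyFileIsMajorCompacted, long elapsed, long ttl) {
        // With several files, or a file that was never major compacted,
        // a major compaction still has real work to do.
        if (storeFileCount != 1 || !onlyFileIsMajorCompacted) {
            return false;
        }
        // A single, already major-compacted file only needs rewriting
        // once enough time has passed for cells to have expired.
        return ttl == FOREVER || elapsed < ttl;
    }

    public static void main(String[] args) {
        // One major-compacted file, TTL forever: nothing to gain from a rewrite.
        System.out.println(shouldSkipMajorCompaction(1, true, 1250119L, FOREVER));
        // One major-compacted file, but the TTL has elapsed: compact again.
        System.out.println(shouldSkipMajorCompaction(1, true, 200L, 100L));
    }
}
```

The check is kept as a pure function of store state so that it can be exercised without a running region server.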

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646807#action_12646807 ] 

stack commented on HBASE-938:
-----------------------------

Patch seems to be working, but it introduces a new issue: it ensures that we check for major compaction every 24 hours, and so a major compaction will run every 24 hours, even if no flushes have come through in the meantime.   We don't want all hbase data rewritten every 24 hours.

Will try and fix as part of this issue.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Updated: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-938:
------------------------

    Attachment: 938.patch

More complete patch.  Testing now.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Assigned: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-938:
---------------------------

    Assignee: stack

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646812#action_12646812 ] 

stack commented on HBASE-938:
-----------------------------

As is, if we did a major compaction at the start of the week and then there were no updates during the whole week, at the end of the next week we'll rewrite an already major compacted file.  I suppose even if there are no updates, TTLs could have expired. Otherwise, it's a waste of CPU and network bandwidth.

You think we should up the major compaction time default to be a week?

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646923#action_12646923 ] 

stack commented on HBASE-938:
-----------------------------

I will up the period, make it two days at least.

But I was thinking too that we should mark files that have been major compacted -- write the fact into the file's metadata or into the file name -- and before running another, check the column descriptor to see if the TTL is forever; if it is, do not run another major compaction if there is only one file to compact.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646820#action_12646820 ] 

Billy Pearson commented on HBASE-938:
-------------------------------------

On a small cluster with little data it would not matter so much, but once someone has any real amount of data I would think once a day is way too much for a default.
So I think once a week will be more of the norm as data sets grow.


> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644445#action_12644445 ] 

stack commented on HBASE-938:
-----------------------------

I need to look into this, but I just noticed how Billy made sure we never major compact more often than the major compaction period, even if there are restarts in between:

{code}
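        // Timestamp of the oldest file in this store's map directory.
        long lowTimestamp = getLowestTimestamp(fs, mapdir);
        // Despite the name, this is a duration: time elapsed since the oldest file was written.
        lastMajorCompaction = System.currentTimeMillis() - lowTimestamp;
        // Due when the oldest file is older than the configured period (and the timestamp is valid).
        if (lowTimestamp < (System.currentTimeMillis() - majorCompactionTime) &&
            lowTimestamp > 0L) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Major compaction triggered on store: " +
              this.storeNameStr + ". Time since last major compaction: " +
              ((System.currentTimeMillis() - lowTimestamp)/1000) + " seconds");
          }
...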
{code}

We look at file timestamps, and if the oldest file was written more than one major compaction period ago, we run a major compaction the next time we check compactions.

So, for this issue, I need to, one, confirm we only compact on a period and, two, since optional flush was removed when we implemented appends, add a thread or something that runs on a period looking to see if any regions are in need of major compaction.
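
As a rough illustration of the "thread that runs on a period" idea, here is a small self-contained sketch. It is not the code from the attached patches: the class, the method names, and the map of store names to oldest-file timestamps are hypothetical stand-ins, and it uses a plain `ScheduledExecutorService` rather than whatever mechanism the region server actually uses. The due test mirrors the condition in the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class MajorCompactionCheckerSketch {
    /** Due when the oldest file is valid and older than the configured period. */
    static boolean isDue(long lowestTimestamp, long now, long periodMillis) {
        return lowestTimestamp > 0L && lowestTimestamp < now - periodMillis;
    }

    /** One pass over all stores; returns the names of the stores that are due. */
    static List<String> findDueStores(Map<String, Long> lowestTimestampByStore,
            long now, long periodMillis) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> e : lowestTimestampByStore.entrySet()) {
            if (isDue(e.getValue(), now, periodMillis)) {
                due.add(e.getKey());
            }
        }
        return due;
    }

    public static void main(String[] args) throws InterruptedException {
        long day = TimeUnit.DAYS.toMillis(1);
        long start = System.currentTimeMillis();

        // Hypothetical stores: one idle for two days, one written a second ago.
        Map<String, Long> stores = new LinkedHashMap<>();
        stores.put("idle-store", start - 2 * day);
        stores.put("fresh-store", start - 1000L);

        // The periodic check runs whether or not any flush has happened.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            () -> System.out.println("due: "
                + findDueStores(stores, System.currentTimeMillis(), day)),
            0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```

In a real server the check interval would be much longer than the 100 ms used here, and a due store would be queued for compaction rather than printed.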

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Priority: Critical
>             Fix For: 0.19.0
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Updated: (HBASE-938) major compaction period is not checked periodically

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-938:
------------------------

    Fix Version/s:     (was: 0.19.0)
                   0.18.2

Adding to 0.18.2 because of discussion up on list.

> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.18.2
>
>         Attachments: 938-v4.patch, 938-v6.patch, 938-v7.patch, 938.patch, major.patch
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.


[jira] Commented: (HBASE-938) major compaction period is not checked periodically

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641150#action_12641150 ] 

Billy Pearson commented on HBASE-938:
-------------------------------------

The major compaction check is done in the compaction check, so it should be getting checked when there is a memcache flush, an open, or a split.
There should not be a major compaction triggered for all regions on a restart; that's not how the code is written to do the major compactions.

If a table gets no updates to trigger a compaction check, then the stale (over ttl and max_versions) data never gets removed from the table.
I have seen this happen on an idle table with no updates over a long time.

What I think we should be doing is, on the optional flush, queue up a compaction check whether there is something to flush or not. That way, if a major compaction is needed, it will run within the optional flush time setting.
This would allow us to check hbase.hregion.majorcompaction more periodically on tables with few updates.
The way the code is now, if no minor or major compaction is needed then it will do nothing, so it costs no extra resources to check whether a major compaction is needed.


> major compaction period is not checked periodically
> ---------------------------------------------------
>
>                 Key: HBASE-938
>                 URL: https://issues.apache.org/jira/browse/HBASE-938
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.0, 0.18.1
>         Environment: HBase 0.18 branch (should be RC1) + Hadoop 0.18 branch
>            Reporter: Rong-En Fan
>            Priority: Critical
>             Fix For: 0.19.0
>
>
> The major compaction period, hbase.hregion.majorcompaction, is not checked periodically. Currently, we only request major compaction when the region is open or split at which point we check whether the major compaction period is due.
