Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2008/08/16 01:39:44 UTC

[jira] Created: (HBASE-834) Upper bound on files we compact at any one time

Upper bound on files we compact at any one time
-----------------------------------------------

                 Key: HBASE-834
                 URL: https://issues.apache.org/jira/browse/HBASE-834
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: stack
            Priority: Minor


From Billy in HBASE-64, which we closed because it got pulled all over the place:

{code}
Currently we compact a region when hbase.hstore.compactionThreshold is reached (default 3).

I think we should configure a maximum number of mapfiles to compact at one time, similar to a minor compaction in Bigtable. This keeps compactions from getting tied up in one region too long while other regions accumulate far too many memcache flushes, which makes compaction take longer and longer for each region.

If we did that, then when a region's updates start to slack off, the max number would eventually include all mapfiles, causing a major compaction on that region. Unlike Bigtable, this would leave the master out of the process and let the region server handle the major compaction when it has time.

When doing a minor compaction on a few files, I think we should compact the newest mapfiles first and leave the larger/older ones for when a region is receiving few updates.
{code}
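The selection policy proposed above (wait for the threshold, cap the number of files per run, newest first) can be sketched roughly as follows. This is a hypothetical illustration, not the patch attached to this issue: StoreFile, select_files_to_compact and the max_files_to_compact parameter are made-up names, and the cap of 10 is an arbitrary example value. Only the threshold default of 3 comes from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoreFile:
    name: str
    created_ms: int  # creation time in milliseconds since the epoch
    size_bytes: int


def select_files_to_compact(files: List[StoreFile],
                            compaction_threshold: int = 3,
                            max_files_to_compact: int = 10) -> List[StoreFile]:
    """Pick the files for one compaction run, or [] if none is due.

    Hypothetical policy sketched from the proposal:
    * nothing happens until the store holds `compaction_threshold` files;
    * at most `max_files_to_compact` files are rewritten per run;
    * the newest files are chosen first, leaving the older, larger ones.
    If the store has no more files than the cap, every file is selected,
    which amounts to a major compaction of that store.
    """
    if len(files) < compaction_threshold:
        return []
    newest_first = sorted(files, key=lambda f: f.created_ms, reverse=True)
    return newest_first[:max_files_to_compact]


# Twelve files created one second apart: f0 is the oldest, f11 the newest.
files = [StoreFile(f"f{i}", created_ms=i * 1000, size_bytes=64 << 20) for i in range(12)]
picked = select_files_to_compact(files)
print([f.name for f in picked])  # the ten newest, f11 down to f2; f0 and f1 wait
```

With a steady stream of flushes the two oldest files in this example would never be picked, which is the "no guaranteed major compaction" gap raised later in the thread.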

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Attachment: 834-patch.txt

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.3.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.3.0
>
>         Attachments: 834-patch.txt

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Attachment: 834-0.2.1-patchv2.txt

Attached new version with above comments fixed

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.19.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626888#action_12626888 ] 

Billy Pearson commented on HBASE-834:
-------------------------------------

You might want to check the javadocs etc.; I'm not sure what I need to do there. I made some changes, but please review them and let me know if I am missing anything.

Second thought on the TTL in minor compactions: a delete record will be removed in a minor compaction if its TTL has expired, but the deleted cell will
remain until the major compaction, or until a compaction that includes that cell, and it will be removed then. Also, since the cell itself
will have an expired TTL, we should not get it back in a get/scanner. So I think it is still OK to leave the TTL code to do its work on minor compactions.
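The reasoning above can be checked with a small model. This is a hypothetical sketch, not HBase code: Cell, minor_compact and read are made-up names, and it assumes a delete record hides values in the same row/column whose timestamp is at or before its own. Under that assumption the deleted value is always at least as old as the delete record, so once the delete record has expired, the value has too, and the read path hides it whether or not a compaction has physically removed it yet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    timestamp_ms: int
    value: Optional[str]  # None marks a delete record


def is_expired(cell: Cell, ttl_ms: int, now_ms: int) -> bool:
    return now_ms - cell.timestamp_ms > ttl_ms


def minor_compact(cells: List[Cell], ttl_ms: int, now_ms: int) -> List[Cell]:
    """Rewrite a subset of files, dropping every expired cell,
    including expired delete records."""
    return [c for c in cells if not is_expired(c, ttl_ms, now_ms)]


def read(cells: List[Cell], ttl_ms: int, now_ms: int) -> List[Cell]:
    """What a get/scanner returns: expired cells are hidden, and a delete
    record hides values in the same row/column at or before its timestamp."""
    live = [c for c in cells if not is_expired(c, ttl_ms, now_ms)]
    deletes = [c for c in live if c.value is None]
    return [c for c in live
            if c.value is not None
            and not any(d.row == c.row and d.column == c.column
                        and d.timestamp_ms >= c.timestamp_ms for d in deletes)]


TTL, NOW = 1000, 1500
old_file = [Cell("r1", "info:a", 100, "v1")]  # not part of the minor compaction
new_file = [Cell("r1", "info:a", 200, None)]  # delete record, expired at NOW
before = read(old_file + new_file, TTL, NOW)
after = read(old_file + minor_compact(new_file, TTL, NOW), TTL, NOW)
print(before, after)  # both empty: the deleted value has expired as well
```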


> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Affects Version/s: 0.3.0
                       0.2.1
               Status: Patch Available  (was: Open)

I think I got this working correctly; I tested on my end and it all works OK.
Since we were doing the minor compaction (incremental compaction) at the HStore level, I did the same for the major compaction.
I set the default to 1 day; feel free to change that to whatever you think is a correct default.

{code}
2008-08-15 23:34:37,626 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region -ROOT-,,0
2008-08-15 23:34:37,626 DEBUG org.apache.hadoop.hbase.regionserver.HLog: changing sequence number from 0 to 866762
2008-08-15 23:34:37,634 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 70236052/info
2008-08-15 23:34:37,682 DEBUG org.apache.hadoop.hbase.regionserver.HStore: started compaction of 1 files into /hbase/-ROOT-/compaction.dir/70236052/info/mapfiles/5245590111629292638
2008-08-15 23:34:37,819 DEBUG org.apache.hadoop.hbase.regionserver.HStore: moving /hbase/-ROOT-/compaction.dir/70236052/info/mapfiles/5245590111629292638 to /hbase/-ROOT-/70236052/info/mapfiles/8511958703098935844
2008-08-15 23:34:37,885 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 70236052/info store size is 809.0
2008-08-15 23:34:37,890 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region -ROOT-,,0 in 0sec
{code}


Some of my debug code, which I removed from the patch, printed the timestamps and folder location of the lowTimestamp files, so I could make sure we were checking the correct folder and that the file timestamps were in millis. Everything showed up correctly in the right format.

Please review and let me know.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.3.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.3.0

[jira] Updated: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-834:
------------------------

    Summary: 'Major' compactions and upper bound on files we compact at any one time  (was: Upper bound on files we compact at any one time)

Changed the subject.

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627015#action_12627015 ] 

stack commented on HBASE-834:
-----------------------------

Just what the doctor ordered.  I agree w/ your expiration reasoning.  Let me do some testing.  Will get back to you.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623070#action_12623070 ] 

Billy Pearson commented on HBASE-834:
-------------------------------------

HBASE-745 solved the minor compaction part with incremental compaction, and it can still
do major compactions sometimes, but not often.

The only downside to HBASE-745 is that it does not guarantee that a major compaction of the old, larger files will ever happen.
We do have the option to call the compaction with force set to true and skip the minor compaction.

Suggestion to complete the major compaction part:

1. Add a function in HRegion that returns the creation timestamp of the oldest file, something like HRegion.getOldestHStoreTimestamp().
2. Add an option (hbase.hregion.majorcompaction) in hbase-default.xml that makes major compactions happen every X secs, with a default of, say, once per day or once per week.
3. In HStore.compact, compare the hbase-default.xml setting against the oldest timestamp and change force from false to true when needed, but never the reverse.
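Steps 1 and 3 above can be sketched as a small decision function. This is a hypothetical illustration, not the committed patch: oldest_file_timestamp and should_force_major are made-up names, and although step 2 phrases the interval in seconds, the sketch assumes everything is in milliseconds, since the file timestamps discussed in this thread are in millis. The 1-day default mirrors the default mentioned for the patch.

```python
import time
from typing import Iterable, Optional

ONE_DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000 ms


def oldest_file_timestamp(file_timestamps_ms: Iterable[int]) -> Optional[int]:
    """Hypothetical stand-in for the proposed HRegion.getOldestHStoreTimestamp():
    the earliest creation time among a store's files, or None if it has none."""
    return min(file_timestamps_ms, default=None)


def should_force_major(force: bool,
                       file_timestamps_ms: Iterable[int],
                       interval_ms: int = ONE_DAY_MS,
                       now_ms: Optional[int] = None) -> bool:
    """Step 3: upgrade force from False to True once the oldest file is at
    least `interval_ms` old, but never downgrade an explicit True."""
    if force:
        return True
    oldest = oldest_file_timestamp(file_timestamps_ms)
    if oldest is None:
        return False  # an empty store has nothing to compact
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - oldest >= interval_ms


now = 10 * ONE_DAY_MS
print(should_force_major(False, [now - 2 * ONE_DAY_MS, now - 60_000], now_ms=now))  # True
print(should_force_major(False, [now - 60_000], now_ms=now))  # False
```

Keeping the "never downgrade" rule in one place means a caller that explicitly asked for a forced compaction always gets one, while age-based upgrades happen per store, matching the HStore-level approach taken for minor compactions.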

If someone could help with the HRegion.getOldestHStoreTimestamp() function, or point me in the right direction on how to do that in Hadoop,
I think I could come up with a patch that gives us a major compaction and adds a limit on the number of regions to compact at one time while we are doing the minor compaction.

Anything I am missing here, stack?

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Minor

[jira] Reopened: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson reopened HBASE-834:
---------------------------------


Got a chance to run some tests and found that the maximum number of files to compact at one time is not working correctly.

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625125#action_12625125 ] 

Billy Pearson commented on HBASE-834:
-------------------------------------

Wait on the patch; there seems to be a logic error somewhere. Let me run some more tests. I am seeing major compactions more often than I should, and not when they should happen.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.19.0
>
>         Attachments: 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623170#action_12623170 ] 

stack commented on HBASE-834:
-----------------------------

Patch looks good Billy.  Thanks. I want to test on cluster before applying.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.3.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.3.0
>
>         Attachments: 834-patch.txt

[jira] Updated: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Attachment: 834.patchv4-trunk.txt

834.patchv4-trunk.txt has just the changes needed to make it work correctly on trunk.

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-0.2.1-patchv4.txt, 834-patch.txt, 834.patchv4-trunk.txt

[jira] Updated: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-834:
------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I did a bunch of testing.  Was able to load a table, delete, then refill into table of same name and schema three times which is much better than I could do previously.  But then on the fourth time when I go to load a table, when I check the .META. table, there are old historian edits showing.   I'm hoping this is HBASE-855.  Will retest when HBASE-855 has a patch.

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Assigned: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson reassigned HBASE-834:
-----------------------------------

    Assignee: Billy Pearson

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.3.0

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626894#action_12626894 ] 

Billy Pearson commented on HBASE-834:
-------------------------------------

If everything looks good let me know and I will make up a patch that will apply to 0.18.0 also.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Fix Version/s: 0.3.0
                   0.2.1

Changing the fix versions to 0.2.1 and 0.3.0.

I just noticed we now have a problem: if we never compact all the mapfiles at some point, we never remove deleted, TTL-expired, or over-max-versions data from them.
Currently the only times we do are after a split, or when the mapfile sizes happen to be just right for the incremental compaction to include all the mapfiles.
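To make the condition above concrete, here is a minimal sketch of when a compaction could safely drop such data. The PurgeCheck class and its canPurge helper are hypothetical, written for illustration only; they are not code from the patch.

```java
/**
 * Hypothetical helper: a compaction may drop deleted, TTL-expired and
 * excess-version data only when it rewrites every store file.
 */
public class PurgeCheck {

  static boolean canPurge(boolean majorCompaction, int filesSelected, int totalStoreFiles) {
    // A major compaction rewrites every file; a minor compaction whose
    // selection happens to include every file is equivalent.
    return majorCompaction || filesSelected == totalStoreFiles;
  }

  public static void main(String[] args) {
    System.out.println(canPurge(false, 10, 14)); // capped minor compaction
    System.out.println(canPurge(false, 4, 4));   // selection covered every file
  }
}
```

Under this rule, a store whose minor compactions are always capped below its file count never reaches the purge branch, which is the gap described above.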

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Minor
>             Fix For: 0.2.1, 0.3.0

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-834:
--------------------------------

    Fix Version/s:     (was: 0.19.0)
                   0.18.0
         Priority: Blocker  (was: Minor)

Changing to Blocker. Also moving from 0.19 to 0.18.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623090#action_12623090 ] 

Billy Pearson commented on HBASE-834:
-------------------------------------

I forgot to mention that this patch also includes a maximum on the number of files to compact at one time during a minor compaction; I set the default to 10.
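A minimal sketch of the selection rule discussed in this issue: a minor compaction takes at most a fixed number of files (10 here, matching the default mentioned above), preferring the newest as the original description proposes, while a major compaction takes all of them. The class and method names are hypothetical, and store files are modeled as sequence ids where a higher id means a newer file; the actual patch may pick files differently (for example, by size).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of choosing which store files to compact. */
public class CompactionSelection {

  /** Cap on files per minor compaction, matching the default of 10 mentioned above. */
  static final int DEFAULT_MAX_FILES_TO_COMPACT = 10;

  /**
   * Returns the files to compact, newest first. A major compaction takes every
   * file; a minor compaction takes at most maxFiles of the newest files,
   * leaving the older (typically larger) ones for a later major compaction.
   */
  static List<Long> selectFiles(List<Long> fileSeqIds, boolean majorCompaction, int maxFiles) {
    List<Long> sorted = new ArrayList<>(fileSeqIds);
    Collections.sort(sorted, Collections.reverseOrder()); // newest first
    if (majorCompaction || sorted.size() <= maxFiles) {
      return sorted;
    }
    return new ArrayList<>(sorted.subList(0, maxFiles));
  }

  public static void main(String[] args) {
    List<Long> files = new ArrayList<>();
    for (long id = 1; id <= 12; id++) {
      files.add(id);
    }
    List<Long> minor = selectFiles(files, false, DEFAULT_MAX_FILES_TO_COMPACT);
    List<Long> major = selectFiles(files, true, DEFAULT_MAX_FILES_TO_COMPACT);
    System.out.println("minor: " + minor.size() + " files, oldest selected " + minor.get(minor.size() - 1));
    System.out.println("major: " + major.size() + " files");
  }
}
```

With 12 files, the minor compaction here picks sequence ids 12 down to 3 and leaves the two oldest alone; the major compaction has no cap.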

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.3.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.3.0

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626850#action_12626850 ] 

stack commented on HBASE-834:
-----------------------------

Patch looks good Billy.  I haven't tested it because after banging my head against hbase-826, I've learned that this notion of major compaction is a bit more involved than I at first thought (I think you may have known all along how important the difference between minor and major is).

Here is what I learned.  While compacting, if we overrun max versions or a cell has expired, we do not let the cell go through to the compacted file.  That was fine in the old days, when we always compacted everything.  Since we got smarter compacting -- i.e. minor compactions only compacting the small files -- this behavior can make for malignant results (See towards end of hbase-826 for an illustration).

So, Billy, you need to pass the 'force' flag down into HStore#compact (we should probably rename 'force' to 'majorCompaction' or something?).  Then in HStore#compact, we only run the max versions and expiration code IF it's a major compaction.  Otherwise, we just let ALL cells go through to the compacted files (at runtime, gets and scans respect max versions and expiration times).
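The split described above could look roughly like the following sketch, applied to the versions of one cell sorted newest first. The method name, the list-of-timestamps model, and the convention that a TTL of Long.MAX_VALUE means "never expires" are assumptions for illustration, not HStore's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of compaction-time cell filtering; not HStore's code. */
public class CompactionCellFilter {

  /** Assumed convention for this sketch: a TTL of Long.MAX_VALUE means "never expires". */
  static final long FOREVER = Long.MAX_VALUE;

  /**
   * Returns the versions of one cell (timestamps, newest first) to write to the
   * compacted file. Only a major compaction enforces max versions and TTL.
   */
  static List<Long> versionsToKeep(List<Long> timestampsNewestFirst, boolean majorCompaction,
                                   int maxVersions, long ttlMillis, long now) {
    if (!majorCompaction) {
      // Minor compaction: let ALL cells through; gets and scans apply
      // max versions and expiration at read time.
      return new ArrayList<>(timestampsNewestFirst);
    }
    List<Long> kept = new ArrayList<>();
    for (long ts : timestampsNewestFirst) {
      // With ttlMillis == FOREVER, now - ttlMillis is negative, so nothing expires.
      boolean expired = ts < now - ttlMillis;
      if (!expired && kept.size() < maxVersions) {
        kept.add(ts);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Long> versions = List.of(500L, 400L, 300L, 100L);
    System.out.println("minor: " + versionsToKeep(versions, false, 3, 250L, 600L));
    System.out.println("major: " + versionsToKeep(versions, true, 3, 250L, 600L));
  }
}
```

With a 250 ms TTL at time 600, the minor compaction keeps all four versions, while the major compaction keeps only 500 and 400.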

I'll be on IRC tomorrow if you want to chat more about this, Billy, or just write notes into this JIRA and we can go back and forth here (if you want, post a rough patch and I can give feedback; that might be best).

Oh, one other thing: there should be no maximum on the number of files to compact at a time when doing a major compaction, but I think the way your patch is written, there isn't; it's only when minor compactions run that there is a limit. Is that so?

Thanks.



> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625640#action_12625640 ] 

stack commented on HBASE-834:
-----------------------------

Oh, one other thing... does the test of whether to do a major compaction have to happen inside the synchronize on this.storefiles?  i.e. ' 742       synchronized (storefiles) {'  Can it be done outside of this block?
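One way to keep the check outside the synchronized block, sketched under assumptions: copy the file list while holding the lock, then run the time comparison on the copy. The class, the field, and modeling each store file as a single timestamp are hypothetical; whether the real check can safely run on a snapshot depends on HStore details not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: decide whether a major compaction is due without holding the lock throughout. */
public class MajorCompactionCheck {

  // Stand-in for the store's file list; each entry is a file timestamp in millis.
  private final List<Long> storefiles = new ArrayList<>();

  void addStoreFile(long timestampMillis) {
    synchronized (storefiles) {
      storefiles.add(timestampMillis);
    }
  }

  boolean isMajorCompactionDue(long now, long majorCompactionTime) {
    List<Long> snapshot;
    synchronized (storefiles) {
      // Hold the lock only long enough to copy the current file list.
      snapshot = new ArrayList<>(storefiles);
    }
    // The rest of the check runs on the private copy, outside the lock,
    // so other threads that need the lock are not held up while it runs.
    if (snapshot.isEmpty()) {
      return false;
    }
    long lowest = Long.MAX_VALUE;
    for (long ts : snapshot) {
      lowest = Math.min(lowest, ts);
    }
    return lowest < now - majorCompactionTime;
  }
}
```

The trade-off is that the decision uses a snapshot: a file added just after the copy is only considered on the next check, which seems acceptable for a time-based test.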

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.19.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-patch.txt

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Attachment: 834-0.2.1-patch.txt

This is a working patch for 0.2.1. One of my if statements was wrong; it is corrected in this patch, and I added a little more debug logging to show the hours since the last major compaction.
I would like to see this go into 0.18.0 as well, so we do not need to patch for 0.18.0 and 0.19.0.
I believe this will apply to current trunk if you want to include it.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.19.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-patch.txt

[jira] Commented: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627272#action_12627272 ] 

stack commented on HBASE-834:
-----------------------------

Ok. Thanks Billy.  Applied the patch.  Looked harmless; just a check of >= rather than >.

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-0.2.1-patchv4.txt, 834-patch.txt, 834.patchv4-trunk.txt

[jira] Updated: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-834:
------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

FYI Billy, it's better to open a new issue than to reopen an old one.  Re-resolving this one.

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-0.2.1-patchv4.txt, 834-patch.txt, 834.patchv4-trunk.txt

[jira] Commented: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627162#action_12627162 ] 

stack commented on HBASE-834:
-----------------------------

Oh, I applied this patch to branch and trunk.  I fixed comments that talked about the 'force' parameter instead of the new 'majorCompaction' parameter.  The patch failed to apply against TRUNK, but we don't want the hunks that didn't go in anyway, so I just left them out.  Thanks for the patch, Billy.

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625638#action_12625638 ] 

stack commented on HBASE-834:
-----------------------------

I tried the patch.  Here's a filtered extract from the logs that just shows the new messages around the major compaction test:

{code}
2008-08-26 04:31:02,419 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:31:02,420 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/historian
2008-08-26 04:31:02,877 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0
2008-08-26 04:31:02,878 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/info
2008-08-26 04:31:08,588 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
2008-08-26 04:31:08,588 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 652690253/info
2008-08-26 04:31:37,237 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
2008-08-26 04:31:37,237 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1648250611/info
2008-08-26 04:31:59,721 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4 
2008-08-26 04:31:59,721 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 433766857/info
2008-08-26 04:32:23,407 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
2008-08-26 04:32:23,407 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 532635319/info
2008-08-26 04:32:40,876 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
2008-08-26 04:32:40,876 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 478968074/info
2008-08-26 04:33:16,252 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
2008-08-26 04:33:16,252 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 305918941/info
2008-08-26 04:33:32,483 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4 
2008-08-26 04:33:32,483 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 629593107/info
2008-08-26 04:49:28,735 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:49:28,735 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/historian
2008-08-26 04:49:29,218 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:49:29,218 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/info
2008-08-26 04:57:20,395 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:57:32,731 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:58:23,362 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:58:23,362 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1720995599/info
2008-08-26 04:58:44,441 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:58:56,754 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
2008-08-26 04:59:41,982 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
...
{code}

We seem to be major compacting too much, even though I'd set the major compaction time down to 30 minutes instead of 24 hours so I could test. It's probably this test: 'if (lowTimestamp < System.currentTimeMillis() - majorCompactionTime) {'. If lowTimestamp is zero, then we'll major compact.

We probably shouldn't log if we're returning a zero out of the getLowTimestamp method.

I would also suggest that getLowTimestamp be renamed getLowestTimestamp and moved into HStore from HRegion, since it's only used there (make it private too?).
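A minimal sketch of the fixes suggested above, assuming store files can be summarized by their timestamps: getLowestTimestamp returns 0 when there is nothing to inspect, and the due-check treats 0 as "unknown" rather than "very old", so it no longer triggers a major compaction. Treating 0 as not due is a design assumption, and all names and signatures here are hypothetical, not the patch's. The sketch also shows the whole-hours arithmetic behind the debug line in the log above.

```java
import java.util.List;

/** Hypothetical sketch; not the actual HStore/HRegion code. */
public class LowestTimestamp {

  static final long MILLIS_PER_HOUR = 60L * 60L * 1000L;

  /**
   * Oldest timestamp among the store's files, or 0 if there are none.
   * Would be private to the store in the suggested design; package-private
   * here so the example can call it.
   */
  static long getLowestTimestamp(List<Long> storefileTimestamps) {
    if (storefileTimestamps.isEmpty()) {
      return 0L;
    }
    long lowest = Long.MAX_VALUE;
    for (long ts : storefileTimestamps) {
      lowest = Math.min(lowest, ts);
    }
    return lowest;
  }

  /** Due only when a real (non-zero) lowest timestamp is older than the interval. */
  static boolean isMajorCompactionDue(long lowestTimestamp, long now, long majorCompactionTime) {
    if (lowestTimestamp <= 0L) {
      // 0 means "unknown": don't treat it as infinitely old.
      return false;
    }
    return lowestTimestamp < now - majorCompactionTime;
  }

  /** Whole hours elapsed; only meaningful (and only worth logging) for a non-zero timestamp. */
  static long hoursSince(long lowestTimestamp, long now) {
    return (now - lowestTimestamp) / MILLIS_PER_HOUR;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long thirtyMinutes = 30L * 60L * 1000L;
    long lowest = getLowestTimestamp(List.of(now - 4 * MILLIS_PER_HOUR, now - MILLIS_PER_HOUR));
    if (lowest > 0L) {
      System.out.println("Hours since last major compaction: " + hoursSince(lowest, now));
    }
    System.out.println("due: " + isMajorCompactionDue(lowest, now, thirtyMinutes));
    System.out.println("due with no files: "
        + isMajorCompactionDue(getLowestTimestamp(List.of()), now, thirtyMinutes));
  }
}
```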

Did you mean to do the below in HRegion Billy?
{code}
@@ -867,7 +889,7 @@
    * @throws IOException
    */
   public byte [] compactStores() throws IOException {
-    return compactStores(false);
+	  return compactStores(false);

{code}

Make the above fixes and I'll try it again Billy.  We need this patch.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.19.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-patch.txt

[jira] Updated: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Attachment: 834-0.2.1-patchv4.txt

834-0.2.1-patchv4.txt has just the changes needed to make it work correctly on branch 0.2.


> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-0.2.1-patchv4.txt, 834-patch.txt

[jira] Updated: (HBASE-834) 'Major' compactions and upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Status: Patch Available  (was: Reopened)

> 'Major' compactions and upper bound on files we compact at any one time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-0.2.1-patchv4.txt, 834-patch.txt, 834.patchv4-trunk.txt

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-834:
--------------------------------

    Fix Version/s:     (was: 0.18.0)
                   0.19.0

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.2.1, 0.19.0
>
>         Attachments: 834-patch.txt

[jira] Updated: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HBASE-834:
--------------------------------

    Attachment: 834-0.2.1-patchv3.txt

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt

[jira] Commented: (HBASE-834) Upper bound on files we compact at any one time

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626873#action_12626873 ] 

Billy Pearson commented on HBASE-834:
-------------------------------------

OK, latest version of the patch: 834-0.2.1-patchv3.txt

Renamed force to majorCompaction, which is a better name.

Passed majorCompaction down to the compactHStoreFiles function so it knows not to remove versions beyond max versions on a minor (incremental) compaction.
This should solve the problem with HBASE-826.
Stack said we do not need to run the max-versions or expiration code on a minor compaction, but I left the expiration code as is: if data is past its TTL, it should not matter whether it is a minor or a major compaction, as far as I can reason. I might be wrong, so let me know if that needs to be changed too.
As it stands, expired data is removed during a minor compaction, so we do not have to read it again and remove it on the next compaction (less work later, in theory).
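The retention rule described in this comment can be sketched as a per-cell filter: TTL-expired cells are dropped in every compaction, while versions beyond the max are only trimmed when `majorCompaction` is true. This is a hypothetical illustration, not the patch's compactHStoreFiles code; `Cell`, `retain` and its parameters are assumed names, and cells are assumed to arrive sorted by column, newest timestamp first.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionRetentionSketch {

    // Hypothetical cell: one version of one column, with its write timestamp in ms.
    record Cell(String column, long timestampMs, String value) {}

    // Filters the cells a compaction rewrites.
    // Assumes input sorted by column, then by timestamp, newest first.
    static List<Cell> retain(List<Cell> sorted, boolean majorCompaction,
                             int maxVersions, long ttlMs, long nowMs) {
        List<Cell> out = new ArrayList<>();
        String currentColumn = null;
        int versionsSeen = 0;
        for (Cell c : sorted) {
            // Reset the live-version count at each new column.
            if (!c.column().equals(currentColumn)) {
                currentColumn = c.column();
                versionsSeen = 0;
            }
            // TTL expiry applies to minor and major compactions alike.
            if (nowMs - c.timestampMs() > ttlMs) {
                continue;
            }
            versionsSeen++;
            // Only a major compaction trims versions beyond maxVersions.
            if (majorCompaction && versionsSeen > maxVersions) {
                continue;
            }
            out.add(c);
        }
        return out;
    }
}
```

For example, with cells in one column at timestamps 100, 90, 80 and 10, a TTL of 50 and now = 120, the cell at 10 is dropped either way; a minor compaction keeps the other three, while a major compaction with maxVersions = 2 keeps only 100 and 90.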

It looks like there is a bug in rest/Dispatcher.java: the current branch-0.2 will not compile cleanly, but I think my patch will build cleanly once that error is fixed.

Yes, minor compactions are limited to hbase.hstore.compaction.max files;
major compactions do not have this limit.

> Upper bound on files we compact at any one time
> -----------------------------------------------
>
>                 Key: HBASE-834
>                 URL: https://issues.apache.org/jira/browse/HBASE-834
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.2.1, 0.18.0
>            Reporter: stack
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 834-0.2.1-patch.txt, 834-0.2.1-patchv2.txt, 834-0.2.1-patchv3.txt, 834-patch.txt