Posted to issues@hbase.apache.org by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org> on 2010/11/09 23:42:22 UTC

[jira] Created: (HBASE-3209) New Compaction Heuristic

New Compaction Heuristic
------------------------

                 Key: HBASE-3209
                 URL: https://issues.apache.org/jira/browse/HBASE-3209
             Project: HBase
          Issue Type: Improvement
            Reporter: Nicolas Spiegelberg
            Assignee: Nicolas Spiegelberg


We have a whole bunch of compaction awesome in our internal 0.89 branch.  Porting this to 0.90:

1) don't unconditionally compact 4 files. have a min threshold
2) intelligently upgrade minors to majors
3) new compaction algo (derived in HBASE-2462)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3209) New Compaction Heuristic

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930376#action_12930376 ] 

HBase Review Board commented on HBASE-3209:
-------------------------------------------

Message from: stack@duboce.net

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1192/#review1881
-----------------------------------------------------------

Ship it!


k... let me commit.


trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.cloudera.org/r/1192/#comment6110>

    Whitespace here -- I can fix on commit.



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.cloudera.org/r/1192/#comment6113>

    Very cute


- stack







[jira] Commented: (HBASE-3209) New Compaction Heuristic

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930332#action_12930332 ] 

Nicolas Spiegelberg commented on HBASE-3209:
--------------------------------------------

Results from our cluster:

Multiput Latency: 25 ms => 3 ms avg
Sync Latency:  12 ms => 1.5 ms avg
Compaction Queue: 3 => 0.08 avg
Compaction Time: now 1-30 sec  (note: our ODS chart for compaction time is off, so I looked at the logs manually)

Read Latency: 6 ms => 9 ms
Files / Store: 2 => 2.6

Note that the minor read latency regression (6 ms => 9 ms) can be fixed by lowering compactionThreshold from 3 to 2.  We just didn't need the improvement.



[jira] Resolved: (HBASE-3209) New Compaction Heuristic

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3209.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.90.0
     Hadoop Flags: [Reviewed]

Thanks for the patch Nicolas.  Committed.



[jira] Commented: (HBASE-3209) New Compaction Heuristic

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930352#action_12930352 ] 

HBase Review Board commented on HBASE-3209:
-------------------------------------------

Message from: "Nicolas" <ns...@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1192/#review1878
-----------------------------------------------------------



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.cloudera.org/r/1192/#comment6095>

    I know.  I'm a whitespace ASBO.  sorry :(



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.cloudera.org/r/1192/#comment6092>

    we can work on fine-tuning this.  +50% was a safe cover for us and worked fine because we pre-split regions.  Really, you probably don't need the pad because the summation algorithm will operate on at least 3 storefiles, so (assuming roughly even flush sizes) you get +100% pad from that.
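
    One reading of the pad arithmetic above, as a minimal sketch. The class and method names are hypothetical, the 64 MB flush size is only an example figure, and this is not code from the patch. It compares a fixed +50% pad on the flush size with the headroom the summation rule gives when three roughly equal, flush-sized storefiles are considered.

```java
final class PadSketch {
    // Fixed pad: a file counts as "small" if it is at most flush size + 50%.
    static long paddedMinSize(long flushSize) {
        return flushSize + flushSize / 2;
    }

    // Summation rule: an older file is compared against the total size of
    // the files newer than it. Sizes are ordered oldest first.
    static long sumOfNewer(long[] sizesOldestFirst, int index) {
        long sum = 0;
        for (int i = index + 1; i < sizesOldestFirst.length; i++) {
            sum += sizesOldestFirst[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long flushSize = 64L; // MB, example figure only (assumption)
        long[] sizes = {flushSize, flushSize, flushSize};
        // With the pad, files up to 1.5x the flush size qualify.
        System.out.println("pad threshold: " + paddedMinSize(flushSize) + " MB");
        // With three even files, the oldest is compared against 2x the flush
        // size, which is the "+100% pad" mentioned above.
        System.out.println("sum threshold: " + sumOfNewer(sizes, 0) + " MB");
    }
}
```

    Under these assumptions the summation threshold (twice the flush size) is already looser than the padded one (1.5 times the flush size), which is why the explicit pad may be unnecessary.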



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.cloudera.org/r/1192/#comment6091>

    s/are meet/have met/


- Nicolas







[jira] Commented: (HBASE-3209) New Compaction Heuristic

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930347#action_12930347 ] 

HBase Review Board commented on HBASE-3209:
-------------------------------------------

Message from: "Nicolas" <ns...@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1192/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

We have a whole bunch of compaction awesome in our internal 0.89 branch. Porting this to 0.90:

1) don't unconditionally compact 4 files. have a min threshold
2) intelligently upgrade minors to majors
3) new compaction algo (derived in HBASE-2462)


This addresses bug HBASE-3209.
    http://issues.apache.org/jira/browse/HBASE-3209


Diffs
-----

  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1033278 

Diff: http://review.cloudera.org/r/1192/diff


Testing
-------

Has been running on our primary cluster for the past couple weeks.


Thanks,

Nicolas






[jira] Commented: (HBASE-3209) New Compaction Heuristic

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930363#action_12930363 ] 

Nicolas Spiegelberg commented on HBASE-3209:
--------------------------------------------

St^Ack_: nspiegelberg: what config. would I set so it favored fewer files and kept the old read performance?
[3:36pm] nspiegelberg: you have 2 options
[3:37pm] nspiegelberg: 1) set compactionThreshold == 2
[3:38pm] nspiegelberg: 2) make minCompactSize configurable and set it high
[3:39pm] nspiegelberg: basically, before this algo, we would unconditionally compact 4 files, but the compactionThreshold == 3
[3:40pm] nspiegelberg: this means that we would never use the compaction algorithm unless our cluster was stressed out
[3:41pm] jdcryans: it used to not be like that tho
[3:42pm] jdcryans: it's a hack that we compact everything
[3:42pm] nspiegelberg: the only downside to the current algorithm is that sum(storefiles) doesn't take dedupe into account, which can have a snowball effect of compacting too aggressively during load.  this can be mitigated by lowering hbase.hstore.compaction.max
[3:43pm] nspiegelberg: in reality, this hasn't proved to be an issue for us.  lowering the max compact files will fix it.  we can also add on some simple dedupe heuristics to fix this issue
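
The knobs discussed in the log above can be illustrated with a simplified sketch of a size-based selection. This is an assumption-laden illustration, not the committed Store.java code: the class and method names are hypothetical, and compactRatio (set to 1.2 in the demo) and the file sizes are made-up values. The parameters compactionThreshold, minCompactSize and maxFilesToCompact stand in for the settings named in the log, the last one for hbase.hstore.compaction.max. Walking from the oldest file, a file is skipped while it is larger than both minCompactSize and compactRatio times the sum of the newer files in the window.

```java
import java.util.Arrays;

final class CompactionSelectionSketch {
    // Total size of the (maxFilesToCompact - 1) files newer than file i.
    static long sumOfNewer(long[] sizes, int i, int maxFilesToCompact) {
        long sum = 0;
        int end = Math.min(sizes.length, i + maxFilesToCompact);
        for (int j = i + 1; j < end; j++) {
            sum += sizes[j];
        }
        return sum;
    }

    // Returns {start, end} (end exclusive) of the files to compact; sizes are
    // ordered oldest first. start == end means nothing is selected.
    static int[] select(long[] sizes, int compactionThreshold,
                        int maxFilesToCompact, long minCompactSize,
                        double compactRatio) {
        int n = sizes.length;
        int start = 0;
        // Skip old files that are too big relative to the newer ones.
        while (n - start >= compactionThreshold
                && sizes[start] > Math.max(minCompactSize,
                    (long) (sumOfNewer(sizes, start, maxFilesToCompact) * compactRatio))) {
            start++;
        }
        int end = Math.min(n, start + maxFilesToCompact);
        if (end - start < compactionThreshold) {
            return new int[] {n, n}; // too few candidates left
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 100, 12, 12, 12}; // MB, oldest first (made up)
        // Baseline: only the three small, recent files are picked.
        System.out.println(Arrays.toString(select(sizes, 3, 10, 10, 1.2)));
        // Option 2 from the log: a high minCompactSize pulls in every file.
        System.out.println(Arrays.toString(select(sizes, 3, 10, 2000, 1.2)));
        // Option 1 from the log: threshold 2 compacts with fewer files present.
        System.out.println(Arrays.toString(select(new long[] {1000, 12, 12}, 2, 10, 10, 1.2)));
    }
}
```

In this sketch, lowering compactionThreshold or raising minCompactSize both make the selection more eager, which trades extra compaction I/O for fewer files per store, and lowering maxFilesToCompact caps how much a single compaction can pull in.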
