Posted to issues@hbase.apache.org by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org> on 2011/05/11 23:08:47 UTC

[jira] [Created] (HBASE-3877) Determine Proper Defaults for Compaction ThreadPools

Determine Proper Defaults for Compaction ThreadPools
----------------------------------------------------

                 Key: HBASE-3877
                 URL: https://issues.apache.org/jira/browse/HBASE-3877
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 0.92.0
            Reporter: Nicolas Spiegelberg
            Assignee: Nicolas Spiegelberg
            Priority: Trivial


With the introduction of HBASE-1476, we now have multithreaded compactions + 2 different ThreadPools for large and small compactions.  However, this is disabled by default until we can determine a proper default throttle point.  Opening this JIRA to log all discussion on how to select a good default for this case.
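A minimal sketch of the two-pool arrangement described above, assuming that a single byte-size throttle point decides which pool a compaction runs in. `CompactionRouter`, its methods, and the strict `>` comparison are hypothetical illustrations, not HBase's actual `CompactSplitThread` code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: route each compaction to a "large" or "small" pool
// based on how many bytes it will rewrite. Not HBase's actual API.
class CompactionRouter {
    private final long throttlePoint;      // bytes; larger compactions use the large pool
    private final ExecutorService largePool;
    private final ExecutorService smallPool;

    CompactionRouter(long throttlePoint, int largeThreads, int smallThreads) {
        this.throttlePoint = throttlePoint;
        this.largePool = Executors.newFixedThreadPool(largeThreads);
        this.smallPool = Executors.newFixedThreadPool(smallThreads);
    }

    /** True if a compaction of this many bytes should run in the large pool. */
    boolean isLarge(long compactionBytes) {
        return compactionBytes > throttlePoint;
    }

    /** Hands the compaction work to the pool selected by its size. */
    void submit(long compactionBytes, Runnable work) {
        (isLarge(compactionBytes) ? largePool : smallPool).execute(work);
    }

    /** Stops accepting new work; already-queued compactions still finish. */
    void shutdown() {
        largePool.shutdown();
        smallPool.shutdown();
    }
}
```

Keeping the routing decision in one `isLarge` check makes the throttle point the only tuning knob, which is exactly the default this issue sets out to choose.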

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3877) Determine Proper Defaults for Compaction ThreadPools

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032086#comment-13032086 ] 

Nicolas Spiegelberg commented on HBASE-3877:
--------------------------------------------

For one data point: in our cluster, we do not automatically split regions, and we keep our region count low.  As a result, we have StoreFiles that reach the 10GB range.  Obviously, if all the compaction threads were processing a 10GB compaction, the queue would get backed up.  We put the throttle point at 500MB.  Compactions are network-bound: we have 1Gbps network links and are seeing roughly 40MBps (3x that is ~1Gbps), so a compaction on the small thread pool takes about 12 seconds at most.  Therefore, our use case doesn't directly correspond to the common auto-split use case.
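The figures in the comment above can be checked with a little arithmetic. The inputs (500MB throttle, ~40MBps, the 3x factor) come from the comment; treating the 3x as a write fan-out multiplier and the helper names are assumptions made for this sketch.

```java
// Back-of-the-envelope check of the numbers quoted above.
// Helper names are hypothetical; inputs are taken from the comment.
class CompactionTiming {
    /** Worst-case seconds for a small-pool compaction: throttle size / throughput. */
    static double worstCaseSmallSeconds(double throttleMB, double throughputMBps) {
        return throttleMB / throughputMBps;
    }

    /** Aggregate network bandwidth in Mbps when every byte is sent fanOut times. */
    static double aggregateMbps(double throughputMBps, int fanOut) {
        return throughputMBps * fanOut * 8; // 1 byte = 8 bits
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSmallSeconds(500, 40)); // 12.5
        System.out.println(aggregateMbps(40, 3));           // 960.0, i.e. ~1Gbps
    }
}
```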

My original thought is to default the throttle to:
{code}
min("hbase.hregion.memstore.flush.size" * 2, "hbase.hregion.max.filesize" / 2)
{code}
Note that the default split/flush ratio is 4, so with default settings both terms are equal (2 x flush size), which puts the throttle in the middle of the range.  Since most users enable compression, a flushed file on disk should be ~20% of the MemStore flush size, so a throttle of flushSize*2 is really more like flushSize*10 of MemStore data, i.e. about 10 flushed files.  I will submit a patch with this default.  Please feel free to chime in with your experience using it and we'll see if we can improve this default.
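A minimal sketch of the proposed default and the compression arithmetic from the comment above. The two inputs correspond to "hbase.hregion.memstore.flush.size" and "hbase.hregion.max.filesize"; the 64MB/256MB example sizes are assumptions chosen only to match the split/flush ratio of 4, and the class and method names are hypothetical.

```java
// Sketch of the proposed default: min(flushSize * 2, maxFileSize / 2).
// Example sizes in main() are assumptions (ratio 4), not quoted defaults.
class DefaultThrottle {
    static long defaultThrottle(long flushSize, long maxFileSize) {
        return Math.min(flushSize * 2, maxFileSize / 2);
    }

    /** Approximate number of flushed files that fit under the throttle, given the
     *  on-disk size of one flush as a fraction of flushSize (e.g. 0.2 if compressed). */
    static double flushedFilesUnderThrottle(long throttle, long flushSize, double onDiskFraction) {
        return throttle / (flushSize * onDiskFraction);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long flush = 64 * mb, maxFile = 256 * mb; // assumed example values (ratio 4)
        long throttle = defaultThrottle(flush, maxFile);
        System.out.println(throttle / mb); // 128: both terms equal 2 x flush
        System.out.println(flushedFilesUnderThrottle(throttle, flush, 0.2)); // ~10
    }
}
```

With ratio 4 both `min` arguments coincide; raising the split size leaves the throttle pinned at twice the flush size.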

> Determine Proper Defaults for Compaction ThreadPools
> ----------------------------------------------------
>
>                 Key: HBASE-3877
>                 URL: https://issues.apache.org/jira/browse/HBASE-3877
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Trivial
>              Labels: compaction
>
> With the introduction of HBASE-1476, we now have multithreaded compactions + 2 different ThreadPools for large and small compactions.  However, this is disabled by default until we can determine a proper default throttle point.  Opening this JIRA to log all discussion on how to select a good default for this case.

[jira] [Commented] (HBASE-3877) Determine Proper Defaults for Compaction ThreadPools

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033243#comment-13033243 ] 

Hudson commented on HBASE-3877:
-------------------------------

Integrated in HBase-TRUNK #1918 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1918/])
    

[jira] [Resolved] (HBASE-3877) Determine Proper Defaults for Compaction ThreadPools

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved HBASE-3877.
----------------------------------

    Resolution: Fixed

Looks like this got committed. Marking it as such.
                
[jira] [Commented] (HBASE-3877) Determine Proper Defaults for Compaction ThreadPools

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276346#comment-13276346 ] 

Hudson commented on HBASE-3877:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #5 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/5/])
    [HBASE-5867] Improve Compaction Throttle Default

Summary:
We recently had a production issue where our compactions fell
behind because our compaction throttle was improperly tuned and
accidentally upgraded all compactions to the large pool. The default
from HBASE-3877 makes one bad assumption: the default number of
flushed files in a compaction. MinFilesToCompact should be taken into
consideration. As a default, it is less damaging for the large-pool
throttle to be slightly higher than it needs to be, so that the large
pool only gets timed major compactions, than to have everything
accidentally promoted.

Test Plan:  - mvn test

Reviewers: JIRA, Kannan, Liyin
Reviewed By: Kannan
CC: stack

Differential Revision: https://reviews.facebook.net/D2943 (Revision 1338809)

     Result = FAILURE
nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
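A hypothetical illustration of the bad assumption described in the HBASE-5867 summary: when flushed files are close to the full flush size (for example, without compression), even the smallest minor compaction of MinFilesToCompact files exceeds a throttle of flushSize*2 and lands in the large pool. The values (64MB flush, MinFilesToCompact = 3) and the `minFilesAwareThrottle` formula are assumptions for illustration only, not necessarily what revision 1338809 committed.

```java
// Hypothetical illustration of how a flushSize * 2 throttle can promote
// every minor compaction to the large pool. Names and values are assumed.
class ThrottleCheck {
    /** Size of the smallest minor compaction: minFilesToCompact flushed files. */
    static long smallestMinorCompaction(int minFilesToCompact, long flushedFileSize) {
        return minFilesToCompact * flushedFileSize;
    }

    static boolean goesToLargePool(long compactionSize, long throttle) {
        return compactionSize > throttle;
    }

    /** One hypothetical throttle that scales with minFilesToCompact, leaving headroom
     *  so ordinary minor compactions stay in the small pool. */
    static long minFilesAwareThrottle(int minFilesToCompact, long flushSize) {
        return 2L * minFilesToCompact * flushSize;
    }

    public static void main(String[] args) {
        long mb = 1L << 20, flush = 64 * mb; // assumed flush size
        int minFiles = 3;                    // assumed MinFilesToCompact
        long smallest = smallestMinorCompaction(minFiles, flush); // uncompressed flushes
        System.out.println(goesToLargePool(smallest, 2 * flush)); // true: promoted
        System.out.println(goesToLargePool(smallest, minFilesAwareThrottle(minFiles, flush))); // false
    }
}
```

Erring high on the throttle only costs some extra load on the small pool, whereas erring low sends every compaction to the large pool, which matches the trade-off argued in the summary.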

                