Posted to dev@hbase.apache.org by "LN (JIRA)" <ji...@apache.org> on 2008/07/15 10:35:32 UTC

[jira] Commented: (HBASE-745) scaling of one regionserver, improving memory and cpu usage

    [ https://issues.apache.org/jira/browse/HBASE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613555#action_12613555 ] 

LN commented on HBASE-745:
--------------------------

Calculating compaction time:
1. Suppose we keep writing data to a regionserver, and the row IDs of the data are hashed across all regions.
2. According to the default optionalcacheflushinterval (30 min) and compaction threshold (3), every HStore will have a memcache-flushed storefile within 30 min. After 1 hour, each HStore will have 3 storefiles (including the original one), so a compaction will be triggered. That is, every HStore in the regionserver will do a compaction within 1 hour.
3. A compaction of an HStore reads all the data in that HStore's mapfiles, so I'd suppose compaction time depends on the total file size of those mapfiles. Therefore the whole compaction time of a regionserver (caused by optionalcacheflushinterval) depends on the amount of data the regionserver is serving.
4. Now we can see the default optionalcacheflushinterval is not suitable for most environments. I've found my hardware (2x Xeon 3.2 GHz, dual-core, SCSI) can compact about 10 MB of data per second, which means it can compact 36 GB in 1 hour. What happens when the data size grows beyond 36 GB? (See the sketch after this list.)
5. How about increasing optionalcacheflushinterval to 12 hours, or even 24 hours? Unfortunately, I found it useless, because of globalMemcacheLimit (default 512 MB): when the limit is reached, memcaches are flushed (storefiles created) until the total memcache size drops below 256 MB. Since inserted row IDs are distributed across all regions, nearly half of all regions get a new storefile too. So by the time inserted data reaches 1 GB (4 rounds of flushing the global memcache), all data on the regionserver has been compacted. No setting can adjust this behavior.
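
For concreteness, below is a minimal back-of-the-envelope sketch of the arithmetic in points 2-5. This is not actual HBase code; the class, method names, and constants are made up for illustration and simply mirror the defaults and the ~10 MB/s measurement above (decimal units, to match the figures in point 4).

    // Illustrative only -- not HBase code. Constants mirror the defaults
    // and measurements discussed in this comment.
    public class CompactionLoadEstimate {

        static final long FLUSH_INTERVAL_MS = 30L * 60 * 1000;   // optionalcacheflushinterval: 30 min
        static final int COMPACTION_THRESHOLD = 3;               // storefiles that trigger a compaction
        static final long COMPACT_BYTES_PER_SEC = 10_000_000L;   // ~10 MB/s measured on this hardware

        // Point 2: each HStore gains one flushed storefile per interval, so a
        // threshold of 3 is reached after 2 more intervals, i.e. once per hour.
        static long compactionPeriodMs() {
            return (COMPACTION_THRESHOLD - 1) * FLUSH_INTERVAL_MS;
        }

        // Points 3-4: a compaction rewrites all data in the HStore, so the
        // regionserver effectively rewrites its whole data set once per
        // period. The ceiling is whatever it can rewrite within one period.
        static long maxServableBytes() {
            return COMPACT_BYTES_PER_SEC * (compactionPeriodMs() / 1000);
        }

        // Point 5: each global memcache flush frees 512 MB - 256 MB = 256 MB,
        // spread over roughly half the regions, so after about 4 flushes
        // (~1 GB inserted) every HStore has hit the threshold again --
        // insert volume, not the flush interval, drives compactions.
        static long insertedBytesPerFullCompaction() {
            long freedPerFlush = 512_000_000L - 256_000_000L;
            return 4 * freedPerFlush;  // ~1 GB, matching the observation above
        }

        public static void main(String[] args) {
            System.out.printf("compaction period: %d min%n", compactionPeriodMs() / 60_000);
            System.out.printf("data ceiling: %d GB/hour%n", maxServableBytes() / 1_000_000_000L);
            System.out.printf("writes per full compaction: %d MB%n", insertedBytesPerFullCompaction() / 1_000_000);
            // prints: compaction period: 60 min
            //         data ceiling: 36 GB/hour
            //         writes per full compaction: 1024 MB
        }
    }

The mismatch is visible from the numbers alone: once the regionserver serves more than ~36 GB it can no longer finish one full compaction cycle per hour, and point 5 shows why a longer flush interval does not help under a steady write load.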

> scaling of one regionserver, improving memory and cpu usage
> -----------------------------------------------------------
>
>                 Key: HBASE-745
>                 URL: https://issues.apache.org/jira/browse/HBASE-745
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.1.3
>         Environment: hadoop 0.17.1
>            Reporter: LN
>            Priority: Minor
>
> After weeks of testing hbase 0.1.3 and hadoop (0.16.4, 0.17.1), I found there is much work to do before a single regionserver can handle about 100 GB of data, or even more. I'd like to share my opinions here with stack and other developers.
> First, the easiest way to improve the scalability of a regionserver is to upgrade hardware: use a 64-bit OS and 8 GB of memory for the regionserver process, and speed up disk I/O.
> Besides hardware, the following are the software bottlenecks I found in the regionserver:
> 1. As data grows, compaction eats CPU (and I/O) time; the total compaction time is basically linear in the whole data size, and even worse, sometimes quadratic in it.
> 2. Memory and socket connection usage depend on the number of opened mapfiles; see HADOOP-2341 and HBASE-24.
> I will explain the above in comments later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.