Posted to issues@hbase.apache.org by "Danil Lipovoy (Jira)" <ji...@apache.org> on 2020/06/02 05:52:00 UTC
[jira] [Commented] (HBASE-23887) BlockCache performance improve by reduce eviction rate

    [ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123362#comment-17123362 ] 

Danil Lipovoy commented on HBASE-23887:
---------------------------------------

I ran more tests with the same tables, but this time _recordcount_ = the number of records in the table and

*hbase.lru.cache.heavy.eviction.count.limit* = 0

*hbase.lru.cache.heavy.eviction.mb.size.limit* = 200
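
For readers unfamiliar with these two settings, here is a minimal sketch of how they might gate the feature, based only on the wording of the issue description quoted further down. It is not the patch itself: the class and method names are hypothetical, the strict comparison against the count limit is an assumption, and so is reading the size limit as MB of 2^20 bytes.

```java
/** Hypothetical sketch; class and method names are illustrative, not HBase's. */
final class HeavyEvictionSwitch {
    private final int countLimit;   // cf. hbase.lru.cache.heavy.eviction.count.limit
    private final long bytesLimit;  // size limit per eviction run, in bytes
    private int heavyRuns = 0;      // consecutive runs that exceeded bytesLimit

    HeavyEvictionSwitch(int countLimit, long bytesLimit) {
        this.countLimit = countLimit;
        this.bytesLimit = bytesLimit;
    }

    /** Records one eviction run; returns true while block skipping should be on. */
    boolean onEvictionRun(long bytesFreed) {
        if (bytesFreed > bytesLimit) {
            heavyRuns++;
        } else {
            heavyRuns = 0; // a light run ends the heavy period
        }
        // Assumption: strict comparison, so a limit of 0 reacts to the first heavy run.
        return heavyRuns > countLimit;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L; // assumption: 1 MB = 2^20 bytes
        HeavyEvictionSwitch sw = new HeavyEvictionSwitch(0, 200 * mb);
        System.out.println(sw.onEvictionRun(250 * mb)); // true
        System.out.println(sw.onEvictionRun(50 * mb));  // false
    }
}
```

Under these assumptions, a count limit of 0 with a 200 MB size limit means skipping would begin on the first eviction run that frees more than 200 MB and stop as soon as a run frees less.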

The results:

!requests_new_100p.png!

 

YCSB stats:
| |*original*|*feature*|*feature/original, %*|
|tbl1-u (ops/sec)|29,601|39,088|132|
|tbl2-u (ops/sec)|38,793|61,692|159|
|tbl3-u (ops/sec)|38,216|60,415|158|
|tbl4-u (ops/sec)|325|657|202|
|tbl1-z (ops/sec)|46,990|58,252|124|
|tbl2-z (ops/sec)|54,401|72,484|133|
|tbl3-z (ops/sec)|57,100|69,984|123|
|tbl4-z (ops/sec)|452|763|169|
|tbl1-l (ops/sec)|56,001|63,804|114|
|tbl2-l (ops/sec)|68,700|76,074|111|
|tbl3-l (ops/sec)|64,189|72,229|113|
|tbl4-l (ops/sec)|619|897|145|
| | | | |
| | | | |
| |*original*|*feature*|*feature/original, %*|
|tbl1-u AverageLatency(us)|1,686|1,276|76|
|tbl2-u AverageLatency(us)|1,287|808|63|
|tbl3-u AverageLatency(us)|1,306|825|63|
|tbl4-u AverageLatency(us)|76,810|38,007|49|
|tbl1-z AverageLatency(us)|1,061|856|81|
|tbl2-z AverageLatency(us)|917|688|75|
|tbl3-z AverageLatency(us)|873|712|82|
|tbl4-z AverageLatency(us)|55,114|32,670|59|
|tbl1-l AverageLatency(us)|890|781|88|
|tbl2-l AverageLatency(us)|726|655|90|
|tbl3-l AverageLatency(us)|777|690|89|
|tbl4-l AverageLatency(us)|40,235|27,774|69|
| | | | |
| | | | |
| |*original*|*feature*|*feature/original, %*|
|tbl1-u 95thPercentileLatency(us)|2,831|2,569|91|
|tbl2-u 95thPercentileLatency(us)|1,266|1,073|85|
|tbl3-u 95thPercentileLatency(us)|1,497|1,194|80|
|tbl4-u 95thPercentileLatency(us)|370,943|49,471|13|
|tbl1-z 95thPercentileLatency(us)|1,784|1,669|94|
|tbl2-z 95thPercentileLatency(us)|918|871|95|
|tbl3-z 95thPercentileLatency(us)|978|933|95|
|tbl4-z 95thPercentileLatency(us)|336,639|48,863|15|
|tbl1-l 95thPercentileLatency(us)|1,523|1,441|95|
|tbl2-l 95thPercentileLatency(us)|820|825|101|
|tbl3-l 95thPercentileLatency(us)|918|907|99|
|tbl4-l 95thPercentileLatency(us)|77,951|48,575|62|

> BlockCache performance improve by reduce eviction rate
> ------------------------------------------------------
>
>                 Key: HBASE-23887
>                 URL: https://issues.apache.org/jira/browse/HBASE-23887
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache, Performance
>            Reporter: Danil Lipovoy
>            Priority: Minor
>         Attachments: 1582787018434_rs_metrics.jpg, 1582801838065_rs_metrics_new.png, BC_LongRun.png, BlockCacheEvictionProcess.gif, cmp.png, evict_BC100_vs_BC23.png, eviction_100p.png, eviction_100p.png, eviction_100p.png, gc_100p.png, read_requests_100pBC_vs_23pBC.png, requests_100p.png, requests_100p.png, requests_new_100p.png
>
>
> Hi!
> This is my first time here, so please correct me if something is wrong.
> I want to propose a way to improve performance when the data in HFiles is much larger than the BlockCache (a usual story in big data). The idea is to cache only part of the DATA blocks. This is good because LruBlockCache starts to work and saves a huge amount of GC.
> Sometimes we have more data than can fit into the BlockCache, which causes a high rate of evictions. In this case we can skip caching block N and instead cache block N+1. We would evict block N quite soon anyway, which is why skipping it is good for performance.
> Example:
> Imagine we have a small cache that can fit only 1 block, and we are trying to read 3 blocks with offsets:
> 124
> 198
> 223
> Current way: we put block 124, then put 198, evict 124, put 223, evict 198. A lot of work (5 actions).
> With the feature: the last two digits of the offsets are evenly distributed from 0 to 99. Taking each offset modulo 100, we get:
> 124 -> 24
> 198 -> 98
> 223 -> 23
> This helps to sort them. Some part, for example those below 50 (if we set *hbase.lru.cache.data.block.percent* = 50), go into the cache, and we skip the others. It means we will not try to handle block 198, which saves CPU for other work. As a result, we put block 124, then put 223, evict 124 (3 actions).
> See the picture in the attachments and the test below. Requests per second are higher, GC is lower.
>  
> The key point of the code:
> Added the parameter *hbase.lru.cache.data.block.percent*, which is 100 by default.
>  
> But if we set it to 1-99, the following logic applies:
>  
>  
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   // When the limit is active, skip data blocks whose offset falls outside the cached share.
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
>       return;
>     }
>   }
>   ...
>   // the same code as usual
> }
> {code}
>  
> Other parameters control when this logic is enabled, so that it works only while heavy reading is going on.
> hbase.lru.cache.heavy.eviction.count.limit - how many times the eviction process has to run before we start to avoid putting data into the BlockCache
> hbase.lru.cache.heavy.eviction.bytes.size.limit - how many bytes have to be evicted each time before we start to avoid putting data into the BlockCache
> By default: if 10 eviction runs (100 seconds) each evict more than 10 MB, we start to skip 50% of data blocks.
> When the heavy eviction process ends, the new logic is switched off and all blocks are put into the BlockCache again.
>  
> Description of the test:
> 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem.
> 4 RegionServers
> 4 tables by 64 regions by 1.88 Gb data in each = 600 Gb total (only FAST_DIFF)
> Total BlockCache Size = 48 Gb (8 % of data in HFiles)
> Random read in 20 threads
>  
> I am going to make a Pull Request; I hope this is the right way to contribute to this cool product.
>  
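
The one-block example in the description above (offsets 124, 198, 223) can be replayed with a toy model to check the action counts. This is only an illustration under simplifying assumptions: `SkipCachingDemo` and `countActions` are hypothetical names, the cache evicts the oldest block (which matches LRU here because no block is read twice), and every put and every eviction counts as one action.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the example in the description; not HBase code. */
final class SkipCachingDemo {

    /**
     * Counts cache actions (puts plus evictions) when reading blocks at the
     * given offsets through a cache that holds `capacity` blocks.
     * Blocks with offset % 100 >= percent are not cached at all.
     */
    static int countActions(long[] offsets, int capacity, int percent) {
        Deque<Long> cache = new ArrayDeque<>();
        int actions = 0;
        for (long offset : offsets) {
            if (percent != 100 && offset % 100 >= percent) {
                continue; // skipped: no put now, and no eviction later
            }
            if (cache.size() == capacity) {
                cache.removeFirst(); // evict the oldest block
                actions++;
            }
            cache.addLast(offset); // put the new block
            actions++;
        }
        return actions;
    }

    public static void main(String[] args) {
        long[] offsets = {124, 198, 223};
        System.out.println(countActions(offsets, 1, 100)); // 5
        System.out.println(countActions(offsets, 1, 50));  // 3
    }
}
```

With the percent set to 50, block 198 (198 % 100 = 98) is never put, so the put and the eviction it would have caused both disappear.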



--
This message was sent by Atlassian Jira
(v8.3.4#803005)