Posted to common-issues@hadoop.apache.org by "Wei-Chiu Chuang (Jira)" <ji...@apache.org> on 2020/03/03 18:38:00 UTC

[jira] [Commented] (HADOOP-16901) HDFS-client: boost ShortCircuit Cache

    [ https://issues.apache.org/jira/browse/HADOOP-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050471#comment-17050471 ] 

Wei-Chiu Chuang commented on HADOOP-16901:
------------------------------------------

Hi [~pustota], thank you for your report and PR!
I have added you to the contributor lists of HDFS and HADOOP so that this jira can be assigned to you.
I'd like to move this jira under the HDFS project, since this is exclusively an HDFS issue.

> HDFS-client: boost ShortCircuit Cache
> -------------------------------------
>
>                 Key: HADOOP-16901
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16901
>             Project: Hadoop Common
>          Issue Type: New Feature
>         Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem.
> 8 RegionServers (2 by host)
> 8 tables by 64 regions by 1.88 Gb data in each = 1200 Gb total
> Random reads in 800 threads via YCSB, plus a small share of updates (10% of reads)
>            Reporter: Danil Lipovoy
>            Assignee: Danil Lipovoy
>            Priority: Minor
>         Attachments: hdfs_cpu.png, hdfs_reads.png
>
>
> I want to propose a way to improve the read performance of the HDFS client. The idea: create several ShortCircuitCache instances instead of one.
> The key points:
> 1. Create an array of caches (see *dfs.client.short.circuit.num* in the pull requests below):
> {code:java}
> private ClientContext(String name, DfsClientConf conf, Configuration config) {
> ...
>     shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
>     for (int i = 0; i < this.clientShortCircuitNum; i++) {
>       this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
>     }
> {code}
> 2. Then distribute the blocks across the caches:
> {code:java}
>   public ShortCircuitCache getShortCircuitCache(long idx) {
>     return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
>   }
> {code}
> 3. And how to call it:
> {code:java}
> ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId());
> {code}
> The last digit of the block ID (the value passed as idx) is evenly distributed from 0 to 9, which is why all caches fill up at approximately the same rate.
> This is good for performance. The attachments show a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: read performance grows by ~30%, while CPU usage rises by about 15%.
> I hope this is of interest to someone.
> I am ready to explain anything that is not obvious.
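
The sharding scheme in the description above can be tried outside HDFS with a small plain-Java sketch. Everything named here (ShardedCacheSketch, indexFor, shardFor, histogram) is hypothetical and is not part of the HDFS client; plain Object instances stand in for the ShortCircuitCache array. The sketch also swaps the plain % operator for Math.floorMod, on the assumption that an ID could be negative, in which case % would produce a negative array index. It is an illustration under those assumptions, not the code from the pull request.

```java
import java.util.Arrays;

// Hypothetical stand-in for the array of caches held by ClientContext.
// None of these names are HDFS APIs.
final class ShardedCacheSketch {
    // Placeholder objects standing in for ShortCircuitCache instances.
    private final Object[] shards;

    ShardedCacheSketch(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        shards = new Object[numShards];
        for (int i = 0; i < numShards; i++) {
            shards[i] = new Object();
        }
    }

    // Maps an ID to a shard index in [0, numShards).
    // Math.floorMod never returns a negative value for a positive divisor,
    // whereas (id % n) is negative when id is negative.
    static int indexFor(long blockId, int numShards) {
        return (int) Math.floorMod(blockId, (long) numShards);
    }

    // The same ID always resolves to the same shard.
    Object shardFor(long blockId) {
        return shards[indexFor(blockId, shards.length)];
    }

    // Counts how many of `count` consecutive IDs land in each shard.
    static int[] histogram(long firstId, int count, int numShards) {
        int[] hits = new int[numShards];
        for (long id = firstId; id < firstId + count; id++) {
            hits[indexFor(id, numShards)]++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // 9000 consecutive IDs over 3 shards: prints [3000, 3000, 3000]
        System.out.println(Arrays.toString(histogram(1_073_741_825L, 9_000, 3)));
    }
}
```

With consecutive IDs the shards fill evenly, which matches the observation in the description that the caches fill at about the same rate. Whether real block IDs are spread this evenly depends on how they are allocated, so the histogram is only a sanity check for the modulo mapping itself.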


