Posted to common-issues@hadoop.apache.org by "Danil Lipovoy (Jira)" <ji...@apache.org> on 2020/03/03 18:10:00 UTC
[jira] [Updated] (HADOOP-16901) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HADOOP-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danil Lipovoy updated HADOOP-16901:
-----------------------------------
Description:
I want to propose a way to improve the read performance of the HDFS client. The idea: create several ShortCircuitCache instances instead of one.
The key points:
1. Create an array of caches:
{code:java}
private ClientContext(String name, DfsClientConf conf, Configuration config) {
  ...
  // One independent cache per slot; all are built from the same configuration.
  this.shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
  for (int i = 0; i < this.clientShortCircuitNum; i++) {
    this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
  }
  ...
}
{code}
2. Then distribute the blocks across the caches:
{code:java}
public ShortCircuitCache getShortCircuitCache(long idx) {
  // Map the index (a block ID at the call site) onto one of the caches.
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
{code}
3. At the call site, pick the cache by block ID:
{code:java}
ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId());
{code}
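The three steps above can be sketched as one self-contained class. This is a minimal illustration, not the code from the patch: `ShardedCaches`, `indexFor`, and `forBlock` are hypothetical names, and a generic type stands in for `ShortCircuitCache`. It also uses `Math.floorMod` instead of `%`, on the assumption that an ID could be negative, in which case plain `%` would yield a negative array index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for the array of ShortCircuitCache instances. */
final class ShardedCaches<C> {
    private final List<C> shards;

    /** Builds shardCount independent caches from the same factory. */
    ShardedCaches(int shardCount, Supplier<C> factory) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        shards = new ArrayList<>(shardCount);
        for (int i = 0; i < shardCount; i++) {
            shards.add(factory.get());
        }
    }

    /** Maps an ID onto [0, shardCount); floorMod keeps negative IDs in range. */
    static int indexFor(long blockId, int shardCount) {
        return (int) Math.floorMod(blockId, (long) shardCount);
    }

    /** Returns the cache responsible for the given block ID. */
    C forBlock(long blockId) {
        return shards.get(indexFor(blockId, shards.size()));
    }

    public static void main(String[] args) {
        ShardedCaches<StringBuilder> caches = new ShardedCaches<>(3, StringBuilder::new);
        // The same block ID always resolves to the same cache instance.
        System.out.println(caches.forBlock(42L) == caches.forBlock(42L)); // prints "true"
        // A negative ID still maps to a valid slot.
        System.out.println(indexFor(-7L, 3)); // prints "2"
    }
}
```

Because the mapping depends only on the ID, every read of a given block goes to the same cache, so no block is ever tracked by two caches at once.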
The last digit of the block ID is evenly distributed from 0 to 9, which is why all the caches fill up at approximately the same rate.
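The even-fill claim can be checked with a small simulation. The sketch below assumes that IDs are consecutive integers, which the description does not state; the starting ID and the class name `ShardDistribution` are arbitrary choices made for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper: counts how consecutive IDs spread over the shards. */
final class ShardDistribution {

    /** Counts how many of the IDs firstId .. firstId + count - 1 land in each shard. */
    static long[] countPerShard(long firstId, int count, int shardCount) {
        long[] counts = new long[shardCount];
        for (long id = firstId; id < firstId + count; id++) {
            counts[(int) Math.floorMod(id, (long) shardCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed workload: one million consecutive IDs over three caches.
        long[] counts = countPerShard(1_073_741_824L, 1_000_000, 3);
        System.out.println(Arrays.toString(counts));
    }
}
```

For consecutive IDs the per-shard counts differ by at most one. If real IDs are not consecutive, or if only a skewed subset of blocks is read, the balance can be worse than this sketch suggests.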
Spreading the blocks over several caches is good for performance. The attachments show a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: read performance improves by ~30%, while CPU usage rises by about 15%.
I will try to add a link to the pull request soon.
I hope this is interesting to someone.
I am ready to explain anything that is not obvious.
was:
I want to propose how to improve reading performance HDFS-client. The idea: create few instances SchortCircuit caches instead of one.
The key points:
1. Create array of caches:
{code:java}
private ClientContext(String name, DfsClientConf conf, Configuration config) {
...
shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
for (int i = 0; i < this.clientShortCircuitNum; i++) {
this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
}
{code}
2 Then divide blocks by caches:
{code:java}
public ShortCircuitCache getShortCircuitCache(long idx) {
return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
{code}
3. And how to call it:
{code:java}
ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId());
{code}
The last number of offset evenly distributed from 0 to 9 - thats why all caches will full approximatly the same.
It is good for performance. Below the attachment, where clientShortCircuitNum = 3. There is load test reading HDFS via HBase. We can see that performance grows ~30%, CPU usage about 15%.
Will try to add the link to PullRequest soon.
Hope it is intresting for somebody.
Ready to explain some unobvious things.
> HDFS-client: boost ShortCircuit Cache
> -------------------------------------
>
> Key: HADOOP-16901
> URL: https://issues.apache.org/jira/browse/HADOOP-16901
> Project: Hadoop Common
> Issue Type: New Feature
> Environment: 4 nodes, E5-2698 v4 @ 2.20GHz, 700 GB Mem.
> 8 RegionServers (2 per host)
> 8 tables of 64 regions with 1.88 GB of data in each = 1200 GB total
> Random reads in 800 threads via YCSB, plus a small share of updates (10% of reads)
> Reporter: Danil Lipovoy
> Priority: Minor
> Attachments: hdfs_cpu.png, hdfs_reads.png