You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "fanrui (Jira)" <ji...@apache.org> on 2020/08/24 17:29:00 UTC

[jira] [Updated] (HADOOP-17222) Create socket address combined with cache to speed up hdfs client choose DataNode

     [ https://issues.apache.org/jira/browse/HADOOP-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fanrui updated HADOOP-17222:
----------------------------
    Attachment: Before optimization.svg
                After optimization.svg

> Create socket address combined with cache to speed up hdfs client choose DataNode
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-17222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17222
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common, hdfs-client
>         Environment: HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
>            Reporter: fanrui
>            Priority: Major
>         Attachments: After Optimization remark.png, After optimization.svg, Before Optimization remark.png, Before optimization.svg
>
>
> Hdfs client selects best DN for hdfs Block. method call stack:
> DFSInputStream.chooseDataNode -> getBestNodeDNAddrPair -> NetUtils.createSocketAddr
> NetUtils.createSocketAddr creates the corresponding InetSocketAddress based on the host and port. There are some heavier operations in the NetUtils.createSocketAddr method, for example: URI.create(target), so NetUtils.createSocketAddr takes more time to execute.
> The following is my performance report. The report is based on HBase calling hdfs. HBase is a high-frequency access client for hdfs, because HBase read operations often access a small DataBlock (about 64k) instead of the entire HFile. In the case of high frequency access, the NetUtils.createSocketAddr method is time-consuming.
> h3. Test Environment:
>  
> {code:java}
> HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
> {code}
> h4. Before Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts for 4.86% of the entire CPU, and the creation of URIs accounts for a larger proportion.
> {{!Before Optimization remark.png|thumbnail!}}
> h3. Optimization ideas:
> NetUtils.createSocketAddr creates InetSocketAddress based on host and port. Here we can add Cache to InetSocketAddress. The key of Cache is host and port, and the value is InetSocketAddress.
> h4. After Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts for 0.54% of the entire CPU. Here, ConcurrentHashMap is used as the Cache, and the ConcurrentHashMap.get() method gets data from the Cache. The CPU usage of DFSInputStream.getBestNodeDNAddrPair has been optimized from 4.86% to 0.54%.
> {{!After Optimization remark.png|thumbnail!}} 
> h3. Original FlameGraph link:
> [Before Optimization|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]
> [After Optimization FlameGraph|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org