
Posted to common-issues@hadoop.apache.org by "Aihua Xu (Jira)" <ji...@apache.org> on 2022/04/17 22:52:00 UTC

[jira] [Commented] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

    [ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523465#comment-17523465 ] 

Aihua Xu commented on HADOOP-13144:
-----------------------------------

This change has not been merged into trunk, but we have applied it in our environment and observed a performance improvement. However, when the routers are heavily overloaded, they create too many connections to the NameNode, and the NameNode degrades because of too many open file descriptors. We have tuned the connection pool size (logical connections) within the routers, but that gives little control over the number of physical connections. I have attached a new change that enables fine-grained configuration of the number of physical connections per nameservice or per user. With this control and connection sharing, we did not observe performance degradation with a connection size such as 64.

{quote}dfs.federation.router.ipc.connection.size=64
dfs.federation.router.ipc.connection.size.ns0=32
dfs.federation.router.ipc.connection.size.ns0.ingestion=128{quote}
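
To illustrate how the three keys above might interact, here is a minimal sketch that assumes the most specific key wins (nameservice + user, then nameservice, then the global value). The class {{ConnectionSizeResolver}}, the {{resolve}} method, and the fallback parameter are hypothetical names used only for illustration; this is not the code in the attached patch, and the actual lookup order there may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Hadoop or the attached patch. */
public class ConnectionSizeResolver {
    public static final String PREFIX = "dfs.federation.router.ipc.connection.size";

    /**
     * Returns the most specific configured size for the given nameservice and user,
     * or the supplied fallback when none of the keys is set.
     * Assumed precedence: PREFIX.ns.user, then PREFIX.ns, then PREFIX.
     */
    public static int resolve(Map<String, String> conf, String nameservice,
                              String user, int fallback) {
        String[] keys = {
            PREFIX + "." + nameservice + "." + user,
            PREFIX + "." + nameservice,
            PREFIX
        };
        for (String key : keys) {
            String value = conf.get(key);
            if (value != null) {
                return Integer.parseInt(value.trim());
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Same values as the example configuration above.
        Map<String, String> conf = new HashMap<>();
        conf.put(PREFIX, "64");
        conf.put(PREFIX + ".ns0", "32");
        conf.put(PREFIX + ".ns0.ingestion", "128");

        System.out.println(resolve(conf, "ns0", "ingestion", 1)); // user-level key
        System.out.println(resolve(conf, "ns0", "alice", 1));     // nameservice-level key
        System.out.println(resolve(conf, "ns1", "alice", 1));     // global key
    }
}
```

Under that assumed precedence, the user "ingestion" on ns0 would get 128 connections, any other user on ns0 would get 32, and every other nameservice would get 64.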


> Enhancing IPC client throughput via multiple connections per user
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13144
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Jason Kace
>            Assignee: Íñigo Goiri
>            Priority: Minor
>         Attachments: HADOOP-13144-performance.patch, HADOOP-13144.000.patch, HADOOP-13144.001.patch, HADOOP-13144.002.patch, HADOOP-13144.003.patch, HADOOP-13144_overload_enhancement.patch
>
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique to the connection's remote address, ticket and protocol.  Each ConnectionId is 1:1 mapped to a connection thread by the client via a map cache.
> The result is that all IPC read/write activity for each user/ticket + address is serialized through a single thread.  If a single user makes repeated calls (1k-100k/sec) to the same destination, the IPC client becomes a bottleneck.
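
The 1:1 mapping described in the issue can be modeled with a simplified cache. The sketch below is not the real {{org.apache.hadoop.ipc.Client}} code: the {{Key}} class, the extra {{index}} field, and the round-robin selection are assumptions used to show one way a single user/address/protocol could be spread over several connections. With one connection per key it behaves like the single-thread case described above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of a connection cache; all names here are hypothetical. */
public class ConnectionCacheSketch {

    /** Stand-in for a connection key; 'index' is the assumed extra field. */
    static final class Key {
        final String address;
        final String user;
        final String protocol;
        final int index;

        Key(String address, String user, String protocol, int index) {
            this.address = address;
            this.user = user;
            this.protocol = protocol;
            this.index = index;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return index == k.index && address.equals(k.address)
                && user.equals(k.user) && protocol.equals(k.protocol);
        }

        @Override
        public int hashCode() {
            return Objects.hash(address, user, protocol, index);
        }
    }

    // Each map entry stands for one physical connection (and its thread).
    private final Map<Key, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();
    private final int connectionsPerKey;

    public ConnectionCacheSketch(int connectionsPerKey) {
        if (connectionsPerKey < 1) {
            throw new IllegalArgumentException("connectionsPerKey must be >= 1");
        }
        this.connectionsPerKey = connectionsPerKey;
    }

    /** Picks the connection for the next call, rotating over the allowed indices. */
    public String connectionFor(String address, String user, String protocol) {
        int index = Math.floorMod(next.getAndIncrement(), connectionsPerKey);
        Key key = new Key(address, user, protocol, index);
        return cache.computeIfAbsent(key, k -> "conn-" + k.index);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        ConnectionCacheSketch single = new ConnectionCacheSketch(1);
        ConnectionCacheSketch multi = new ConnectionCacheSketch(4);
        for (int i = 0; i < 10; i++) {
            single.connectionFor("nn1:8020", "alice", "ClientProtocol");
            multi.connectionFor("nn1:8020", "alice", "ClientProtocol");
        }
        System.out.println(single.size()); // all calls share one connection
        System.out.println(multi.size());  // calls are spread over four connections
    }
}
```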



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org