Posted to common-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/17 23:02:05 UTC

[jira] [Resolved] (HADOOP-2864) Improve the Scalability and Robustness of IPC

     [ https://issues.apache.org/jira/browse/HADOOP-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HADOOP-2864.
--------------------------------------

    Resolution: Fixed

This has changed so much since this JIRA was filed that I'm just going to close this as stale.

> Improve the Scalability and Robustness of IPC
> ---------------------------------------------
>
>                 Key: HADOOP-2864
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2864
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.16.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: RPCScalabilityDesignWeb.pdf
>
>
> This jira is intended to enhance IPC's scalability and robustness. 
> Currently an IPC server can easily hang because of a disk failure or garbage collection, during which time it cannot respond to clients promptly. This has caused many dropped calls and delayed responses, so many running applications fail on timeout. On the other hand, if busy clients send many requests to the server in a short period, or too many clients communicate with the server simultaneously, the server may be swamped by requests and become unresponsive.
> The proposed changes aim to:
> # Provide better client/server coordination.
> #* The server should be able to throttle clients during a burst of requests.
> #* A slow client should not prevent the server from serving other clients.
> #* A temporarily hung server should not cause catastrophic failures in clients.
> # Client/server should detect remote-side failures. Examples of failures include: (1) the remote host has crashed; (2) the remote host has crashed and then rebooted; (3) the remote process has crashed or been shut down by an operator.
> # Fairness. Each client should be able to make progress.
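
To make the throttling goal in the list above concrete, here is a minimal sketch of a bounded call queue that rejects new calls when it is full instead of blocking the thread that reads from the network. It is an illustration only, not Hadoop's IPC code and not the design in the attached PDF; the names ThrottledCallQueue, Admission, submit and next are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of server-side throttling; not Hadoop's actual IPC server. */
final class ThrottledCallQueue {
    /** What the server would tell a client about its call (assumed names). */
    enum Admission { ACCEPTED, RETRY_LATER }

    private final BlockingQueue<String> calls;

    ThrottledCallQueue(int capacity) {
        // A fixed capacity bounds the memory and latency a burst can cause.
        this.calls = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking admit: a full queue yields RETRY_LATER rather than stalling the reader. */
    Admission submit(String call) {
        return calls.offer(call) ? Admission.ACCEPTED : Admission.RETRY_LATER;
    }

    /** Handler side: returns the oldest pending call, or null if none is queued. */
    String next() {
        return calls.poll();
    }

    public static void main(String[] args) {
        ThrottledCallQueue queue = new ThrottledCallQueue(2);
        System.out.println(queue.submit("call-1"));
        System.out.println(queue.submit("call-2"));
        System.out.println(queue.submit("call-3")); // queue is full at this point
        queue.next();                                // a handler drains one call
        System.out.println(queue.submit("call-3")); // the retry now fits
    }
}
```

A client that receives RETRY_LATER could back off and resend, which lets the server shed load during a burst without dropping connections. The sketch does not address fairness; one queue per client, served round-robin, would be one possible extension.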



--
This message was sent by Atlassian JIRA
(v6.2#6252)