Posted to mapreduce-issues@hadoop.apache.org by "Tsuyoshi OZAWA (JIRA)" <ji...@apache.org> on 2013/07/19 17:02:50 UTC

[jira] [Updated] (MAPREDUCE-5124) AM lacks flow control for task events

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated MAPREDUCE-5124:
--------------------------------------

    Attachment: MAPREDUCE-5124-proto.2.txt

I attached a rough prototype that restricts onetime-style RPCs while keeping backward compatibility. This prototype includes the following changes:

1. adding a callType to the RPC header to distinguish ONETIME from HEARTBEAT.
2. adding a new error code (ToBusyRetryLaterException).
3. adding a counter to Server#Handler to keep the number of RPCs being processed within a high-water mark.
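
As a rough illustration of items 2 and 3 (this is not the code in the attachment), the idea can be sketched in Java as below. The names FlowControlSketch, Gate, enter and exit are hypothetical, the exception is modeled as an unchecked exception only to keep the sketch short, and letting HEARTBEAT calls bypass the limit is an assumption made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a high-water-mark gate; not taken from the patch.
final class FlowControlSketch {

    // Assumed call types carried in the RPC header (item 1).
    enum CallType { ONETIME, HEARTBEAT }

    // Stand-in for the new error code (item 2). A real server would
    // probably map this to an RPC-level error instead.
    static final class ToBusyRetryLaterException extends RuntimeException {
        ToBusyRetryLaterException(String message) {
            super(message);
        }
    }

    // Counter that limits how many ONETIME calls are processed at once (item 3).
    static final class Gate {
        private final int highWaterMark;
        private final AtomicInteger inFlight = new AtomicInteger();

        Gate(int highWaterMark) {
            this.highWaterMark = highWaterMark;
        }

        // Called before a handler starts processing a call.
        void enter(CallType type) {
            if (type == CallType.HEARTBEAT) {
                return; // assumption: heartbeats are never rejected
            }
            int now = inFlight.incrementAndGet();
            if (now > highWaterMark) {
                inFlight.decrementAndGet(); // undo the reservation
                throw new ToBusyRetryLaterException(
                    "too many ONETIME calls in flight, retry later");
            }
        }

        // Called after a handler finishes a call that was admitted by enter().
        void exit(CallType type) {
            if (type == CallType.ONETIME) {
                inFlight.decrementAndGet();
            }
        }

        int inFlight() {
            return inFlight.get();
        }
    }
}
```

A handler would call enter() before processing a call and exit() in a finally block afterwards; a client that receives the error is expected to back off and resend the event later.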

Meanwhile, this prototype does NOT include:
1. test code.
2. creating a response that tells the client the heartbeat period dynamically.

If this design is acceptable, I will make the next patch, which will include both of them. If you have any questions about the design, let me know.
                
> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events from tasks.  If the AM is unable to keep pace with the rate of incoming events for a sufficient period of time then it will eventually exhaust the heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event processing, but the AM could still get behind if it's starved for CPU and/or handling a very large job with tens of thousands of active tasks.
