Posted to mapreduce-issues@hadoop.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2017/11/08 16:49:00 UTC

[jira] [Updated] (MAPREDUCE-5124) AM lacks flow control for task events

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated MAPREDUCE-5124:
------------------------------------
    Attachment: MAPREDUCE-5124-CoalescingPOC-1.patch

I created a POC that uses this "event coalescing" approach.

Here is a rough description of what changed:
* Added a new method {{setNextUpdate()}} to {{TaskAttemptImpl}}
* Added a mapping between {{TaskAttemptID}} and {{TaskAttemptImpl}}
* On each {{statusUpdate()}}, we call {{setNextUpdate()}} instead of passing the status object as the event payload
* In the {{StatusUpdater}} transition, we check whether the status needs to be updated. If {{needsUpdate}} is true, we run the original updater logic.

If there is a backlog of status update events for a given attempt and no new status has been set for that attempt since the last one was processed, the {{StatusUpdater}} does nothing because {{needsUpdate}} is false.
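The mechanism described above can be illustrated with a minimal Java sketch. It is not the POC patch: {{Status}}, {{CoalescingAttempt}} and {{CoalescingListener}} are hypothetical stand-ins for the attempt status, {{TaskAttemptImpl}} and the {{statusUpdate()}} path; attempt IDs are plain strings, and a {{ConcurrentLinkedQueue}} stands in for the AM's event dispatcher. Holding the pending status in an {{AtomicReference}} is an assumption made for this sketch about how {{setNextUpdate()}} and {{needsUpdate}} could work safely across the RPC and dispatcher threads.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified stand-in for a task attempt's reported status. */
final class Status {
  final float progress;
  Status(float progress) { this.progress = progress; }
}

/** Hypothetical stand-in for TaskAttemptImpl: keeps only the newest unprocessed status. */
final class CoalescingAttempt {
  private final AtomicReference<Status> nextUpdate = new AtomicReference<>();
  private Status current;   // intended to be touched only by the dispatcher thread
  int appliedUpdates;       // how many times the updater logic actually ran

  /** RPC path: a newer status simply overwrites an older pending one. */
  void setNextUpdate(Status s) { nextUpdate.set(s); }

  /** Body of the StatusUpdater transition; the event itself carries no status. */
  void onStatusUpdateEvent() {
    Status pending = nextUpdate.getAndSet(null);
    boolean needsUpdate = pending != null;
    if (!needsUpdate) {
      return;               // an earlier event already consumed the latest status
    }
    current = pending;      // the original updater logic would run here
    appliedUpdates++;
  }

  Status current() { return current; }
}

/** Hypothetical stand-in for the statusUpdate() handler plus the AM event queue. */
final class CoalescingListener {
  // Assumed mapping from attempt ID to attempt object.
  final Map<String, CoalescingAttempt> attempts = new ConcurrentHashMap<>();
  final Queue<String> eventQueue = new ConcurrentLinkedQueue<>();

  void statusUpdate(String attemptId, Status status) {
    CoalescingAttempt attempt = attempts.get(attemptId);
    if (attempt == null) {
      return;               // unknown attempt; real code would need to handle this
    }
    attempt.setNextUpdate(status);
    eventQueue.add(attemptId);  // the queued event holds only the ID
  }

  /** Drains the queue the way the dispatcher thread would. */
  void dispatchAll() {
    String id;
    while ((id = eventQueue.poll()) != null) {
      CoalescingAttempt attempt = attempts.get(id);
      if (attempt != null) {
        attempt.onStatusUpdateEvent();
      }
    }
  }
}
```

With three status reports queued for the same attempt, draining the queue runs the updater logic once and applies only the newest status; the other two events find nothing pending and return immediately. Events still accumulate under backlog, but each one is just a small reference to the attempt rather than a full status object.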

I also kept the original update logic, that is, retrieving the status from the event. At first I tried to remove the original constructor of {{TaskAttemptStatusUpdateEvent}}, but that caused compilation errors in various classes: it turned out that quite a few test cases use the old approach to manipulate the status of a task attempt. I didn't want to introduce too many code changes, and I'm not sure what the best solution is in this case.
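Keeping both update paths could look roughly like the following sketch. The names {{AttemptStatus}}, {{StatusUpdateEvent}} and {{DualPathAttempt}} are hypothetical and do not reproduce the real {{TaskAttemptStatusUpdateEvent}} API: the old two-argument constructor still carries a status (as existing tests expect), while the one-argument constructor leaves the payload null, so the updater falls back to the status stored via {{setNextUpdate()}}.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified status stand-in. */
final class AttemptStatus {
  final float progress;
  AttemptStatus(float progress) { this.progress = progress; }
}

/** Hypothetical update event supporting both the old and the coalescing style. */
final class StatusUpdateEvent {
  final String attemptId;
  final AttemptStatus payload;  // null for coalesced (payload-less) events

  /** Original style, still used by tests: the status travels with the event. */
  StatusUpdateEvent(String attemptId, AttemptStatus payload) {
    this.attemptId = attemptId;
    this.payload = payload;
  }

  /** Coalescing style: the status is read from the attempt instead. */
  StatusUpdateEvent(String attemptId) {
    this(attemptId, null);
  }
}

/** Hypothetical attempt whose updater accepts either kind of event. */
final class DualPathAttempt {
  private final AtomicReference<AttemptStatus> nextUpdate = new AtomicReference<>();
  private AttemptStatus current;

  void setNextUpdate(AttemptStatus s) { nextUpdate.set(s); }

  /** Prefer an explicit payload; otherwise consume the pending coalesced status. */
  void handle(StatusUpdateEvent event) {
    AttemptStatus status = event.payload != null
        ? event.payload
        : nextUpdate.getAndSet(null);
    if (status == null) {
      return;                   // coalesced event whose status was already applied
    }
    current = status;           // the original updater logic would run here
  }

  AttemptStatus current() { return current; }
}
```

One caveat of this sketch: an event with an explicit payload does not clear a pending coalesced status, so mixing both styles for the same attempt could later re-apply an older status. That may be acceptable if only tests use the old constructor.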

[~jlowe] could you take a look at this POC?

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch, MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events from tasks.  If the AM is unable to keep pace with the rate of incoming events for a sufficient period of time then it will eventually exhaust the heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event processing, but the AM could still get behind if it's starved for CPU and/or handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
