You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/01/19 01:56:15 UTC
[jira] [Updated] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4946:
----------------------------------

    Attachment: MAPREDUCE-4946.patch

Attaching a patch that works around the problem by caching the type conversion of the map completion event.

It's a bit messy since previously we relied on the fact that updating the task completion event during a fetch failure implicitly updated map completion event since they were the same object.  When it's cached separately, we need a way to quickly lookup the cached map event from the task event so it can be updated as well.

Manually tested this on a large cluster, and it seems to make things better wrt. a thundering herd of reducers trying to connect.

A cleaner fix would be to convert TaskUmbilicalProtocol to use TaskAttemptCompletionEvents directly so we don't need to convert them at all except when honoring the org.apache.hadoop.mapreduce.Job.getTaskAttemptCompletionEvents interface which is unlikely to be called in a performance-critical scenario.

There's also the performance issue of the locking within the PBImpl classes, but that's an issue probably best left to another JIRA.
                
> Type conversion of map completion events leads to performance problems with large jobs
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4946
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4946
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-4946.patch
>
>
> We've seen issues with large jobs (e.g.: 13,000 maps and 3,500 reduces) where reducers fail to connect back to the AM after being launched due to connection timeout.  Looking at stack traces of the AM during this time we see a lot of IPC servers stuck waiting for a lock to get the application ID while type converting the map completion events.  What's odd is that normally getting the application ID should be very cheap, but in this case we're type-converting thousands of map completion events for *each* reducer connecting.  That means we end up type-converting the map completion events over 45 million times during the lifetime of the example job (13,000 * 3,500).
> We either need to make the type conversion much cheaper (i.e.: lockless or at least read-write locked) or, even better, store the completion events in a form that does not require type conversion when serving them up to reducers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira