Posted to mapreduce-issues@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2012/07/26 18:29:35 UTC
[jira] [Updated] (MAPREDUCE-4487) Reduce job latency by removing hardcoded sleep statements
[ https://issues.apache.org/jira/browse/MAPREDUCE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated MAPREDUCE-4487:
---------------------------------
Attachment: MAPREDUCE-4487.patch
Here's a patch which removes sleeps (or improves their usage) in three places:
* In ReduceTask's fetchOutputs(), if there are no map outputs in flight or scheduled, the thread sleeps for five seconds. Replacing this sleep with a wait that is notified when new map outputs become available lets fetching resume as soon as there is work to do.
* In ReduceTask's fetchOutputs(), when all the output has been fetched there is a join on GetMapEventsThread, which may be sleeping (for 1s), so the join can block for up to that long. Replacing the sleep with a wait/notify removes this delay.
* In Child's main loop, while waiting for tasks from the parent tasktracker, the thread sleeps for 0.5s initially, then for 1.5s if there haven't been any tasks for a while. Replacing this with a finer-grained exponential backoff improves responsiveness.
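The two techniques named in the list above can be sketched roughly as follows. This is an illustrative sketch only, not code from the patch: the class and method names (SleepFreeWaiting, PendingOutputs, nextBackoffMillis) are hypothetical, and the 10ms starting delay and 100ms cap are arbitrary values chosen for the demo.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration; none of these names come from Hadoop or the patch.
final class SleepFreeWaiting {

    /** Holds outputs that are ready to fetch; consumers block instead of polling. */
    static final class PendingOutputs {
        private final Queue<String> ready = new ArrayDeque<>();
        private boolean done = false;

        /** Producer side: publish a new output and wake any waiting consumer. */
        synchronized void add(String mapOutput) {
            ready.add(mapOutput);
            notifyAll();
        }

        /** Producer side: signal that no more outputs will arrive. */
        synchronized void finish() {
            done = true;
            notifyAll();
        }

        /** Consumer side: returns the next output, or null once finished and drained. */
        synchronized String take() throws InterruptedException {
            // Re-check the condition in a loop to guard against spurious wakeups.
            while (ready.isEmpty() && !done) {
                wait();
            }
            return ready.poll();
        }
    }

    /** Exponential backoff step: doubles the delay, never exceeding maxMillis. */
    static long nextBackoffMillis(long currentMillis, long maxMillis) {
        return Math.min(currentMillis * 2, maxMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        PendingOutputs outputs = new PendingOutputs();
        Thread producer = new Thread(() -> {
            outputs.add("map_0");
            outputs.finish();
        });
        producer.start();

        // The consumer wakes as soon as an output is published; no fixed sleep.
        String out;
        while ((out = outputs.take()) != null) {
            System.out.println("fetched " + out);
        }
        producer.join();

        // Idle polling delay grows gradually rather than jumping between two fixed values.
        long delay = 10; // assumed starting delay, in milliseconds
        for (int i = 0; i < 5; i++) {
            System.out.println("idle poll delay: " + delay + "ms");
            delay = nextBackoffMillis(delay, 100); // assumed cap of 100ms
        }
    }
}
```

With wait/notify the consumer's wake-up latency is bounded by thread scheduling rather than by a sleep interval, which is why it helps short jobs most. The backoff keeps early polls cheap in latency while still limiting how often an idle process polls.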
To investigate the effect of these changes, I ran a sleep job that sleeps for 1ms ({{bin/hadoop jar hadoop-*examples*jar sleep -m 1 -r 1 -mt 1 -rt 1}}) on a single node cluster and measured the job execution time. Without the patch the mean time was 12.97s (over 10 runs, sd 0.53), and with the patch it was 9.109s (sd 1.0): a reduction of about 3.9s, or roughly 30%.
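As a rough sanity check on those figures, the relative reduction and a Welch t statistic can be computed from the summary statistics alone. This is a back-of-the-envelope sketch: it assumes the with-patch figure was also taken over 10 runs, which the message does not state explicitly, and the class and method names are made up for illustration.

```java
// Hypothetical helper; works only from the means and standard deviations quoted above.
final class LatencyComparison {

    /** Fraction by which the mean time dropped, e.g. 0.25 for a 25% reduction. */
    static double relativeReduction(double before, double after) {
        return (before - after) / before;
    }

    /** Welch's t statistic for two samples with unequal variances. */
    static double welchT(double mean1, double sd1, int n1,
                         double mean2, double sd2, int n2) {
        double standardError = Math.sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
        return (mean1 - mean2) / standardError;
    }

    public static void main(String[] args) {
        double before = 12.97, sdBefore = 0.53; // without the patch
        double after = 9.109, sdAfter = 1.0;    // with the patch
        int runs = 10;                          // assumed for both samples

        System.out.printf("reduction: %.1f%%%n", 100 * relativeReduction(before, after));
        System.out.printf("Welch t: %.2f%n", welchT(before, sdBefore, runs, after, sdAfter, runs));
    }
}
```

Under that assumption the t statistic comes out above 10, so the difference is far larger than the run-to-run variation.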
> Reduce job latency by removing hardcoded sleep statements
> ---------------------------------------------------------
>
> Key: MAPREDUCE-4487
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4487
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, performance
> Affects Versions: 1.0.3
> Reporter: Tom White
> Assignee: Tom White
> Attachments: MAPREDUCE-4487.patch
>
>
> There are a few places in MapReduce where there are hardcoded sleep statements. By replacing them with wait/notify or similar, it's possible to reduce latency for short-running jobs.