Posted to mapreduce-issues@hadoop.apache.org by "Kang Xiao (JIRA)" <ji...@apache.org> on 2011/02/21 06:04:39 UTC

[jira] Commented: (MAPREDUCE-2340) optimize JobInProgress.initTasks()

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997278#comment-12997278 ] 

Kang Xiao commented on MAPREDUCE-2340:
--------------------------------------

For large jobs, job initialization seems to be very slow. The cause is that JobInProgress.initTasks() calls createCache() to build the localization cache list. For each split location, createCache() calls jobtracker.resolveAndAddToTopology(host) to get its topology node object. However, the jobtracker already keeps a hostname => topology node map cache that can be used to speed up the lookup of a node by hostname.
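
To illustrate the idea, here is a minimal sketch, not the actual patch and not the real JobTracker code. TopologyNode, TopologyCacheSketch, lookup() and the resolveCalls counter are hypothetical names made up for this example; only resolveAndAddToTopology and hostnameToNodeMap echo names from the issue, and their bodies here are stand-ins. It assumes the expensive part is resolving a host, so a map hit should skip that call entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a topology node; not Hadoop's own class.
final class TopologyNode {
    final String host;
    final String rack;

    TopologyNode(String host, String rack) {
        this.host = host;
        this.rack = rack;
    }
}

// Hypothetical sketch of the lookup path; names only mirror the issue text.
class TopologyCacheSketch {
    // Cache of hostname => topology node, filled as hosts are resolved.
    private final Map<String, TopologyNode> hostnameToNodeMap = new HashMap<>();

    // Counts how often the slow path runs, so the saving is observable.
    int resolveCalls = 0;

    // Slow path: stands in for the real (expensive) topology resolution.
    TopologyNode resolveAndAddToTopology(String host) {
        resolveCalls++;
        String rack = "/rack-" + Math.floorMod(host.hashCode(), 4); // fake rack
        TopologyNode node = new TopologyNode(host, rack);
        hostnameToNodeMap.put(host, node);
        return node;
    }

    // Fast path first: consult the cache, resolve only on a miss.
    TopologyNode lookup(String host) {
        TopologyNode node = hostnameToNodeMap.get(host);
        return node != null ? node : resolveAndAddToTopology(host);
    }

    public static void main(String[] args) {
        TopologyCacheSketch tracker = new TopologyCacheSketch();
        String[] splitHosts = {"host1", "host2", "host3"};
        // Many split locations map onto a small set of distinct hosts.
        for (int i = 0; i < 100000; i++) {
            tracker.lookup(splitHosts[i % splitHosts.length]);
        }
        System.out.println("slow-path calls: " + tracker.resolveCalls);
    }
}
```

In this sketch the slow path runs once per distinct host rather than once per split location, which is the effect the proposed change aims for in createCache().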

> optimize JobInProgress.initTasks()
> ----------------------------------
>
>                 Key: MAPREDUCE-2340
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2340
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1, 0.21.0
>            Reporter: Kang Xiao
>
> JobTracker's hostnameToNodeMap cache can speed up JobInProgress.initTasks() and JobInProgress.createCache() significantly. A test with one job of 100000 maps on a 2400-node cluster shows speedups of nearly 10 times for initTasks() and nearly 50 times for createCache().
