Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/06/15 03:46:07 UTC

[jira] Updated: (HADOOP-6026) Improve the performance efficiency of task initialization at the JobTracker

     [ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-6026:
-------------------------------

    Attachment: HADOOP-6026.1.patch

I agree with Dhruba's comment, but I think there is probably no such requirement from any real deployed environment at the moment. And if there were, a simple uniform timeout might not be the best way to expire an item in the cache.

I will vote for simplicity of the code for now. I've put a comment there. In the future, people can add a caching policy if such a requirement comes up.


> Improve the performance efficiency of task initialization at the JobTracker
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-6026
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6026
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: Zheng Shao
>         Attachments: HADOOP-6026.1.patch
>
>
> The JobTracker reads the splits for a job at job initialization time. Then, for each location in the split, it invokes DNSToSwitchMapping.resolve(). This, in turn, typically invokes an external script that resolves the hostname to a network rack location. The time spent invoking this external script can be reduced if hostnames and their rack locations are inserted into a cache. JobTracker.resolveAndAddToTopology() can look up this cache first and avoid invoking the external "resolve" script in most cases.
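
For illustration only, the caching idea described in the issue could be sketched roughly as below. This is not the code in HADOOP-6026.1.patch. The class name CachedRackResolver and the nested Resolver interface are hypothetical stand-ins (Resolver mimics the list-in, list-out shape of DNSToSwitchMapping.resolve()), and the sketch assumes the wrapped resolver returns one non-null rack per requested host, in the same order. In line with the comment above, entries never expire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical caching wrapper around a host-to-rack resolver (illustrative sketch). */
final class CachedRackResolver {

    /** Stand-in for the real mapping: returns one rack path per host, in order. */
    interface Resolver {
        List<String> resolve(List<String> names);
    }

    private final Resolver rawResolver;
    // host name -> rack location; no expiry policy, entries live forever
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachedRackResolver(Resolver rawResolver) {
        this.rawResolver = rawResolver;
    }

    List<String> resolve(List<String> names) {
        // Collect the distinct hosts that are not cached yet.
        List<String> misses = new ArrayList<>();
        for (String name : names) {
            if (!cache.containsKey(name) && !misses.contains(name)) {
                misses.add(name);
            }
        }
        // Call the (expensive) underlying resolver once, only for the misses.
        if (!misses.isEmpty()) {
            List<String> racks = rawResolver.resolve(misses);
            for (int i = 0; i < misses.size(); i++) {
                cache.put(misses.get(i), racks.get(i));
            }
        }
        // Answer every requested host from the cache, preserving input order.
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            result.add(cache.get(name));
        }
        return result;
    }
}
```

With a wrapper of this shape, a second job whose splits live on the same hosts would be resolved without invoking the external script again. Adding an eviction or timeout policy later would only touch the cache field and the lookup loop.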

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.