Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2009/06/12 10:02:07 UTC
[jira] Created: (HADOOP-6026) Improve the performance efficiency of
task initialization at the JobTracker
Improve the performance efficiency of task initialization at the JobTracker
---------------------------------------------------------------------------
Key: HADOOP-6026
URL: https://issues.apache.org/jira/browse/HADOOP-6026
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: dhruba borthakur
Assignee: Zheng Shao
The JobTracker reads the splits for a job at job initialization time. Then, for each location in a split, it invokes DNSToSwitchMapping.resolve(). This, in turn, typically invokes an external script that resolves the hostname to a network rack location. The time spent invoking this external script can be reduced if hostnames and their rack locations are inserted into a cache. JobTracker.resolveAndAddToTopology() can look up this cache first and avoid invoking the external "resolve" script in most cases.
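A minimal sketch of the lookup-before-resolve idea described above, assuming a list-in, list-out resolve call. RackResolver and CachingRackResolver are hypothetical names used only for illustration; they are stand-ins, not the Hadoop classes or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the shape of a resolve() call; not the Hadoop interface. */
interface RackResolver {
    /** Returns one rack location per hostname, in the same order. */
    List<String> resolve(List<String> hostnames);
}

/** Hypothetical wrapper that consults an in-memory cache before delegating. */
class CachingRackResolver implements RackResolver {
    private final RackResolver delegate; // e.g. something that runs the external script
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingRackResolver(RackResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> resolve(List<String> hostnames) {
        // Collect each uncached hostname once, so the expensive call sees no repeats.
        List<String> misses = new ArrayList<>();
        for (String host : hostnames) {
            if (!cache.containsKey(host) && !misses.contains(host)) {
                misses.add(host);
            }
        }
        // Invoke the expensive resolver only when something is actually missing.
        // Assumption: the delegate returns a non-null rack for every hostname.
        if (!misses.isEmpty()) {
            List<String> racks = delegate.resolve(misses);
            for (int i = 0; i < misses.size(); i++) {
                cache.put(misses.get(i), racks.get(i));
            }
        }
        // Answer every requested hostname from the cache, preserving input order.
        List<String> result = new ArrayList<>(hostnames.size());
        for (String host : hostnames) {
            result.add(cache.get(host));
        }
        return result;
    }
}
```

In this sketch entries never expire, so a mapping stays fixed for the lifetime of the wrapper object.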
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6026) Improve the performance efficiency
of task initialization at the JobTracker
Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719396#action_12719396 ]
Devaraj Das commented on HADOOP-6026:
-------------------------------------
If you are using ScriptBasedMapping as the implementation for resolution, the problem outlined in this jira doesn't exist. ScriptBasedMapping extends CachedDNSToSwitchMapping, which does the necessary caching.
In fact, I don't think we should do this caching in the core framework (and then start worrying about the cache timeout, etc.). This should be left to the implementations of DNSToSwitchMapping.
Thoughts?
[jira] Updated: (HADOOP-6026) Improve the performance efficiency of
task initialization at the JobTracker
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HADOOP-6026:
-------------------------------
Attachment: HADOOP-6026.1.patch
I agree with Dhruba's comment, but I think there is currently no such requirement from any real deployed environment. And if there is, a simple uniform timeout may not be the best way to expire an item in the cache.
I will vote for simplicity of the code for now. I've put a comment there. In the future people can add caching policy if such a requirement comes up.
[jira] Resolved: (HADOOP-6026) Improve the performance efficiency
of task initialization at the JobTracker
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao resolved HADOOP-6026.
--------------------------------
Resolution: Invalid
Already fixed in 0.19.1.
[jira] Commented: (HADOOP-6026) Improve the performance efficiency
of task initialization at the JobTracker
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719849#action_12719849 ]
Zheng Shao commented on HADOOP-6026:
------------------------------------
I see. We were on 0.17, where the class declaration is:
"public final class ScriptBasedMapping implements Configurable, DNSToSwitchMapping"
It seems that CachedDNSToSwitchMapping was added in 0.19.
I also agree the caching should be done in the implementation, because different implementations may have different caching policies, etc.
I will close this jira.
[jira] Commented: (HADOOP-6026) Improve the performance efficiency
of task initialization at the JobTracker
Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718753#action_12718753 ]
dhruba borthakur commented on HADOOP-6026:
------------------------------------------
One drawback to the above situation is that the mapping of a hostname to its rack location would be permanent for the lifetime of a JobTracker. To accommodate a more rapidly changing network topology, we can expire items from the cache after an hour or so.
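A minimal sketch of the expiry idea, assuming a single uniform time-to-live applied to every entry. ExpiringRackCache, its constructor parameters, and the injected clock are hypothetical names for illustration only; nothing here is taken from the patch or from Hadoop.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical hostname-to-rack cache whose entries expire after a fixed TTL. */
class ExpiringRackCache {
    /** A cached rack location plus the time at which it was resolved. */
    private static final class Entry {
        final String rack;
        final long loadedAtMillis;

        Entry(String rack, long loadedAtMillis) {
            this.rack = rack;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<String, String> resolver; // the expensive lookup, e.g. a script call
    private final long ttlMillis;
    private final LongSupplier clock; // injected so that expiry can be exercised in tests

    ExpiringRackCache(Function<String, String> resolver, long ttlMillis, LongSupplier clock) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached rack, re-resolving when the entry is absent or older than the TTL. */
    synchronized String rackOf(String hostname) {
        long now = clock.getAsLong();
        Entry entry = entries.get(hostname);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry(resolver.apply(hostname), now);
            entries.put(hostname, entry);
        }
        return entry.rack;
    }
}
```

With a one-hour TTL this would be constructed with 60 * 60 * 1000L and System::currentTimeMillis as the clock. Stale entries are replaced lazily on the next lookup, so no background thread is needed.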