Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/04/16 16:42:59 UTC
[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.
[ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498092#comment-14498092 ]
Jason Lowe commented on YARN-3491:
----------------------------------
Storing asynchronously is going to be a bit dangerous: we do not want to create a situation where a resource has started localizing but we haven't yet recorded the fact that we started it. In theory, a recovery could then leak a resource, or fail to realize that a localization started but did not complete and still needs to be cleaned up.
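To illustrate the ordering concern, here is a minimal sketch, not the actual NM code. SketchStateStore and StoreThenLocalize are hypothetical names standing in for the recovery store and the localizer; the point is only that the "started" record is written synchronously before the download is handed to the thread pool.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical stand-in for the NM recovery store; not the real YARN API.
final class SketchStateStore {
    private final Set<String> started = new HashSet<>();

    // Returns only after the record is in the store (a synchronous write).
    synchronized void recordStart(String resource) {
        started.add(resource);
    }

    synchronized boolean hasStarted(String resource) {
        return started.contains(resource);
    }
}

// Hypothetical localizer that records the start before doing any work.
final class StoreThenLocalize {
    private final SketchStateStore store;
    private final Executor pool;

    StoreThenLocalize(SketchStateStore store, Executor pool) {
        this.store = store;
        this.pool = pool;
    }

    CompletableFuture<String> localize(String resource) {
        // 1. Record the start first, so a recovery can always see it.
        store.recordStart(resource);
        // 2. Only then hand the (simulated) download to the thread pool.
        return CompletableFuture.supplyAsync(() -> "/local/" + resource, pool);
    }
}
```

If the store write in step 1 were made asynchronous, the download in step 2 could begin before the record exists, which is the window described above.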
I think it's best at this point to have some hard evidence, from a profiler or from targeted log statements around the suspected code, of where the time is actually being spent in the NM, rather than guessing.
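As a rough sketch of what a targeted timing statement could look like (SectionTimer is a hypothetical helper, and it prints to stdout only to stay self-contained; in the NM this would go through the existing logger):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for timing a suspected code section.
final class SectionTimer {
    // Runs the section, prints how long it took, and returns its result.
    static <T> T timed(String label, Supplier<T> section) {
        long start = System.nanoTime();
        try {
            return section.get();
        } finally {
            long elapsedMs =
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(label + " took " + elapsedMs + " ms");
        }
    }
}
```

Wrapping each suspected step separately (for example the state store write versus the submission to the thread pool) would show which one dominates.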
> PublicLocalizer#addResource is too slow.
> ----------------------------------------
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.7.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
>
> Improve public resource localization so that both FSDownload submission to the thread pool and handling of completed localizations are done in one thread (PublicLocalizer).
> Currently, FSDownload submission to the thread pool is done in PublicLocalizer#addResource, which runs in the Dispatcher thread, and completed localization handling is done in PublicLocalizer#run, which runs in the PublicLocalizer thread.
> Because PublicLocalizer#addResource is time-consuming, the thread pool can't be fully utilized. Instead of running in parallel (multithreaded), public resource localization is serialized most of the time.
> This change also has two more benefits:
> 1. The Dispatcher thread won't be blocked by PublicLocalizer#addResource. The Dispatcher thread handles most of the time-critical events at the NodeManager.
> 2. Synchronization on the HashMap (pending) is no longer needed, because pending will only be accessed in the PublicLocalizer thread.
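The proposal quoted above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual patch: SingleThreadLocalizerSketch, its requests and completed queues, and the simulated download are all hypothetical. The dispatcher side only enqueues a request; one thread does both the submission to the pool and the completion handling, so pending is a plain HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the single-thread design; not the real PublicLocalizer.
final class SingleThreadLocalizerSketch implements Runnable {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> completed = new LinkedBlockingQueue<>();
    private final CompletionService<String> downloads;
    // Touched only by the thread executing run(), so no synchronization.
    private final Map<Future<String>, String> pending = new HashMap<>();

    SingleThreadLocalizerSketch(ExecutorService pool) {
        this.downloads = new ExecutorCompletionService<>(pool);
    }

    // Called from the dispatcher thread: just enqueue and return.
    void addResource(String resource) {
        requests.add(resource);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Submission happens here, not in the dispatcher thread.
                String req = requests.poll(10, TimeUnit.MILLISECONDS);
                if (req != null) {
                    pending.put(downloads.submit(() -> "/local/" + req), req);
                }
                // Completion handling happens in the same thread.
                Future<String> done;
                while ((done = downloads.poll()) != null) {
                    String resource = pending.remove(done);
                    completed.add(resource + " -> " + done.get());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Test/demo helper: waits up to 5 seconds for one completion.
    String awaitCompleted() {
        try {
            return completed.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Note that this sketch does not address the recovery concern raised in the comment: any state store write would still need to complete before the download is submitted, whichever thread performs it.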