Posted to common-dev@hadoop.apache.org by "Vivek Ratan (JIRA)" <ji...@apache.org> on 2008/10/27 10:34:44 UTC

[jira] Commented: (HADOOP-4513) Capacity scheduler should initialize tasks asynchronously

    [ https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642881#action_12642881 ] 

Vivek Ratan commented on HADOOP-4513:
-------------------------------------

Yes, we need to make sure jobs are initialized asynchronously (so that initTasks() is not called synchronously from within a heartbeat) and as early as possible (so that a job is already initialized when we consider it to run). We also want only a small number of waiting jobs to be initialized at any given time, so that their memory footprint stays low. I suggest we use an enhanced version of EagerTaskInitializationListener, so that jobs are initialized asynchronously in a separate thread. The difference is that we apply some of the limits described in HADOOP-4428. We can have a limit on the total number of initialized waiting jobs (maybe 10 per queue), as well as a limit on initialized jobs per user per queue (maybe 3). The modified EagerTaskInitializationListener thread enforces these limits and only initializes jobs as necessary.
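
To make the proposed limits concrete, here is a rough sketch of how a background initializer might pick jobs. It is not the actual EagerTaskInitializationListener or any Hadoop API: the class InitLimitSketch, its Job type, and the methods initializeEligible() and initAsync() are all hypothetical names, and setting a flag stands in for the real initTasks() call. The limits of 10 per queue and 3 per user per queue are just the example values from above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch, not Hadoop code: bounded, asynchronous job initialization.
class InitLimitSketch {

    // Hypothetical stand-in for a submitted job waiting in a queue.
    static final class Job {
        final String id;
        final String queue;
        final String user;
        volatile boolean initialized; // set once the (simulated) init has run

        Job(String id, String queue, String user) {
            this.id = id;
            this.queue = queue;
            this.user = user;
        }
    }

    private final int maxPerQueue;        // e.g. 10 initialized waiting jobs per queue
    private final int maxPerUserPerQueue; // e.g. 3 initialized waiting jobs per user per queue

    InitLimitSketch(int maxPerQueue, int maxPerUserPerQueue) {
        this.maxPerQueue = maxPerQueue;
        this.maxPerUserPerQueue = maxPerUserPerQueue;
    }

    // Walks the waiting jobs in submission order and initializes each one
    // that fits under both limits. Returns how many jobs were initialized.
    synchronized int initializeEligible(List<Job> waitingJobs) {
        Map<String, Integer> perQueue = new HashMap<>();
        Map<String, Integer> perUser = new HashMap<>();

        // Count the waiting jobs that are already initialized.
        for (Job job : waitingJobs) {
            if (job.initialized) {
                perQueue.merge(job.queue, 1, Integer::sum);
                perUser.merge(job.queue + "/" + job.user, 1, Integer::sum);
            }
        }

        int newlyInitialized = 0;
        for (Job job : waitingJobs) {
            if (job.initialized) {
                continue;
            }
            String userKey = job.queue + "/" + job.user;
            int inQueue = perQueue.getOrDefault(job.queue, 0);
            int forUser = perUser.getOrDefault(userKey, 0);
            if (inQueue < maxPerQueue && forUser < maxPerUserPerQueue) {
                job.initialized = true; // placeholder for the expensive initTasks() work
                perQueue.put(job.queue, inQueue + 1);
                perUser.put(userKey, forUser + 1);
                newlyInitialized++;
            }
        }
        return newlyInitialized;
    }

    // Runs one initialization pass on the given executor, so the caller
    // (for example, a heartbeat handler) does not block on it.
    CompletableFuture<Integer> initAsync(List<Job> waitingJobs, Executor executor) {
        return CompletableFuture.supplyAsync(() -> initializeEligible(waitingJobs), executor);
    }
}
```

In this sketch the pass would be re-run whenever a job is submitted or an initialized job leaves the waiting state, so that the next eligible job gets initialized ahead of time. How the real listener would be triggered, and how it would track job state, is left open here.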

> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>
>                 Key: HADOOP-4513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4513
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
>
> Currently, the capacity scheduler initializes tasks on demand, as opposed to the eager initialization technique used by the default scheduler. This is done in order to reduce the JT memory footprint. However, the initialization is done in the {{assignTasks}} API, which is not a good idea as task initialization could be a time-consuming operation. This JIRA is to move the initialization out of the {{assignTasks}} API and do it asynchronously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.