Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/27 22:59:37 UTC

[jira] Created: (HADOOP-173) optimize allocation of tasks w/ local data

optimize allocation of tasks w/ local data
------------------------------------------

         Key: HADOOP-173
         URL: http://issues.apache.org/jira/browse/HADOOP-173
     Project: Hadoop
        Type: Improvement

  Components: mapred  
    Versions: 0.2    
    Reporter: Doug Cutting
 Assigned to: Doug Cutting 


When a job first starts, all task trackers ask the job tracker for tasks at once.  With many task trackers, the job tracker becomes very slow.  The first kind of task the job tracker looks for is one with some of its input data stored on the same node as the requesting task tracker.  This case currently scans the task list linearly, which, on average, requires numHosts/(replication*2) iterations to find a match (I think).  This could be optimized by adding a table that maps each host to the tasks with input data on that host.
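
A minimal sketch of the proposed host-to-task table follows. It is an illustration only and is not taken from the attached patch: the class name LocalTaskIndex, the methods addTask and findLocalTask, the integer task ids and the -1 "no local task" return value are all hypothetical. It is also not synchronized, so a real job tracker would need to guard it against concurrent requests.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a host-to-task lookup table; not the committed code. */
final class LocalTaskIndex {
    // Host name -> ids of tasks that have an input replica stored on that host.
    private final Map<String, Deque<Integer>> tasksByHost = new HashMap<>();
    // Ids of tasks already handed out, so stale table entries can be skipped.
    private final Set<Integer> assigned = new HashSet<>();

    /** Registers a task under every host that stores a replica of its input. */
    public void addTask(int taskId, List<String> replicaHosts) {
        for (String host : replicaHosts) {
            tasksByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(taskId);
        }
    }

    /**
     * Returns the id of an unassigned task whose input is local to the host,
     * or -1 if there is none. Only this host's queue is examined, and each
     * entry is removed as it is examined.
     */
    public int findLocalTask(String host) {
        Deque<Integer> candidates = tasksByHost.get(host);
        if (candidates == null) {
            return -1;
        }
        Integer taskId;
        while ((taskId = candidates.poll()) != null) {
            // Set.add returns false if another host already took this task.
            if (assigned.add(taskId)) {
                return taskId;
            }
        }
        return -1;
    }
}
```

Because a task is listed under each host that holds a replica, an entry may already have been assigned through another host by the time it is reached. The sketch drops such entries lazily during lookup instead of removing them from every queue at assignment time.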


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-173) optimize allocation of tasks w/ local data

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-173?page=all ]

Doug Cutting updated HADOOP-173:
--------------------------------

    Attachment: fast-local-task.patch

This patch optimizes the job tracker's allocation of tasks to nodes that have local data.  I have tested it, but not yet on a large cluster.




[jira] Resolved: (HADOOP-173) optimize allocation of tasks w/ local data

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-173?page=all ]
     
Doug Cutting resolved HADOOP-173:
---------------------------------

    Fix Version: 0.2
     Resolution: Fixed

I committed this.

