Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/27 22:59:37 UTC
[jira] Created: (HADOOP-173) optimize allocation of tasks w/ local data
optimize allocation of tasks w/ local data
------------------------------------------
Key: HADOOP-173
URL: http://issues.apache.org/jira/browse/HADOOP-173
Project: Hadoop
Type: Improvement
Components: mapred
Versions: 0.2
Reporter: Doug Cutting
Assigned to: Doug Cutting
When a job first starts, all task trackers ask the job tracker for tasks at once. With lots of task trackers, the job tracker gets very slow. The first type of task the job tracker tries to find for a requesting task tracker is one with some of its input data stored on that task tracker's node. Currently this case loops through the tasks blindly, which on average requires numHosts/(replication*2) iterations to find a match (I think). This could be optimized by adding a table that maps each host to the tasks whose input data is stored on it.
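A minimal Java sketch of the host-to-task table proposed above. It assumes each map task is identified by an integer id and knows the list of hosts holding replicas of its input split. The class and method names (LocalTaskIndex, obtainLocalTask, scanForLocalTask) are hypothetical and are not taken from fast-local-task.patch or the Hadoop JobTracker. It requires Java 9+ for List.of.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not the code in fast-local-task.patch): index pending
 * map tasks by the hosts that store their input split, so a request for a
 * data-local task does not require scanning every task.
 */
public class LocalTaskIndex {
    // host name -> ids of tasks whose input split has a replica on that host
    private final Map<String, Deque<Integer>> hostToTasks = new HashMap<>();
    // ids of tasks already handed out; their stale index entries are skipped lazily
    private final Set<Integer> assigned = new HashSet<>();
    // splitHosts.get(task) = hosts holding replicas of that task's input split
    private final List<List<String>> splitHosts;

    public LocalTaskIndex(List<List<String>> splitHosts) {
        this.splitHosts = splitHosts;
        for (int task = 0; task < splitHosts.size(); task++) {
            for (String host : splitHosts.get(task)) {
                hostToTasks.computeIfAbsent(host, h -> new ArrayDeque<>()).add(task);
            }
        }
    }

    /** Returns an unassigned task with input data on host, or -1 if there is none. */
    public int obtainLocalTask(String host) {
        Deque<Integer> candidates = hostToTasks.get(host);
        while (candidates != null && !candidates.isEmpty()) {
            int task = candidates.poll();
            // A task is listed under every replica host, so a tracker on
            // another host may already have taken it; if so, discard the entry.
            if (assigned.add(task)) {
                return task;
            }
        }
        return -1;
    }

    /** The blind scan described in the issue, for comparison: O(#tasks) per request. */
    public int scanForLocalTask(String host) {
        for (int task = 0; task < splitHosts.size(); task++) {
            if (!assigned.contains(task) && splitHosts.get(task).contains(host)) {
                assigned.add(task);
                return task;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
            List.of("node1", "node2"),   // task 0
            List.of("node2", "node3"),   // task 1
            List.of("node3", "node1"));  // task 2
        LocalTaskIndex index = new LocalTaskIndex(splits);
        System.out.println(index.obtainLocalTask("node2")); // 0
        System.out.println(index.obtainLocalTask("node1")); // 2 (task 0 already taken)
        System.out.println(index.obtainLocalTask("node3")); // 1
        System.out.println(index.obtainLocalTask("node3")); // -1 (nothing left)
    }
}
```

Because each task appears under every host that holds a replica, the sketch removes it from the other hosts' queues lazily: entries for already-assigned tasks are discarded when polled. Each table entry is therefore examined at most once over the life of the job, rather than the whole task list being rescanned on every request.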
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Updated: (HADOOP-173) optimize allocation of tasks w/ local data
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-173?page=all ]
Doug Cutting updated HADOOP-173:
--------------------------------
Attachment: fast-local-task.patch
This patch optimizes the job tracker's allocation of tasks to nodes that have local data. I have tested it, but not yet on a large cluster.
[jira] Resolved: (HADOOP-173) optimize allocation of tasks w/ local data
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-173?page=all ]
Doug Cutting resolved HADOOP-173:
---------------------------------
Fix Version: 0.2
Resolution: Fixed
I committed this.