Posted to mapreduce-dev@hadoop.apache.org by "Zhang Zhaoning (JIRA)" <ji...@apache.org> on 2009/11/20 09:01:41 UTC

[jira] Created: (MAPREDUCE-1226) Granularity Variable Task Pre-Scheduler in Heterogeneous Environment

Granularity Variable Task Pre-Scheduler in Heterogeneous Environment 
---------------------------------------------------------------------

                 Key: MAPREDUCE-1226
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1226
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: jobtracker
         Environment: Heterogeneous Cluster
            Reporter: Zhang Zhaoning


Since we deployed the LATE scheduler from the OSDI'08 paper on some of our cluster environments, we have seen slow nodes repeatedly assigned tasks that run slowly, get speculatively re-executed, and are then killed; these nodes end up contributing nothing while still occupying their task slots.
In the LATE mechanism some tasks are re-executed, so the same task runs on two or more different nodes, which wastes computing resources.

The easy fix is to remove these nodes from the cluster, or to split the cluster into two or more parts. But I think it is useful and worthwhile to design a mechanism that lets low-utility nodes still contribute effectively.
 
We want to pre-schedule tasks using a per-node utility derived from historical logs, and assign larger tasks to the faster nodes. Today the Hadoop task scheduler hands out map tasks over fixed-size input splits, 64 MB by default (sometimes configured to 128 MB), so almost all tasks have the same granularity. I want to change this into a variable-granularity mechanism.
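
As a rough illustration of the idea, the split size handed to a node could be scaled by a per-node utility weight. The sketch below is minimal and hypothetical: GranularityPlanner and its constants are not part of Hadoop, and the clamping bounds are placeholders.

    // Hypothetical sketch: scale the map input split size by a per-node utility weight.
    // GranularityPlanner is illustrative only; it is not an existing Hadoop class.
    public class GranularityPlanner {
        private static final long BASE_SPLIT_BYTES = 64L * 1024 * 1024; // 64 MB default split

        /** Utility weight 1.0 = average node; 2.0 = roughly twice as fast. */
        public long splitBytesFor(double utilityWeight) {
            long bytes = (long) (BASE_SPLIT_BYTES * utilityWeight);
            long floor = 16L * 1024 * 1024;    // keep slow nodes on small but useful splits
            long ceiling = 256L * 1024 * 1024; // avoid handing one node an enormous split
            return Math.max(floor, Math.min(ceiling, bytes));
        }
    }

Under this sketch, a node measured to be about twice as fast as average would receive roughly a 128 MB split while an average node keeps 64 MB, so both finish at about the same time.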

As we know, map task granularity is determined by how the DFS input is split, while reduce task granularity is determined by how the Partitioner divides the intermediate results, so a variable-granularity mechanism looks feasible on both sides.
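
On the reduce side, one possible (purely illustrative) realization is a weight-aware Partitioner that gives faster reduce nodes a larger share of the hashed key space; the static weights array is a placeholder for values that would come from the node history logs.

    // Sketch of a weight-aware Partitioner: higher-weight reducers receive a
    // proportionally larger slice of the hashed key space. The static WEIGHTS
    // array is illustrative; a real scheduler would load per-job weights.
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WeightedPartitioner<K, V> extends Partitioner<K, V> {
        private static final double[] WEIGHTS = {2.0, 1.0, 1.0}; // placeholder weights

        @Override
        public int getPartition(K key, V value, int numPartitions) {
            double total = 0;
            for (int i = 0; i < numPartitions; i++) {
                total += WEIGHTS[i % WEIGHTS.length];
            }
            // Map the key's hash into [0, total) and pick the weighted bucket it falls in.
            double point = (key.hashCode() & Integer.MAX_VALUE) / (double) Integer.MAX_VALUE * total;
            double running = 0;
            for (int i = 0; i < numPartitions; i++) {
                running += WEIGHTS[i % WEIGHTS.length];
                if (point < running) {
                    return i;
                }
            }
            return numPartitions - 1;
        }
    }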

With this pre-scheduling model we can expect all tasks to start at nearly the same time and finish at nearly the same time, so the job fills a well-defined time slot.

History-Log-Based Node Utility Description
This is the fundamental description of each node that the pre-scheduler works from. In a heterogeneous environment the cluster can be viewed as several sub-clusters: nodes within a sub-cluster are homogeneous, while nodes in different sub-clusters are heterogeneous.
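
As one assumed concrete form of such a description, a node's utility could be summarized as its average throughput over recently completed tasks; TaskRecord and the way the history is collected below are hypothetical.

    // Minimal sketch: derive a per-node utility score from historical task records.
    // TaskRecord and the log source are assumptions made for illustration.
    import java.util.List;

    public class NodeUtilityEstimator {

        /** Hypothetical record of one completed task on a node. */
        public static class TaskRecord {
            final long inputBytes;
            final long runtimeMillis;
            public TaskRecord(long inputBytes, long runtimeMillis) {
                this.inputBytes = inputBytes;
                this.runtimeMillis = runtimeMillis;
            }
        }

        /** Average throughput in bytes/second; higher means a faster node. */
        public double throughputOf(List<TaskRecord> history) {
            if (history.isEmpty()) {
                return 0.0; // no history yet; the scheduler could fall back to a cluster average
            }
            double sum = 0;
            for (TaskRecord r : history) {
                sum += r.inputBytes / (r.runtimeMillis / 1000.0);
            }
            return sum / history.size();
        }
    }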

Node Utility Stability
We think this matters because the pre-scheduler depends on node behaviour being stable. We could single out the unstable nodes and treat them differently, but we do not yet have a good method for handling them.
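
One simple way to quantify stability, again only as a sketch, is the coefficient of variation of a node's recent task throughputs; the 0.5 cutoff below is an arbitrary placeholder, not a measured value.

    // Sketch: flag nodes whose recent task throughputs vary too much to be
    // trusted with large pre-scheduled splits.
    import java.util.List;

    public class NodeStabilityCheck {

        /** Coefficient of variation (stddev / mean) of recent per-task throughputs. */
        public static double variation(List<Double> throughputs) {
            if (throughputs.size() < 2) {
                return 0.0; // too little history to judge
            }
            double mean = 0;
            for (double t : throughputs) mean += t;
            mean /= throughputs.size();
            double var = 0;
            for (double t : throughputs) var += (t - mean) * (t - mean);
            var /= throughputs.size();
            return mean == 0 ? 0.0 : Math.sqrt(var) / mean;
        }

        /** Arbitrary illustrative threshold for treating a node as unstable. */
        public static boolean isUnstable(List<Double> throughputs) {
            return variation(throughputs) > 0.5;
        }
    }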

Error Tolerance
The original scheduler for a homogeneous cluster already handles faulty nodes: when a node hits an exception, the JobTracker re-executes the affected tasks and deals with the failure dynamically.

So if we use the pre-scheduler, we must still cope with such exceptions.
I propose that when a task fails, we split its input into several parts and execute them on several different nodes; the expected finish time is then shortened and the total job response time does not grow too much.
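
A minimal sketch of that recovery step, assuming a failed task's input can be described by a byte range (ByteRange is a hypothetical stand-in for an input split):

    // Sketch: when a pre-scheduled task fails, cut its input range into smaller
    // pieces so several nodes can redo the work in parallel.
    import java.util.ArrayList;
    import java.util.List;

    public class FailureResplit {

        /** Hypothetical byte range covering a failed task's input. */
        public static class ByteRange {
            final long start;
            final long length;
            public ByteRange(long start, long length) {
                this.start = start;
                this.length = length;
            }
        }

        /** Split the failed range into `parts` roughly equal sub-ranges. */
        public static List<ByteRange> resplit(ByteRange failed, int parts) {
            List<ByteRange> pieces = new ArrayList<ByteRange>();
            long base = failed.length / parts;
            long offset = failed.start;
            for (int i = 0; i < parts; i++) {
                // The last piece absorbs any remainder bytes.
                long len = (i == parts - 1) ? failed.start + failed.length - offset : base;
                pieces.add(new ByteRange(offset, len));
                offset += len;
            }
            return pieces;
        }
    }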

Job Priorities
With this pre-scheduler a single job fills its whole time slot, so any high-priority job that arrives in the meantime has to wait. I do not yet have an effective method to solve this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.