Posted to hdfs-user@hadoop.apache.org by Yaron Gonen <ya...@gmail.com> on 2012/12/27 11:40:52 UTC

Selecting a task for the tasktracker

Hi,
If I understand correctly, the job scheduler (why is the class called
TaskScheduler?) is responsible for assigning the task whose split is as
close as possible to the tasktracker.
This means the job scheduler is responsible for two things:

   1. Selecting a job.
   2. Once a job is selected, assigning the closest task to the tasktracker
   that sent the heartbeat.

Is this correct?

I want to write my own job scheduler to change the logic above, but I get the
error "The type TaskScheduler is not visible".
How can I write my own scheduler?

thanks

Re: Selecting a task for the tasktracker

Posted by Yaron Gonen <ya...@gmail.com>.

Thanks a lot!


On Thu, Dec 27, 2012 at 8:11 PM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

>
> On top of that, the message indicates that you need to have your scheduler
> class in the mapred package.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Dec 27, 2012, at 7:38 AM, Hemanth Yamijala wrote:
>
> Hi,
>
> Firstly, I am talking about Hadoop 1.0. Please note that in Hadoop 2.x and
> trunk, the MapReduce framework has been completely revamped as YARN (
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
> and you may need to look at different interfaces for building your own
> scheduler.
>
> In 1.0, the primary function of the TaskScheduler is the assignTasks
> method. Given a TaskTracker object as input, this method figures out how
> many free map and reduce slots exist on that particular tasktracker and
> selects one or more tasks that can be scheduled on it. Since task selection
> is the primary responsibility and the granularity is at the task level, the
> class is called TaskScheduler.
>
> The method of choosing a job and then a task within the job is customised
> by the different schedulers already present in Hadoop. Also, the core logic
> of selecting a map task with data locality optimizations is not implemented
> in the schedulers per se; they rely on the JobInProgress object in the
> MapReduce framework for that.
>
> To implement your own scheduler, it may be best to look at the sources of
> existing schedulers: JobQueueTaskScheduler, CapacityTaskScheduler or
> FairScheduler. In particular, the last two are in the contrib modules of
> MapReduce, and hence should be fairly self-contained and easy to follow.
> Their build files will also tell you how to resolve compile problems like
> the one you are facing.
>
> Thanks
> Hemanth
>
>
>
>
> On Thu, Dec 27, 2012 at 4:10 PM, Yaron Gonen <ya...@gmail.com> wrote:
>
>> Hi,
>> If I understand correctly, the job scheduler (why is the class called
>> TaskScheduler?) is responsible for assigning the task whose split is as
>> close as possible to the tasktracker.
>> This means the job scheduler is responsible for two things:
>>
>>    1. Selecting a job.
>>    2. Once a job is selected, assigning the closest task to the tasktracker
>>    that sent the heartbeat.
>>
>> Is this correct?
>>
>> I want to write my own job scheduler to change the logic above, but I get
>> the error "The type TaskScheduler is not visible".
>> How can I write my own scheduler?
>>
>> thanks
>>
>
>
>

Re: Selecting a task for the tasktracker

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

On top of that, the message indicates that you need to have your scheduler class in the mapred package.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Dec 27, 2012, at 7:38 AM, Hemanth Yamijala wrote:

> Hi,
> 
> Firstly, I am talking about Hadoop 1.0. Please note that in Hadoop 2.x and trunk, the MapReduce framework has been completely revamped as YARN (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) and you may need to look at different interfaces for building your own scheduler.
> 
> In 1.0, the primary function of the TaskScheduler is the assignTasks method. Given a TaskTracker object as input, this method figures out how many free map and reduce slots exist on that particular tasktracker and selects one or more tasks that can be scheduled on it. Since task selection is the primary responsibility and the granularity is at the task level, the class is called TaskScheduler.
> 
> The method of choosing a job and then a task within the job is customised by the different schedulers already present in Hadoop. Also, the core logic of selecting a map task with data locality optimizations is not implemented in the schedulers per se; they rely on the JobInProgress object in the MapReduce framework for that.
> 
> To implement your own scheduler, it may be best to look at the sources of existing schedulers: JobQueueTaskScheduler, CapacityTaskScheduler or FairScheduler. In particular, the last two are in the contrib modules of MapReduce, and hence should be fairly self-contained and easy to follow. Their build files will also tell you how to resolve compile problems like the one you are facing.
> 
> Thanks
> Hemanth  
> 
> 
> 
> 
> On Thu, Dec 27, 2012 at 4:10 PM, Yaron Gonen <ya...@gmail.com> wrote:
> Hi,
> If I understand correctly, the job scheduler (why is the class called TaskScheduler?) is responsible for assigning the task whose split is as close as possible to the tasktracker.
> This means the job scheduler is responsible for two things:
>    1. Selecting a job.
>    2. Once a job is selected, assigning the closest task to the tasktracker that sent the heartbeat.
> Is this correct?
> 
> I want to write my own job scheduler to change the logic above, but I get the error "The type TaskScheduler is not visible".
> How can I write my own scheduler?
> 
> thanks
>

Re: Selecting a task for the tasktracker

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.

Hi,

Firstly, I am talking about Hadoop 1.0. Please note that in Hadoop 2.x and
trunk, the MapReduce framework has been completely revamped as YARN (
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
and you may need to look at different interfaces for building your own
scheduler.

In 1.0, the primary function of the TaskScheduler is the assignTasks
method. Given a TaskTracker object as input, this method figures out how
many free map and reduce slots exist on that particular tasktracker and
selects one or more tasks that can be scheduled on it. Since task selection
is the primary responsibility and the granularity is at the task level, the
class is called TaskScheduler.

The method of choosing a job and then a task within the job is customised
by the different schedulers already present in Hadoop. Also, the core logic
of selecting a map task with data locality optimizations is not implemented
in the schedulers per se; they rely on the JobInProgress object in the
MapReduce framework for that.
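
As a rough illustration of the flow described above, here is a minimal,
self-contained Java sketch. It is a simplified model, not Hadoop code: the
classes Task, Job and Tracker and the method obtainTaskFor are hypothetical
stand-ins invented for this example (Job loosely plays the role of
JobInProgress), and the FIFO job order is just one possible policy. It only
shows the division of labour: the scheduler picks the job and fills the free
slots, while the job itself makes the locality decision.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model for illustration only; none of these are Hadoop classes.
class SchedulerSketch {

    /** A pending map task and the host that stores its input split. */
    static final class Task {
        final String id;
        final String splitHost;
        Task(String id, String splitHost) { this.id = id; this.splitHost = splitHost; }
    }

    /** Hypothetical stand-in for JobInProgress: owns the locality decision. */
    static final class Job {
        final String name;
        final List<Task> pending = new ArrayList<>();
        Job(String name) { this.name = name; }

        Job add(String id, String splitHost) {
            pending.add(new Task(id, splitHost));
            return this;
        }

        /** Prefers a task whose split is on the given host, else any pending task, else null. */
        Task obtainTaskFor(String host) {
            if (pending.isEmpty()) return null;
            Task chosen = pending.get(0);            // fallback: a non-local task
            for (Task t : pending) {
                if (t.splitHost.equals(host)) { chosen = t; break; }
            }
            pending.remove(chosen);
            return chosen;
        }
    }

    /** Hypothetical stand-in for the tasktracker that sent the heartbeat. */
    static final class Tracker {
        final String host;
        final int freeSlots;
        Tracker(String host, int freeSlots) { this.host = host; this.freeSlots = freeSlots; }
    }

    /** Scheduler side: walk the jobs in FIFO order and fill the tracker's free slots. */
    static List<Task> assignTasks(Tracker tracker, List<Job> jobsInFifoOrder) {
        List<Task> assigned = new ArrayList<>();
        for (Job job : jobsInFifoOrder) {
            while (assigned.size() < tracker.freeSlots) {
                Task t = job.obtainTaskFor(tracker.host);
                if (t == null) break;                // this job has nothing left
                assigned.add(t);
            }
            if (assigned.size() == tracker.freeSlots) break;
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("job-1").add("m0", "hostA").add("m1", "hostB"));
        jobs.add(new Job("job-2").add("m2", "hostB"));
        for (Task t : assignTasks(new Tracker("hostB", 2), jobs)) {
            System.out.println(t.id + " (split on " + t.splitHost + ")");
        }
    }
}
```

In this model a heartbeat from hostB with two free slots gets m1 first (its
split is local) and then m0 (non-local, but from the same job), while m2 in
job-2 waits even though it is local. Changing the order in which jobs are
walked, or stopping before taking a non-local task, would be the kind of
logic a custom scheduler replaces.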

To implement your own scheduler, it may be best to look at the sources of
existing schedulers: JobQueueTaskScheduler, CapacityTaskScheduler or
FairScheduler. In particular, the last two are in the contrib modules of
MapReduce, and hence should be fairly self-contained and easy to follow.
Their build files will also tell you how to resolve compile problems like
the one you are facing.

Thanks
Hemanth

On Thu, Dec 27, 2012 at 4:10 PM, Yaron Gonen <ya...@gmail.com> wrote:

> Hi,
> If I understand correctly, the job scheduler (why is the class called
> TaskScheduler?) is responsible for assigning the task whose split is as
> close as possible to the tasktracker.
> This means the job scheduler is responsible for two things:
>
>    1. Selecting a job.
>    2. Once a job is selected, assigning the closest task to the tasktracker
>    that sent the heartbeat.
>
> Is this correct?
>
> I want to write my own job scheduler to change the logic above, but I get
> the error "The type TaskScheduler is not visible".
> How can I write my own scheduler?
>
> thanks
>
