Posted to common-dev@hadoop.apache.org by Saurabh Agarwal <sr...@gmail.com> on 2010/05/13 21:15:09 UTC

Task scheduler

Hi,

I am experimenting with Hadoop and wanted to ask: is the task distribution
policy used by the JobTracker pluggable? If so, where in the code tree is it
defined?


Thanks and regards
Saurabh Agarwal

RE: Task scheduler

Posted by "Segel, Mike" <ms...@navteq.com>.
+1
I agree with Steve that sometimes you need to redirect where you want the work to occur.

Over time, your cloud will not have homogeneous data nodes. You may end up with a cluster of nodes that have a Fermi card (NVIDIA CUDA-enabled cards) where you want to do some serious number crunching. [I don't believe CUDA supports Java, but you get the idea...]

So in theory, you will want to dictate where the work will be performed.

-Mike

PS. Sorry if this is a bad example. I was talking with my uncle over the weekend and he's into some serious number crunching.

-----Original Message-----
From: Steve Loughran [mailto:stevel@apache.org] 
Sent: Monday, May 17, 2010 6:47 AM
To: common-dev@hadoop.apache.org
Subject: Re: Task scheduler

Saurabh Agarwal wrote:
> Hemanth,
> 
> 
> Thanks!!
> Saurabh Agarwal
> 
> 
> On Fri, May 14, 2010 at 9:49 AM, Hemanth Yamijala <yh...@gmail.com>wrote:
> 
>> Saurabh,
>>
>>> Let me reframe my question: I wanted to know how the JobTracker decides the
>>> assignment of input splits to TaskTrackers based on a TaskTracker's data
>>> locality. Where is this policy defined? Is it pluggable?
>> Sorry, I misunderstood your question then. This code is in
>> o.a.h.mapred.JobInProgress. It is likely spread across many methods in
>> the class. But a good starting point could be methods like
>> obtainNewMapTask or obtainNewReduceTask.
>>
>> At the moment, this policy is not pluggable. But I know there have
>> been discussions (possibly even a JIRA, though I can't locate any now)
>> asking for this capability.
>>

+1 to having some plugin interface in 0.22+ to give you control.

My former colleague Russ Perry did some rendering with Hadoop where he
wanted the work done not where the input data was, but where the output
data was needed; there was no way to do this:
http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf




Re: Task scheduler

Posted by Steve Loughran <st...@apache.org>.
Saurabh Agarwal wrote:
> Hemanth,
> 
> 
> Thanks!!
> Saurabh Agarwal
> 
> 
> On Fri, May 14, 2010 at 9:49 AM, Hemanth Yamijala <yh...@gmail.com>wrote:
> 
>> Saurabh,
>>
>>> Let me reframe my question: I wanted to know how the JobTracker decides the
>>> assignment of input splits to TaskTrackers based on a TaskTracker's data
>>> locality. Where is this policy defined? Is it pluggable?
>> Sorry, I misunderstood your question then. This code is in
>> o.a.h.mapred.JobInProgress. It is likely spread across many methods in
>> the class. But a good starting point could be methods like
>> obtainNewMapTask or obtainNewReduceTask.
>>
>> At the moment, this policy is not pluggable. But I know there have
>> been discussions (possibly even a JIRA, though I can't locate any now)
>> asking for this capability.
>>

+1 to having some plugin interface in 0.22+ to give you control.

My former colleague Russ Perry did some rendering with Hadoop where he
wanted the work done not where the input data was, but where the output
data was needed; there was no way to do this:
http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
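
As an aside on how those placement decisions get their inputs: the hosts the
JobTracker tries to match a map against come from InputSplit.getLocations(),
filled in when the job's splits are computed. One partial, best-effort
workaround for a case like Russ's is an InputFormat that rewraps each split
with the hosts where you would like the work to run rather than the hosts
holding the input blocks. The sketch below assumes that approach and uses the
old org.apache.hadoop.mapred API; the class name PreferredHostTextInputFormat
and the example.preferred.hosts property are made up for illustration, and the
reported locations are only hints to the scheduler, not guarantees.

import java.io.IOException;

import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

/**
 * Rewraps each FileSplit so that getLocations() reports the hosts where we
 * would like the work to run (e.g. near the output), instead of the hosts
 * holding the input blocks. The JobTracker treats these locations as hints
 * only, so this is best-effort, not a guarantee.
 */
public class PreferredHostTextInputFormat extends TextInputFormat {

  /** Hypothetical job property carrying a comma-separated host list. */
  public static final String PREFERRED_HOSTS = "example.preferred.hosts";

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    InputSplit[] splits = super.getSplits(job, numSplits);
    String[] hosts = job.getStrings(PREFERRED_HOSTS);
    if (hosts == null || hosts.length == 0) {
      return splits;                       // no preference configured
    }
    InputSplit[] rewrapped = new InputSplit[splits.length];
    for (int i = 0; i < splits.length; i++) {
      // TextInputFormat produces FileSplits, so this cast is safe here.
      FileSplit fs = (FileSplit) splits[i];
      // Same file region, different placement hint.
      rewrapped[i] = new FileSplit(fs.getPath(), fs.getStart(),
          fs.getLength(), hosts);
    }
    return rewrapped;
  }
}

A job would set example.preferred.hosts to the hosts near the output and use
this class as its input format. It only biases placement through the normal
locality mechanism and does not change the scheduling policy itself, which is
why a pluggable interface would still be the cleaner answer.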


Re: Task scheduler

Posted by Saurabh Agarwal <sr...@gmail.com>.
Hemanth,


Thanks!!
Saurabh Agarwal


On Fri, May 14, 2010 at 9:49 AM, Hemanth Yamijala <yh...@gmail.com>wrote:

> Saurabh,
>
> > Let me reframe my question: I wanted to know how the JobTracker decides the
> > assignment of input splits to TaskTrackers based on a TaskTracker's data
> > locality. Where is this policy defined? Is it pluggable?
>
> Sorry, I misunderstood your question then. This code is in
> o.a.h.mapred.JobInProgress. It is likely spread across many methods in
> the class. But a good starting point could be methods like
> obtainNewMapTask or obtainNewReduceTask.
>
> At the moment, this policy is not pluggable. But I know there have
> been discussions (possibly even a JIRA, though I can't locate any now)
> asking for this capability.
>
> Thanks
> Hemanth
>
> >
> > On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala <yhemanth@gmail.com
> >wrote:
> >
> >> Saurabh,
> >>
> >> > I am experimenting with Hadoop and wanted to ask: is the task
> >> > distribution policy used by the JobTracker pluggable? If so, where in
> >> > the code tree is it defined?
> >> >
> >>
> >> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
> >> that needs to be extended to define a new scheduling policy. Also,
> >> please do take a look at the existing schedulers that extend this
> >> class. There are 3-4 implementations including the default scheduler,
> >> capacity scheduler, fairshare scheduler and dynamic priority
> >> scheduler. It may be worthwhile to see if your ideas match any of the
> >> existing implementations to some degree and then consider enhancing
> >> those as a first option.
> >>
> >> Thanks
> >> Hemanth
> >>
> >
>

Re: Task scheduler

Posted by Hemanth Yamijala <yh...@gmail.com>.
Saurabh,

> Let me reframe my question: I wanted to know how the JobTracker decides the
> assignment of input splits to TaskTrackers based on a TaskTracker's data
> locality. Where is this policy defined? Is it pluggable?

Sorry, I misunderstood your question then. This code is in
o.a.h.mapred.JobInProgress. It is likely spread across many methods in
the class, but a good starting point could be methods like
obtainNewMapTask or obtainNewReduceTask.

At the moment, this policy is not pluggable. But I know there have
been discussions (possibly even a JIRA, though I can't locate any now)
asking for this capability.

Thanks
Hemanth
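
To make that preference order concrete, here is a small, self-contained toy
model of the choice obtainNewMapTask ends up making for a heartbeating
tracker: prefer a pending split with a block on the tracker's own node, then
one on the same rack, then anything left over. This is an illustration only,
not the JobInProgress code, which also has to deal with failed and speculative
tasks, per-host caches and topology lookups.

import java.util.Arrays;
import java.util.List;

/**
 * Toy model of the preference order JobInProgress applies when handing a map
 * task to a heartbeating TaskTracker: node-local split first, then rack-local,
 * then any remaining split.
 */
public class LocalityToy {

  /** A pending split and the hosts holding its blocks ("rack/host" strings). */
  static class Split {
    final String name;
    final List<String> hosts;
    Split(String name, String... hosts) {
      this.name = name;
      this.hosts = Arrays.asList(hosts);
    }
  }

  static String rackOf(String host) {
    return host.substring(0, host.indexOf('/'));   // "rackA/host1" -> "rackA"
  }

  static Split pickFor(String trackerHost, List<Split> pending) {
    for (Split s : pending) {                      // 1. node-local
      if (s.hosts.contains(trackerHost)) return s;
    }
    for (Split s : pending) {                      // 2. rack-local
      for (String h : s.hosts) {
        if (rackOf(h).equals(rackOf(trackerHost))) return s;
      }
    }
    return pending.isEmpty() ? null : pending.get(0);  // 3. anything left
  }

  public static void main(String[] args) {
    List<Split> pending = Arrays.asList(
        new Split("split-0", "rackA/host1", "rackB/host4"),
        new Split("split-1", "rackB/host5"),
        new Split("split-2", "rackC/host9"));

    for (String tracker : Arrays.asList("rackB/host5", "rackA/host2", "rackD/host7")) {
      System.out.println(tracker + " -> " + pickFor(tracker, pending).name);
    }
    // Prints split-1 (node-local), split-0 (rack-local), split-0 (non-local).
  }
}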

>
> On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala <yh...@gmail.com>wrote:
>
>> Saurabh,
>>
>> > I am experimenting with Hadoop and wanted to ask: is the task
>> > distribution policy used by the JobTracker pluggable? If so, where in
>> > the code tree is it defined?
>> >
>>
>> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
>> that needs to be extended to define a new scheduling policy. Also,
>> please do take a look at the existing schedulers that extend this
>> class. There are 3-4 implementations including the default scheduler,
>> capacity scheduler, fairshare scheduler and dynamic priority
>> scheduler. It may be worthwhile to see if your ideas match any of the
>> existing implementations to some degree and then consider enhancing
>> those as a first option.
>>
>> Thanks
>> Hemanth
>>
>

Re: Task scheduler

Posted by Saurabh Agarwal <sr...@gmail.com>.
Hi Hemanth,

Let me reframe my question: I wanted to know how the JobTracker decides the
assignment of input splits to TaskTrackers based on a TaskTracker's data
locality. Where is this policy defined? Is it pluggable?
Saurabh Agarwal


On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala <yh...@gmail.com>wrote:

> Saurabh,
>
> > I am experimenting with Hadoop and wanted to ask: is the task
> > distribution policy used by the JobTracker pluggable? If so, where in
> > the code tree is it defined?
> >
>
> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
> that needs to be extended to define a new scheduling policy. Also,
> please do take a look at the existing schedulers that extend this
> class. There are 3-4 implementations including the default scheduler,
> capacity scheduler, fairshare scheduler and dynamic priority
> scheduler. It may be worthwhile to see if your ideas match any of the
> existing implementations to some degree and then consider enhancing
> those as a first option.
>
> Thanks
> Hemanth
>

Re: Task scheduler

Posted by Hemanth Yamijala <yh...@gmail.com>.
Saurabh,

> I am experimenting with Hadoop and wanted to ask: is the task distribution
> policy used by the JobTracker pluggable? If so, where in the code tree is it
> defined?
>

Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
that needs to be extended to define a new scheduling policy. Also,
please do take a look at the existing schedulers that extend this
class. There are 3-4 implementations including the default scheduler,
capacity scheduler, fairshare scheduler and dynamic priority
scheduler. It may be worthwhile to see if your ideas match any of the
existing implementations to some degree and then consider enhancing
those as a first option.

Thanks
Hemanth
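
For anyone who wants to experiment with this, a minimal scheduler along the
lines Hemanth describes looks roughly like the sketch below. It is written
from memory against the 0.20-era API, so take the signatures as approximate
(for instance, assignTasks() takes a TaskTrackerStatus here but a TaskTracker
object in later branches), and it sits in the org.apache.hadoop.mapred package
because several of the classes it touches are package-private. Treat it as a
shape to follow, not a drop-in class.

package org.apache.hadoop.mapred;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Minimal FIFO-style scheduler sketch against the 0.20-era API. */
public class SketchTaskScheduler extends TaskScheduler {

  // Tracks submitted jobs in arrival order; the contrib schedulers plug in
  // richer listeners (pools, queues, priorities) at this point.
  private final JobQueueJobInProgressListener jobListener =
      new JobQueueJobInProgressListener();

  @Override
  public synchronized void start() throws IOException {
    super.start();
    // taskTrackerManager is injected by the JobTracker before start().
    taskTrackerManager.addJobInProgressListener(jobListener);
  }

  @Override
  public synchronized void terminate() throws IOException {
    taskTrackerManager.removeJobInProgressListener(jobListener);
    super.terminate();
  }

  @Override
  public synchronized List<Task> assignTasks(TaskTrackerStatus tracker)
      throws IOException {
    int clusterSize = taskTrackerManager.getClusterStatus().getTaskTrackers();
    int uniqueHosts = taskTrackerManager.getNumberOfUniqueHosts();
    List<Task> assigned = new ArrayList<Task>();

    for (JobInProgress job : jobListener.getJobQueue()) {
      if (job.getStatus().getRunState() != JobStatus.RUNNING) {
        continue;
      }
      // The locality-aware choice of *which* split to run stays inside
      // JobInProgress; the scheduler only decides which job gets served.
      Task task = job.obtainNewMapTask(tracker, clusterSize, uniqueHosts);
      if (task == null) {
        task = job.obtainNewReduceTask(tracker, clusterSize, uniqueHosts);
      }
      if (task != null) {
        assigned.add(task);
        break;                        // one task per heartbeat keeps it simple
      }
    }
    return assigned;
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    // Single implicit queue in this sketch; real schedulers filter by queue.
    return jobListener.getJobQueue();
  }
}

The compiled class then goes on the JobTracker's classpath and is selected
through the scheduler configuration property (mapred.jobtracker.taskScheduler
in the 0.20 line, if I remember the key correctly), which is also how the
capacity and fair schedulers are switched on.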