Posted to common-user@hadoop.apache.org by bikash sharma <sh...@gmail.com> on 2011/02/25 14:52:53 UTC

definition of slots in Hadoop scheduling

Hi,
How is a task slot in Hadoop defined with respect to scheduling map/reduce
tasks on the slots available on TaskTrackers?

Thanks,
Bikash

Re: definition of slots in Hadoop scheduling

Posted by bikash sharma <sh...@gmail.com>.

Thanks, Allen.

On Sat, Mar 12, 2011 at 11:34 AM, Allen Wittenauer <aw...@apache.org> wrote:

>
> (Removing common-dev, because this isn't really a dev question)
>
> On Feb 25, 2011, at 5:52 AM, bikash sharma wrote:
>
> > Hi,
> > How is task slot in Hadoop defined with respect to scheduling a
> map/reduce
> > task on such slots available on TaskTrackers?
>
>
>         On a TaskTracker, one sets how many maps and reduces one wants to
> run on that node.  The JobTracker is informed of this value.  When a job is
> being scheduled, the JobTracker checks each task's input to see whether the
> DataNode on that node holds a matching block.  If the block is on that node
> or nearby, the task is scheduled there.
>
>
>

Re: definition of slots in Hadoop scheduling

Posted by Allen Wittenauer <aw...@apache.org>.

(Removing common-dev, because this isn't really a dev question)

On Feb 25, 2011, at 5:52 AM, bikash sharma wrote:

> Hi,
> How is task slot in Hadoop defined with respect to scheduling a map/reduce
> task on such slots available on TaskTrackers?


	On a TaskTracker, one sets how many maps and reduces one wants to run on that node.  The JobTracker is informed of this value.  When a job is being scheduled, the JobTracker checks each task's input to see whether the DataNode on that node holds a matching block.  If the block is on that node or nearby, the task is scheduled there.
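
The behaviour described above can be illustrated with a small Python sketch. It is not Hadoop code: the TaskTracker and MapTask classes, the locality() ranking, and the assign() function are hypothetical names made up for this example, and the real JobTracker scheduler is more involved. The sketch only assumes that each node has a fixed number of map slots and that tasks whose input block is on the node, or on the same rack, are preferred.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    host: str
    rack: str
    map_slots: int                      # hand-configured cap on concurrent maps
    running: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.map_slots - len(self.running)


@dataclass
class MapTask:
    task_id: str
    block_hosts: set                    # hosts whose DataNode stores the input block


def locality(task, tracker, rack_of):
    """Rank a task for a tracker: 0 = node-local, 1 = rack-local, 2 = off-rack."""
    if tracker.host in task.block_hosts:
        return 0
    if tracker.rack in {rack_of[h] for h in task.block_hosts}:
        return 1
    return 2


def assign(tracker, pending, rack_of):
    """Fill the tracker's free slots, taking the closest-data tasks first."""
    assigned = []
    while tracker.free_slots() > 0 and pending:
        best = min(pending, key=lambda t: locality(t, tracker, rack_of))
        pending.remove(best)
        tracker.running.append(best.task_id)
        assigned.append(best.task_id)
    return assigned


rack_of = {"node1": "rackA", "node2": "rackA", "node3": "rackB"}
tracker = TaskTracker("node1", "rackA", map_slots=2)
pending = [
    MapTask("m0", {"node3"}),           # block on another rack
    MapTask("m1", {"node1"}),           # block on this node
    MapTask("m2", {"node2"}),           # block on the same rack
]
print(assign(tracker, pending, rack_of))
```

With two map slots, node1 takes the node-local task and then the rack-local one; the off-rack task stays pending until some slot frees up.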

Re: definition of slots in Hadoop scheduling

Posted by bikash sharma <sh...@gmail.com>.

Thanks very much, Harsh. It seems, then, that slots are not defined in terms
of actual machine resource capacities such as CPU, memory, disk, and network
bandwidth.

-bikash

On Fri, Feb 25, 2011 at 11:33 AM, Harsh J <qw...@gmail.com> wrote:

> Please see this archived thread for a very similar question on what
> tasks really are:
>
> http://mail-archives.apache.org/mod_mbox/hadoop-general/201011.mbox/%3C126335.8536.qm@web112111.mail.gq1.yahoo.com%3E
>
> Right now, slots are just a hand-configured cap on parallelism,
> irrespective of the machine's capabilities.
> However, a Scheduler may take a machine's state into account while
> assigning tasks to it.
>
> On Fri, Feb 25, 2011 at 7:22 PM, bikash sharma <sh...@gmail.com>
> wrote:
> > Hi,
> > How is task slot in Hadoop defined with respect to scheduling a
> map/reduce
> > task on such slots available on TaskTrackers?
> >
> > Thanks,
> > Bikash
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>

Re: definition of slots in Hadoop scheduling

Posted by Harsh J <qw...@gmail.com>.

Please see this archived thread for a very similar question on what
tasks really are:
http://mail-archives.apache.org/mod_mbox/hadoop-general/201011.mbox/%3C126335.8536.qm@web112111.mail.gq1.yahoo.com%3E

Right now, slots are just a hand-configured cap on parallelism,
irrespective of the machine's capabilities.
However, a Scheduler may take a machine's state into account while
assigning tasks to it.
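
As a rough illustration of "hand-configured and irrespective of the machine's capabilities", here is a small Python sketch. The two property names are the TaskTracker slot settings from Hadoop 0.20/1.x (normally set in mapred-site.xml), and the default of 2 is what the stock mapred-default.xml of that era ships with, as far as I know. The slot_caps() helper and the hardware dictionaries are hypothetical and exist only for this example.

```python
MAP_KEY = "mapred.tasktracker.map.tasks.maximum"
REDUCE_KEY = "mapred.tasktracker.reduce.tasks.maximum"

# Assumed defaults, applied when the site configuration does not override them.
DEFAULTS = {MAP_KEY: 2, REDUCE_KEY: 2}


def slot_caps(site_conf):
    """Return (map_slots, reduce_slots) from configuration alone.

    Nothing about the node's hardware is consulted here; that is the point.
    """
    conf = {**DEFAULTS, **site_conf}
    return int(conf[MAP_KEY]), int(conf[REDUCE_KEY])


# Two very different (hypothetical) machines sharing one configuration.
small_node = {"cores": 4, "ram_gb": 8}
large_node = {"cores": 32, "ram_gb": 128}
shared_conf = {MAP_KEY: "4", REDUCE_KEY: "2"}

for hardware in (small_node, large_node):
    print(hardware, "->", slot_caps(shared_conf))
```

Both nodes report the same caps, so matching slot counts to cores or memory is left to whoever writes each node's configuration.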

On Fri, Feb 25, 2011 at 7:22 PM, bikash sharma <sh...@gmail.com> wrote:
> Hi,
> How is task slot in Hadoop defined with respect to scheduling a map/reduce
> task on such slots available on TaskTrackers?
>
> Thanks,
> Bikash
>

-- 
Harsh J
www.harshj.com
