Posted to hdfs-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/01/25 08:46:00 UTC

mappers-node relationship

Hi.
  A very basic question.
Does the number of mappers depend on the number of nodes I have?
This is how I imagine map-reduce works.
Take the word count example:
I have a bunch of slave nodes.
The documents are distributed across these slave nodes.
Depending on how big the data is, it spreads across the slave
nodes, and that is how the number of mappers is decided.
I am sure this understanding is wrong, since on a pseudo-distributed node I
can see multiple mappers.
So the question is: how does a single-node machine run multiple mappers? Do
they run in parallel or sequentially?
Are there any resources for learning this?
Thanks

Re: mappers-node relationship

Posted by Mahesh Balija <ba...@gmail.com>.
Mappers and reducers run as task instances, in what are also called
mapper/reducer slots.
Each node can have multiple slots (that is, multiple mapper instances, each
running in its own child JVM). The number of slots per node is configurable
with the properties mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum.
Tasks in different slots run in parallel.

Best,
Mahesh Balija,
CalsoftLabs.
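
To make the slot idea concrete, here is a rough sketch (not Hadoop code) of how slots bound parallelism. The function names and the numbers in the example are hypothetical; map_slots_per_node stands in for the value of mapred.tasktracker.map.tasks.maximum, and the model assumes every task takes about the same time.

```python
import math


def total_map_slots(num_nodes, map_slots_per_node):
    """Map tasks that can run at the same moment across the cluster."""
    return num_nodes * map_slots_per_node


def map_waves(num_map_tasks, num_nodes, map_slots_per_node):
    """Rounds ("waves") of map tasks needed when all slots stay busy.

    map_slots_per_node plays the role of
    mapred.tasktracker.map.tasks.maximum; the rest is a simplification.
    """
    slots = total_map_slots(num_nodes, map_slots_per_node)
    if slots <= 0:
        raise ValueError("the cluster needs at least one map slot")
    return math.ceil(num_map_tasks / slots)


if __name__ == "__main__":
    # Pseudo-distributed setup: one node, two map slots (assumed value).
    print(map_waves(num_map_tasks=10, num_nodes=1, map_slots_per_node=2))
    # Same job on five such nodes.
    print(map_waves(num_map_tasks=10, num_nodes=5, map_slots_per_node=2))
```

So a single node with two slots still runs two mappers at a time; the remaining tasks wait for a free slot. Adding nodes changes how many tasks run at once, not how many tasks the job has.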




Re: mappers-node relationship

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
This may be of some use; it explains how the number of maps is decided:

http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Thanks
Hemanth
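
As a rough illustration of the point that page makes (the number of maps is usually driven by the number of DFS blocks in the input), the estimate below counts one map task per block-sized split of each input file. It is only a sketch: the function name is hypothetical, the 64 MB split size is an assumed block size, and it ignores details such as unsplittable compressed files and any slack Hadoop allows on the last split of a file.

```python
import math

MB = 1024 * 1024


def estimate_map_tasks(file_sizes_bytes, split_size_bytes=64 * MB):
    """Estimate map tasks as one per split, computed file by file.

    Splits never span files, so each non-empty file contributes at
    least one map task. split_size_bytes is an assumed block size.
    """
    if split_size_bytes <= 0:
        raise ValueError("split size must be positive")
    return sum(
        math.ceil(size / split_size_bytes)
        for size in file_sizes_bytes
        if size > 0
    )


if __name__ == "__main__":
    # A 200 MB file and a 10 MB file (made-up sizes).
    print(estimate_map_tasks([200 * MB, 10 * MB]))
```

Note that the estimate depends only on the input files and the split size, not on the number of nodes, which is why a pseudo-distributed setup can show several mappers for one job.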

