Posted to hdfs-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2012/11/21 20:50:51 UTC

fundamental doubt

Hi,
I know I am asking a lot of fundamental questions, but thank you all for
taking the time to clear up my doubts.
I am able to write MapReduce jobs, but here is my doubt.
Right now I write mappers that emit a key and a value. These key-value pairs
are then received at the reducer end, where I process each key and value.
Let's say I want to calculate an average, with input like this:
key1  value1
key2  value2
key1  value3

So the output should be something like:
key1  average of value1 and value3
key2  average of value2, which is just value2

Right now, in the reducer, I have to build a dictionary whose keys are the
original keys and whose values are lists of the values seen for each key:
data = defaultdict(list)  # I am a Python user
But I thought that the mapper takes in key-value pairs and outputs
key: (v1, v2, ...), and that the reducer takes in this key and its list of
values and returns key, new value.

So why is the input of the reducer simply the raw output of the mapper, and
not the list of all the values for a particular key? Or have I misunderstood
something? Am I making any sense?
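
For reference, here is a minimal sketch of my current reducer (I am a Python
user on Hadoop Streaming, so I am assuming input arrives as tab-separated
key/value lines on stdin):

#!/usr/bin/env python
# Current approach: buffer every value per key, then average.
# Assumes Hadoop Streaming's default tab-separated "key<TAB>value" lines.
import sys
from collections import defaultdict

data = defaultdict(list)
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    data[key].append(float(value))

for key, values in data.items():
    # Emit the average of all values seen for this key.
    print("%s\t%s" % (key, sum(values) / len(values)))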

Re: fundamental doubt

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Jamal,

     For efficient processing, the framework sorts the map output by key,
and all the values associated with the same key go to the same reducer. As a
result, the reducer gets a key and a list of values as its input. So to me,
your assumption seems correct.
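
To picture the routing, here is a toy sketch in plain Python (not Hadoop
code; the reducer count of 2 and the hash partitioning are just stand-ins
for the framework's default behaviour):

# Route each map output pair to a reducer by hashing its key, the way
# the default partitioner does; pairs sharing a key land together.
num_reducers = 2
map_output = [("key1", 1.0), ("key2", 2.0), ("key1", 3.0)]

partitions = {r: [] for r in range(num_reducers)}
for key, value in map_output:
    partitions[hash(key) % num_reducers].append((key, value))

# Each partition is then sorted by key, so a reducer sees all the
# values for one key next to each other.
for r in partitions:
    print(r, sorted(partitions[r]))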

Regards,
    Mohammad Tariq



On Thu, Nov 22, 2012 at 1:20 AM, jamal sasha <ja...@gmail.com> wrote:

> [...]

Re: fundamental doubt

Posted by jamal sasha <ja...@gmail.com>.
Got it. Thanks for the clarification.


On Wed, Nov 21, 2012 at 3:03 PM, Bejoy KS <be...@gmail.com> wrote:

> [...]

Re: fundamental doubt

Posted by Bejoy KS <be...@gmail.com>.
Hi Jamal

This is handled at the framework level: the map tasks emit key-value pairs,
and the framework collects and groups all the values corresponding to each
key across all the map tasks. The reducer then takes as its input a key and
a collection of values; the reduce method's signature defines exactly that.
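
In streaming terms (a sketch in Python, since that is what you are using,
and assuming the default tab-separated lines), your reducer can lean on this
grouping directly: its input arrives sorted by key, so itertools.groupby
hands you one key and its values at a time, with no need to buffer
everything in a dictionary:

#!/usr/bin/env python
# Streaming reducer sketch: input is already sorted by key, so group
# consecutive lines that share a key and average each group.
import sys
from itertools import groupby

def pairs(stdin):
    for line in stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        yield key, float(value)

for key, group in groupby(pairs(sys.stdin), key=lambda kv: kv[0]):
    values = [v for _, v in group]
    print("%s\t%s" % (key, sum(values) / len(values)))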


Regards
Bejoy KS

Sent from handheld, please excuse typos.
