Posted to common-user@hadoop.apache.org by jaylac <Ja...@cognizant.com> on 2007/03/02 05:24:44 UTC

MapReduce

Hi

I was just reading about MapReduce for my final-year project work.

I got confused in the middle. What I thought is: "MapReduce deals mainly
with key/value pairs. To fit a problem into MapReduce, we should find the
key/value pairs."

I want to know whether I'm right or wrong.

I got confused after looking at the explanation on Wikipedia. The
following is the content on Wikipedia about MapReduce:

========================================================================================
"A map function iterates over a list of independent elements and performs a
specified operation on each element. The list of answers is stored
independently from the original list. Because each element is operated on
independently and the original list is not being modified, it is very easy
to perform a map operation in parallel. On appropriate hardware this allows
extremely large data sets to be processed in short amounts of elapsed time.

For example consider a list of test scores where each score has been found
to be 1 too high. A map function of s − 1 could be applied to correct every
score s.

A reduce operation takes a list and combines elements according to some
algorithm. Since a reduce always ends up with a single answer, it is not as
parallelizable as a map function, but the large number of relatively
independent calculations means that reduce functions are still useful in
highly parallel environments.

Continuing the previous example, what if one wanted to know the average of
the test scores? One could define a reduce function which halved the size of
the list by adding an entry in the list to its neighbor, recursively
continuing until there is only one (large) entry, and dividing the total sum
by the original number of elements to get the average."

=========================================================================================

Here the map function simply adjusts the test scores and the reduce adds
them up; we are not using any key/value pairs. I'm totally confused.

I might be wrong at any point, so please could someone help me out? Am I
wrong in my basic understanding of MapReduce itself? I'll be thankful if
anyone can explain it to me clearly.

Please help me to successfully complete my final-year project.

Jaya

-- 
View this message in context: http://www.nabble.com/MapReduce-tf3331603.html#a9263847
Sent from the Hadoop Users mailing list archive at Nabble.com.

Re: MapReduce

Posted by jaylac <Ja...@cognizant.com>.

Thanks. I'll tell this to my project manager and let you know what he says.

Can you tell me the advantages of MapReduce?

"The time is saved because we use key/value pairs. So key/value pairs
are the key factor that makes MapReduce so advantageous and
popular." This is what I think. Am I right?

Regards,
Jaya


Re: MapReduce

Posted by Albert Chern <al...@gmail.com>.

Technically map reduce requires key/value pairs.  Hadoop's implementation
also requires them.  So if you want to run a map reduce job, you will need
to fit your data to key/value pairs.  Of course, as I have shown, you can
just use a meaningless key or value, but they are still required.
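
As an illustration of the "meaningless value" case, here is a minimal sketch in plain Python (not Hadoop's Java API). It is an assumed example that is not part of the original thread: listing the distinct scores, where the score itself is the key and the value is only a placeholder. The names map_distinct and reduce_distinct are hypothetical.

```python
from collections import defaultdict

def map_distinct(score):
    # The score itself becomes the key; None is a placeholder value
    # that exists only because the pair format requires one.
    return (score, None)

def reduce_distinct(key, values):
    # The values are ignored; seeing the key once is all that matters.
    return key

scores = [95, 100, 95, 70, 100]

# Group the mapped pairs by key, as the framework would do
# between the map and reduce phases.
groups = defaultdict(list)
for score in scores:
    key, value = map_distinct(score)
    groups[key].append(value)

distinct = sorted(reduce_distinct(k, vs) for k, vs in groups.items())
print(distinct)  # [70, 95, 100]
```

The pairs are still there, but only one half of each pair carries information.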

On 3/2/07, jaylac <Ja...@cognizant.com> wrote:
>
>
> Is it necessary to find key/value pairs to fit a problem into
> MapReduce? If we don't use key/value pairs, should we not call it
> MapReduce?
>
> My project manager has proposed an idea for fitting our problem into
> MapReduce in which there are no key/value pairs, and he says that
> we can have MapReduce without key/value pairs.

Re: MapReduce

Posted by jaylac <Ja...@cognizant.com>.

Is it necessary to find key/value pairs to fit a problem into
MapReduce? If we don't use key/value pairs, should we not call it
MapReduce?

My project manager has proposed an idea for fitting our problem into
MapReduce in which there are no key/value pairs, and he says that
we can have MapReduce without key/value pairs.




Re: MapReduce

Posted by Albert Chern <al...@gmail.com>.

Sometimes you need to do a little work to fit a problem into map reduce.
You are correct; in this problem, there really are no key/value pairs, so
you would use a dummy key.  For example, we could just use 0 as the key, so
our test scores are:

(0, 95)
(0, 100)
(0, 70)
and so on...

Each map gets one of these and subtracts one from the score, giving us:

(0, 94)
(0, 99)
(0, 69)
and so on...

There will be a reduce for each key, but we only have one key, so there will
be one reduce that gets:

(0, [94,99,69,...])
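
The steps above can be sketched in plain Python. This is only a simulation of the map and grouping phases under the dummy-key assumption, not Hadoop code; the helper names map_score and shuffle are hypothetical.

```python
from collections import defaultdict

def map_score(key, score):
    # Map: correct one score that was recorded 1 too high.
    # The key is passed through unchanged.
    return (key, score - 1)

def shuffle(pairs):
    # Group values by key, as the framework would do
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

# The key 0 is a dummy: it carries no meaning and is there
# only so that every record is a key/value pair.
scores = [(0, 95), (0, 100), (0, 70)]

mapped = [map_score(key, score) for key, score in scores]
grouped = shuffle(mapped)
print(grouped)  # {0: [94, 99, 69]}
```

Because every record shares the same key, all values land in a single group, so a single reduce call would see the whole list.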

The Wikipedia example isn't very good, but we can make it better by dividing
the scores into scores for different subjects where we want to find the
average for each subject.  We might have:

(Biology, 100)
(Biology, 95)
(Biology, 90)
and so on...

(Chemistry, 90)
(Chemistry, 85)
(Chemistry, 80)
and so on...

After you subtract one from each of these values, there will be a
reduce for each key, the keys being the different subjects.  So you will
have one reduce for each subject:

(Biology, [99,94,89,...])
(Chemistry, [89,84,79,...])
and so on...
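
A minimal Python sketch of this per-subject version, again a simulation rather than Hadoop code. The function names map_score and reduce_average are hypothetical, and the input is limited to the three scores listed for each subject.

```python
from collections import defaultdict

def map_score(subject, score):
    # Map: subtract one from the score, keep the subject as the key.
    return (subject, score - 1)

def reduce_average(subject, scores):
    # Reduce: called once per key with all of that key's values.
    return (subject, sum(scores) / len(scores))

records = [
    ("Biology", 100), ("Biology", 95), ("Biology", 90),
    ("Chemistry", 90), ("Chemistry", 85), ("Chemistry", 80),
]

# Group the mapped values by subject, standing in for the
# framework's shuffle between map and reduce.
groups = defaultdict(list)
for subject, score in records:
    key, value = map_score(subject, score)
    groups[key].append(value)

averages = dict(reduce_average(k, v) for k, v in groups.items())
print(averages)  # {'Biology': 94.0, 'Chemistry': 84.0}
```

Here the key does real work: it decides which scores are averaged together, and each subject's reduce can run independently of the others.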

One more thing: the Wikipedia example says that each reduce outputs one
value.  This isn't a requirement for Hadoop map reduce.
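
To illustrate that last point, here is a small Python sketch of a reduce that emits several output pairs for one key. The function reduce_stats and the ":min" / ":max" / ":count" key suffixes are made up for this example.

```python
def reduce_stats(subject, scores):
    # A reduce is free to emit more than one output pair per key.
    # Here it yields three pairs for a single subject.
    yield (subject + ":min", min(scores))
    yield (subject + ":max", max(scores))
    yield (subject + ":count", len(scores))

output = list(reduce_stats("Biology", [99, 94, 89]))
print(output)
# [('Biology:min', 89), ('Biology:max', 99), ('Biology:count', 3)]
```

A reduce written this way could just as well yield nothing for some keys, for example to filter out subjects with too few scores.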
