Posted to common-user@hadoop.apache.org by ma qiang <ma...@gmail.com> on 2008/02/23 08:34:08 UTC

How to use two reduce functions?

Hi all,
    I have a program that needs to use two reduce functions. Can anyone tell me how?
    Thank you!

Qiang

Re: How to use two reduce functions?

Posted by ma qiang <ma...@gmail.com>.
Thanks for your reply. Here is the problem I have run into: I have an
application that needs two reduce phases. My first reduce function
divides all the data into several keys, which are then used by the
second reduce function. The second reduce function computes some
values from the results of the first; in other words, the output of
the first reduce function is the input to the second.
Alternatively, I could run two jobs, but in that case the map function
of the second job would do nothing except some I/O.


On Sun, Feb 24, 2008 at 3:29 AM, Jason Venner <ja...@attributor.com> wrote:
> If you set up a partitioner class, you could pre-partition the output of
>  the map into the relevant segments.
>  Then your reducer would be responsible for determining which reduce
>  function to apply based on which segment the key is part of.

Re: How to use two reduce functions?

Posted by Jason Venner <ja...@attributor.com>.
If you set up a partitioner class, you could pre-partition the output of
the map into the relevant segments.
Then your reducer would be responsible for determining which reduce
function to apply based on which segment the key is part of.


Amar Kamat wrote:
> Can you provide more details on exactly what you wish to do? What is
> the nature of the reducers? A simple answer: with a map (m) and reducers
> (r1, r2), you can run two jobs, i.e. job1(m, r1) and job2(IdentityMapper, r2).
> But it depends on what exactly r1 and r2 do. Combiners will also play an
> important role. Alternatively, can r1 and r2 be merged into a single
> reducer r and run as one job(m, r)?
> Amar

-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested
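Jason's second step, a reducer that picks the reduce function from the key's segment, might look roughly like the plain-Java sketch below. It assumes (this is not from the thread) that the map side prefixes each key with a tag, "r1" or "r2", separated by a tab, and that a partitioner (not shown) keeps each tag's keys in their own segment of reducers. DispatchingReducer, r1 (a sum) and r2 (a maximum) are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.List;

// Hypothetical sketch, not Hadoop's API: keys are "TAG<TAB>realKey" strings,
// where TAG names the reduce function ("r1" or "r2") chosen on the map side.
final class DispatchingReducer {
    // Hypothetical reduce function 1: sum the values (key unused in this toy).
    static int r1(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Hypothetical reduce function 2: take the maximum value (key unused in this toy).
    static int r2(String key, List<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) max = Math.max(max, v);
        return max;
    }

    // Called once per key, like a reducer; the tag (segment) decides which
    // reduce function runs on the untagged key and its values.
    static int reduce(String taggedKey, List<Integer> values) {
        int tab = taggedKey.indexOf('\t');
        if (tab < 0) throw new IllegalArgumentException("untagged key: " + taggedKey);
        String tag = taggedKey.substring(0, tab);
        String key = taggedKey.substring(tab + 1);
        switch (tag) {
            case "r1": return r1(key, values);
            case "r2": return r2(key, values);
            default: throw new IllegalArgumentException("unknown tag: " + tag);
        }
    }

    public static void main(String[] args) {
        System.out.println(reduce("r1\tapples", List.of(3, 4, 5))); // sum -> 12
        System.out.println(reduce("r2\tapples", List.of(3, 4, 5))); // max -> 5
    }
}
```

Failing loudly on an unknown or missing tag is a deliberate choice here: a mis-tagged key would otherwise be silently reduced with the wrong function.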

Re: How to use two reduce functions?

Posted by Amar Kamat <am...@yahoo-inc.com>.
Can you provide more details on exactly what you wish to do? What is
the nature of the reducers? A simple answer: with a map (m) and reducers
(r1, r2), you can run two jobs, i.e. job1(m, r1) and job2(IdentityMapper, r2).
But it depends on what exactly r1 and r2 do. Combiners will also play an
important role. Alternatively, can r1 and r2 be merged into a single
reducer r and run as one job(m, r)?
Amar
On Sat, 23 Feb 2008, ma qiang wrote:

> Hi all,
>    I have a program that needs to use two reduce functions. Can anyone tell me how?
>    Thank you!
>
> Qiang
>
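Amar's two-job answer, job1(m, r1) followed by job2(IdentityMapper, r2), can be illustrated with a small in-memory simulation. This is a hedged sketch, not Hadoop code: runJob, kv, chain and the word-length example are hypothetical, and the sorted TreeMap grouping only stands in for Hadoop's sort/shuffle. It shows why the "do-nothing" second map that Qiang mentions is still useful: r1 emits new keys, and the second job's shuffle regroups them for r2.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// In-memory simulation of job1(m, r1) then job2(identity, r2).
// NOT Hadoop's API: all names here are hypothetical.
final class TwoJobChain {
    // A key/value pair with String keys and Integer values.
    static Map.Entry<String, Integer> kv(String k, int v) {
        return new SimpleEntry<>(k, v);
    }

    // One simulated job: map every record, group values by key in sorted
    // order (standing in for the shuffle), then call the reducer once per key.
    // The reducer may emit keys different from the one it received.
    static <I> List<Map.Entry<String, Integer>> runJob(
            List<I> input,
            Function<I, List<Map.Entry<String, Integer>>> mapper,
            BiFunction<String, List<Integer>, List<Map.Entry<String, Integer>>> reducer) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (I record : input) {
            for (Map.Entry<String, Integer> pair : mapper.apply(record)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        groups.forEach((key, values) -> output.addAll(reducer.apply(key, values)));
        return output;
    }

    static List<Map.Entry<String, Integer>> chain(List<String> lines) {
        // job1: m splits lines into (word, 1); r1 sums the counts and re-keys
        // each result by word length, the "new keys" for the second reduce.
        List<Map.Entry<String, Integer>> job1Out = runJob(
                lines,
                line -> {
                    List<Map.Entry<String, Integer>> out = new ArrayList<>();
                    for (String w : line.split("\\s+")) {
                        if (!w.isEmpty()) out.add(kv(w, 1));
                    }
                    return out;
                },
                (word, ones) -> List.of(kv("len" + word.length(),
                        ones.stream().mapToInt(Integer::intValue).sum())));
        // job2: an identity map passes job1's output through unchanged; the
        // shuffle regroups it by the new key and r2 keeps the largest count.
        return runJob(
                job1Out,
                pair -> List.of(pair),
                (lenKey, counts) -> List.of(kv(lenKey, Collections.max(counts))));
    }

    public static void main(String[] args) {
        System.out.println(chain(List.of("a bb a", "ccc bb a dd")));
    }
}
```

Running main prints [len1=3, len2=2, len3=1]. The identity map does no work of its own; combining len2's two values (2 from "bb", 1 from "dd") happens only because the second job's shuffle brings them together.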

Re: How to use two reduce functions?

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Feb 22, 2008, at 11:34 PM, ma qiang wrote:

> Hi all,
>     I have a program that needs to use two reduce functions. Can anyone
> tell me how?
>     Thank you!

I assume you mean that your map functions decide which kind of reduce  
to send each key/value to. (The other possibility is that you want to  
have all of the key/value pairs sent to both kinds of reduces.)

You could do it if you have to, although I think in most cases you are
better off running two jobs. (The map phase rarely dominates, and
reducing the amount of data in the shuffle would probably make things
run faster.)

One approach would be to divide the reduces (in half?) and add an extra
tag field to the key that says which kind of reduce to send it to.
The partition function would then pick a reduce in the right range
based on the tag. You would probably also want an output format that
was aware of the split and wrote the outputs to a different place
depending on which kind of reduce it is. So it isn't impossible, but
it would be complicated...

-- Owen
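Owen's range-splitting partition function can be sketched in plain Java as follows. Assumptions not in the post: keys are strings of the form "0<TAB>key" or "1<TAB>key"; the reduces are split at numPartitions / 2; and within each range a key is placed by hashing its untagged part, masking the sign bit the way Hadoop's HashPartitioner does. The static method only mirrors the shape of getPartition(key, value, numPartitions) on Hadoop's Partitioner and drops the unused value. SplitPartitioner, outputDirFor and the out/kind0 and out/kind1 directories are hypothetical.

```java
// Hypothetical sketch of a partition function that splits the reduces in two.
// Keys are assumed to look like "TAG<TAB>realKey", TAG being 0 or 1.
final class SplitPartitioner {
    // Extracts the kind tag (0 or 1) from the front of the key.
    static int kindOf(String taggedKey) {
        int tab = taggedKey.indexOf('\t');
        if (tab < 0) throw new IllegalArgumentException("untagged key: " + taggedKey);
        int kind = Integer.parseInt(taggedKey.substring(0, tab));
        if (kind != 0 && kind != 1) throw new IllegalArgumentException("bad kind: " + kind);
        return kind;
    }

    // Kind 0 goes to reduces [0, half), kind 1 to [half, numPartitions).
    // Within its range a key is placed by hashing the untagged part, so all
    // values for the same key meet in the same reduce.
    static int getPartition(String taggedKey, int numPartitions) {
        if (numPartitions < 2) throw new IllegalArgumentException("need at least 2 reduces");
        int half = numPartitions / 2;
        int kind = kindOf(taggedKey);
        String key = taggedKey.substring(taggedKey.indexOf('\t') + 1);
        int start = (kind == 0) ? 0 : half;
        int size = (kind == 0) ? half : numPartitions - half;
        return start + (key.hashCode() & Integer.MAX_VALUE) % size;
    }

    // An output format "aware of the split" could pick the directory from
    // the reduce's own partition number; these directory names are made up.
    static String outputDirFor(int partition, int numPartitions) {
        return partition < numPartitions / 2 ? "out/kind0" : "out/kind1";
    }

    public static void main(String[] args) {
        int n = 5; // 2 reduces for kind 0, 3 for kind 1
        for (String k : new String[] {"0\tapple", "1\tapple", "1\tpear"}) {
            int p = getPartition(k, n);
            System.out.println(k.replace('\t', ' ') + " -> reduce " + p + " -> " + outputDirFor(p, n));
        }
    }
}
```

With an odd number of reduces this sketch gives the extra one to kind 1; that is an arbitrary choice, and a real job might size the two ranges by expected data volume instead.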