Posted to common-user@hadoop.apache.org by Jackob Carlsson <ja...@gmail.com> on 2010/08/02 17:39:10 UTC

Combiner function

Hi everyone,
Could anyone please help me understand the function of a combiner?

Thanks in advance
Jackob

Re: Combiner function

Posted by Jackob Carlsson <ja...@gmail.com>.
Thanks Edward.

> Is there a way we use it for several mappers as well?

> No. That is the exact opposite goal of the combiner. It runs locally.


OK, let's take a simple scenario: one mapper is late producing its results,
which keeps a reduce task waiting. How can this case be optimized?



> >it may or may not run on a particular map attempt
> It only runs when certain thresholds in the framework are reached.
>
> http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
>


What are these thresholds that determine whether the combiner runs on a
particular map attempt?

Re: Combiner function

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Aug 2, 2010 at 4:28 PM, Jackob Carlsson
<ja...@gmail.com> wrote:
> Is there a way we use it for several mappers as well?
No. That is the exact opposite goal of the combiner. It runs locally.
>it may or may not run on a particular map attempt
It only runs when certain thresholds in the framework are reached.

http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
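Edward's "thresholds" remark can be pictured with a simplified model of the map-side spill path. This is a plain-Java sketch, not Hadoop code: the class and its parameters are hypothetical, with bufferLimit standing in for the sort-buffer settings (io.sort.mb and io.sort.spill.percent in 0.20-era Hadoop) and minSpillsForCombine for min.num.spills.for.combine. The assumed behavior is that the combiner runs on each spill of the in-memory buffer and runs again while merging spill files only when there were at least that many spills, which is one reason the framework makes no promise about how many times a combiner runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of a map task's spill path; not Hadoop code.
// bufferLimit stands in for the sort-buffer threshold, minSpillsForCombine for
// the minimum number of spills before the combiner runs again during the merge.
final class SpillCombineDemo {
    private final int bufferLimit;
    private final int minSpillsForCombine;
    private final List<String> buffer = new ArrayList<>();
    private final List<Map<String, Integer>> spills = new ArrayList<>();
    int combinerRuns = 0;

    SpillCombineDemo(int bufferLimit, int minSpillsForCombine) {
        this.bufferLimit = bufferLimit;
        this.minSpillsForCombine = minSpillsForCombine;
    }

    // Map output goes into the in-memory buffer; a full buffer triggers a spill.
    void collect(String word) {
        buffer.add(word);
        if (buffer.size() >= bufferLimit) {
            spill();
        }
    }

    // A spill sorts the buffered records (TreeMap) and combines them (per-key sum).
    private void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        Map<String, Integer> combined = new TreeMap<>();
        for (String w : buffer) {
            combined.merge(w, 1, Integer::sum);
        }
        combinerRuns++;
        spills.add(combined);
        buffer.clear();
    }

    // End of task: flush what is left, then merge the spills. The combiner runs
    // again during the merge only if there were enough spills.
    List<Map.Entry<String, Integer>> close() {
        spill();
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        if (spills.size() >= minSpillsForCombine) {
            Map<String, Integer> totals = new TreeMap<>();
            for (Map<String, Integer> s : spills) {
                s.forEach((k, v) -> totals.merge(k, v, Integer::sum));
            }
            combinerRuns++;
            merged.addAll(totals.entrySet());
        } else {
            // Plain merge: records from different spills stay separate.
            for (Map<String, Integer> s : spills) {
                merged.addAll(s.entrySet());
            }
            merged.sort(Map.Entry.comparingByKey());
        }
        return merged;
    }

    public static void main(String[] args) {
        SpillCombineDemo small = new SpillCombineDemo(4, 3);
        for (String w : "a b a b a b".split(" ")) {
            small.collect(w);
        }
        // Two spills: no combine on merge, so duplicate keys reach the reducer.
        System.out.println(small.close() + " combiner runs: " + small.combinerRuns);
        // prints [a=2, a=1, b=2, b=1] combiner runs: 2

        SpillCombineDemo large = new SpillCombineDemo(4, 3);
        for (String w : "a b a b a b a b a b a b".split(" ")) {
            large.collect(w);
        }
        // Three spills: the combiner runs once more while merging.
        System.out.println(large.close() + " combiner runs: " + large.combinerRuns);
        // prints [a=6, b=6] combiner runs: 4
    }
}
```

With six words the task spills twice, so the merge leaves duplicate keys for the reducer; with twelve words it spills three times and the combiner runs a fourth time during the merge.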

Re: Combiner function

Posted by Jackob Carlsson <ja...@gmail.com>.
Thanks Nick, but does "in-memory" mean a combiner can only be applied to the
output of a single mapper? Is there a way to use it across several mappers as
well? Also, what do you mean by "it may or may not run on a particular map
attempt"?

Br,
Jackob

On Mon, Aug 2, 2010 at 5:43 PM, Nick Jones <ni...@amd.com> wrote:

> Hi Jackob,
> A combiner acts a lot like a reduce step but it's executed on the mapper
> with in-memory data.  I've seen a reduction in job execution time by adding
> one.  The one caveat to keep in mind is that it may or may not run on a
> particular map attempt.
>
> Nick

Re: Combiner function

Posted by Nick Jones <ni...@amd.com>.
Hi Jackob,
A combiner acts a lot like a reduce step, but it is executed on the map
side over in-memory data. I've seen job execution time drop after adding
one. The one caveat to keep in mind is that it may or may not run on a
particular map attempt.

Nick
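To make Nick's description concrete, here is a minimal plain-Java simulation (not the Hadoop API) of a word-count map task whose output is combined in memory before the shuffle. The map and combine methods are hypothetical illustrations. Note that combine only ever sees one task's output, which is why a combiner cannot merge data across mappers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation, not Hadoop code: pre-aggregate one map task's
// output locally before it would be shuffled to the reducers.
final class MapSideCombineDemo {

    // Hypothetical map step: emit (word, 1) for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Hypothetical combine step: sum values per key, using only this map
    // task's in-memory output (nothing from other mappers).
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            sums.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = map("to be or not to be");
        List<Map.Entry<String, Integer>> combined = combine(raw);
        System.out.println("records without combiner: " + raw.size());      // prints 6
        System.out.println("records with combiner:    " + combined.size()); // prints 4
        System.out.println(combined); // prints [be=2, not=1, or=1, to=2]
    }
}
```

For this one line the combiner cuts the records to be copied to reducers from 6 to 4; with many repeated keys the reduction can be much larger.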




Re: Combiner function

Posted by zaki rahaman <za...@gmail.com>.
From the Wiki: http://wiki.apache.org/hadoop/HadoopMapReduce

In simple cases, your combiner may simply be your reduce function applied to
your map output before it is shuffled, sorted, and made available to reduce
tasks. This is often the case with counting or simple aggregation, where the
reduce operation is commutative and associative.



-- 
Zaki Rahaman
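The plain-Java sketch below (not Hadoop code) illustrates why reusing the reduce function as a combiner is safe for counting but not for every aggregation. In Hadoop the reuse itself is typically one call, Job.setCombinerClass (or JobConf.setCombinerClass in the older API), given the reducer class; here the two "mappers" are just hard-coded lists of values for a single key.

```java
import java.util.List;

// Plain-Java sketch, not Hadoop code. Each list stands for the values that
// one hypothetical map task emitted for a single key.
final class ReducerAsCombinerDemo {

    // Sum is commutative and associative: partial sums can be summed again.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Mean is not associative: the mean of per-mapper means is generally wrong.
    static double mean(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total / values.size();
    }

    public static void main(String[] args) {
        // Counting: reducer used directly vs. reducer reused as a combiner.
        int direct = sum(List.of(1, 1, 1, 1));
        int viaCombiner = sum(List.of(sum(List.of(1, 1, 1)), sum(List.of(1))));
        System.out.println(direct + " " + viaCombiner); // prints 4 4

        // Averaging: the same reuse changes the answer.
        double directMean = mean(List.of(2.0, 4.0, 6.0, 10.0));
        double combinedMean = mean(List.of(mean(List.of(2.0, 4.0, 6.0)), mean(List.of(10.0))));
        System.out.println(directMean + " " + combinedMean); // prints 5.5 7.0
    }
}
```

Summing partial sums gives the same 4 as summing everything, but averaging per-mapper averages gives 7.0 instead of 5.5. A common workaround is to have the mapper and combiner emit (sum, count) pairs and divide only in the reducer.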

Re: Combiner function

Posted by Harsh J <qw...@gmail.com>.
As others have pointed out, it's mostly applied as an optimization
step. In most cases a 'Mapper' emits at least a small group of records
with the same key, and these go on to the reducer after the copy and
sort phases. Reducing them locally (in memory) with a 'Combiner' cuts
down the data moved through the copy and sort stages before the
'Reducer' operation kicks in.

Do note that, implementation-wise, a 'combiner' class must both accept
and emit (collect) the same key/value types as the mapper's output.



-- 
Harsh J
www.harshj.com
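Harsh's typing rule can be made explicit with Java generics. The Combiner interface below is hypothetical, not Hadoop's API (in Hadoop a combiner is written as a Reducer whose input and output types both equal the map output types); it simply lets the compiler enforce that a combiner consumes and emits the mapper's key/value types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical interface, not Hadoop's: it only illustrates the typing rule.
final class CombinerTypesDemo {

    // A combiner reads (K, list of V) and must emit (K, V): the same types
    // the mapper emits, because its output replaces the raw map output
    // as the reducer's input.
    interface Combiner<K, V> {
        void combine(K key, List<V> values, BiConsumer<K, V> emit);
    }

    // Sums per-key counts; emits (String, Long), matching the mapper's output.
    static final Combiner<String, Long> SUM = (key, values, emit) -> {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        emit.accept(key, total);
    };

    // Runs a combiner on one key's values and collects what it emits.
    static List<Map.Entry<String, Long>> run(Combiner<String, Long> c, String key, List<Long> values) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        c.combine(key, values, (k, v) -> out.add(Map.entry(k, v)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(SUM, "hadoop", List.of(1L, 1L, 1L))); // prints [hadoop=3]
    }
}
```

A lambda that tried to emit, say, a String value would not compile against Combiner<String, Long>, which is the same constraint Harsh describes.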