Posted to common-user@hadoop.apache.org by Yağız Kargın <xe...@gmail.com> on 2010/08/31 13:57:29 UTC

Combining Only Once?

Hi All,

Is there a way to make the combiner run only once after each map
task, as in older versions of Hadoop?

We found one approach, in-mapper combining, described in J. Lin and
C. Dyer, Data-Intensive Text Processing with MapReduce, Morgan &
Claypool Publishers, 2010. However, its memory consumption becomes
too high when you have large keys.
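
For reference, in-mapper combining with a cap on buffered keys might be
sketched as below. This is a plain-Python illustration, not Hadoop API
code: the class name InMapperCombiner, the emit callback, and the
max_keys parameter are all hypothetical, and the values are assumed to
be summable numbers.

```python
from collections import defaultdict


class InMapperCombiner:
    """Hypothetical helper: sums values per key inside the map task."""

    def __init__(self, emit, max_keys=100_000):
        self.emit = emit            # callback taking (key, partial_sum)
        self.max_keys = max_keys    # assumed cap on distinct buffered keys
        self.buffer = defaultdict(int)

    def map(self, key, value):
        # Aggregate locally instead of emitting one pair per record.
        self.buffer[key] += value
        # Flush early once the buffer holds too many distinct keys.
        if len(self.buffer) >= self.max_keys:
            self.flush()

    def flush(self):
        for key, total in self.buffer.items():
            self.emit(key, total)
        self.buffer.clear()

    def close(self):
        # Called once at the end of the map task to emit what is left.
        self.flush()
```

The early flush bounds memory, but a key can then be emitted more than
once per task, so the per-task output is no longer fully aggregated.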

I would be glad if anyone has a way to do it.


Best,
Yagiz Kargin

Re: Combining Only Once?

Posted by Yağız Kargın <xe...@gmail.com>.
Thanks for the reply.

2010/8/31 Owen O'Malley <om...@apache.org>:
> There used to be a compatibility switch, but I believe it was removed
> in 0.19 or 0.20.

I noticed that the switch has already been removed.

>
> Can you describe what you are trying to accomplish? Combiners were
> always intended to only be used for operations that are idempotent,
> associative, and commutative. Clearly your combiner doesn't satisfy
> one of those properties or you wouldn't care if it was applied more
> than once.

Actually, I have to apply an operation to the final output of each map
task, but only once. For each map task, I look at the fully aggregated
final value for each key and then reduce the map output size by a large
amount based on those values. That is something people usually do in
the reduce phase. However, since each mapper has many large keys, I want
to lower the network cost by removing, ahead of time, the keys that I
don't need in the final output. This is possible only if I can access
the locally aggregated final output of each map task during the map
phase.
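
A minimal sketch of that idea, assuming the whole per-task aggregate
fits in memory and that the decision to drop a key can be made from its
local total alone. It is plain Python rather than Hadoop API code; the
names map_task and keep are hypothetical.

```python
from collections import Counter


def map_task(records, keep):
    """Aggregate one task's records, then emit only the keys to keep.

    records: iterable of (key, numeric_value) pairs for this map task.
    keep:    hypothetical predicate taking (key, local_total).
    """
    totals = Counter()
    for key, value in records:
        totals[key] += value
    # The pruning runs exactly once, after all input has been seen.
    for key, total in totals.items():
        if keep(key, total):
            yield key, total
```

For example, list(map_task(pairs, lambda k, t: t >= 3)) emits only the
keys whose local total is at least 3. Pruning on local totals is safe
only when a key dropped in one task cannot still matter to the global
result.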

Yagiz

>
> -- Owen
>

Re: Combining Only Once?

Posted by Owen O'Malley <om...@apache.org>.
There used to be a compatibility switch, but I believe it was removed
in 0.19 or 0.20.

Can you describe what you are trying to accomplish? Combiners were
always intended to only be used for operations that are idempotent,
associative, and commutative. Clearly your combiner doesn't satisfy
one of those properties or you wouldn't care if it was applied more
than once.

-- Owen