Posted to common-dev@hadoop.apache.org by Runping Qi <ru...@yahoo-inc.com> on 2007/04/21 01:08:59 UTC
Does anybody use the stream combiner feature?
In the current framework, each mapper task creates one combiner object
per partition per spill.
This is very costly: each time a combiner is created, a new process is
launched to run the combiner executable. I suspect a job with a stream
combiner may not run much faster than one without it, and it may even be
slower. Thus, I doubt the value of supporting such a feature.
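To put rough numbers on that cost, here is a minimal back-of-the-envelope
sketch in Python. It assumes one combiner process per partition per spill in
every map task, as described above. The function names, the job sizes, and the
50 ms process startup cost are hypothetical values chosen for illustration,
not measurements.

```python
def combiner_process_launches(map_tasks, partitions, spills_per_task):
    """Estimate how many combiner processes a streaming job would start,
    assuming one process per partition per spill in every map task."""
    return map_tasks * partitions * spills_per_task


def startup_overhead_ms(launches, startup_ms_per_process):
    """Total time spent only on starting combiner processes, in milliseconds."""
    return launches * startup_ms_per_process


# Hypothetical job: 100 map tasks, 20 partitions, 5 spills per map task.
launches = combiner_process_launches(map_tasks=100, partitions=20, spills_per_task=5)

# Assumed (not measured) startup cost of 50 ms per combiner process.
overhead_ms = startup_overhead_ms(launches, startup_ms_per_process=50)

print(f"{launches} combiner processes, {overhead_ms} ms spent on startup")
```

Under these assumptions the startup cost grows with the product of map tasks,
partitions, and spills, so it can outweigh whatever the combiner saves in
shuffle volume.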
I would like to know who uses stream combiners in real applications and how
they use them.
Could these uses be satisfied by the framework providing a set of
generic combiners (such as Abacus)?
Thoughts?
Runping