Posted to common-dev@hadoop.apache.org by Runping Qi <ru...@yahoo-inc.com> on 2007/04/21 01:08:59 UTC

Does anybody use the stream combiner feature?

 

In the current framework, each mapper task creates one combiner object
per partition per spill.

This is very costly, since each time a combiner is created, a new process
is spawned to run the combiner executable. I suspect a job with a stream
combiner may not run much faster than one without it. It may even be
slower. Thus, I doubt the value of supporting such a feature.
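
To put a rough number on the process count, here is a back-of-the-envelope
sketch in Python. The function names and the job sizes are made up for
illustration and are not measurements from a real cluster. It also assumes
every spill holds records for every partition, so it gives an upper bound.

```python
def combiner_processes_per_map_task(num_partitions: int, num_spills: int) -> int:
    """One combiner process per partition per spill, as described above."""
    return num_partitions * num_spills


def combiner_processes_per_job(num_map_tasks: int,
                               num_partitions: int,
                               num_spills: int) -> int:
    """Upper bound on combiner processes spawned across all map tasks."""
    return num_map_tasks * combiner_processes_per_map_task(num_partitions, num_spills)


# Hypothetical job: 1000 map tasks, 100 partitions (reducers), 5 spills per mapper.
total = combiner_processes_per_job(num_map_tasks=1000,
                                   num_partitions=100,
                                   num_spills=5)
print(total)  # 500000
```

Even if each process start costs only a few milliseconds, the total at this
scale could plausibly outweigh whatever the combiner saves in shuffle volume.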


I would like to know who uses stream combiners in real applications and
how they use them. Could these uses be satisfied by the framework
providing a set of generic combiners (such as Abacus)?

 

Thoughts?

 

Runping