Posted to mapreduce-user@hadoop.apache.org by Mike Spreitzer <ms...@us.ibm.com> on 2011/10/13 19:12:01 UTC
Looking for stragglers in iterated map-reduce
In iterated map-reduce, a series of code-identical jobs where the reduce
output of one is the map input of the next, there are two synchronization
barriers per iteration: one in the middle of each job (between map and
reduce) and one at the end of each job. In principle this could be a
painfully excessive amount of synchronization. Is it in practice? Do you
have iterated map-reduce applications with great load imbalance in some
phases?
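To make the concern concrete, here is a toy cost model (plain Java, not Hadoop API code) of the two-barrier structure described above. Under this model, each iteration's running time is bounded below by the slowest map task plus the slowest reduce task, so a single straggler in either phase stalls the whole iteration; the task-time arrays and numbers are made up for illustration.

```java
import java.util.Arrays;

// Toy model of iterated map-reduce with two barriers per iteration:
// per-iteration time = max(map task times) + max(reduce task times),
// because each barrier waits for the slowest task in its phase.
public class BarrierCost {
    // Time for one iteration under the two-barrier model.
    static long iterationTime(long[] mapTimes, long[] reduceTimes) {
        long mapBarrier = Arrays.stream(mapTimes).max().orElse(0);
        long reduceBarrier = Arrays.stream(reduceTimes).max().orElse(0);
        return mapBarrier + reduceBarrier;
    }

    public static void main(String[] args) {
        // Balanced load: every task takes 10 time units.
        long[] balanced = {10, 10, 10, 10};
        // Skewed load: one straggler takes 40 time units.
        long[] skewed = {10, 10, 10, 40};

        long perIterBalanced = iterationTime(balanced, balanced); // 10 + 10 = 20
        long perIterSkewed = iterationTime(skewed, balanced);     // 40 + 10 = 50

        int iterations = 100;
        System.out.println("balanced total: " + iterations * perIterBalanced);
        System.out.println("skewed total:   " + iterations * perIterSkewed);
    }
}
```

One straggler in one phase more than doubles the total time in this sketch, which is the kind of imbalance the question is asking whether people actually see.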
Thanks,
Mike