Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/17 01:37:03 UTC

[Lucene-hadoop Wiki] Update of "HowManyMapsAndReduces" by LohitVijayarenu

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by LohitVijayarenu:
http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces

------------------------------------------------------------------------------
  
  == Number of Reduces ==
  
- The right number of reduces seems to be between 1.0 and 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 1.0 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces, doing a much better job of load balancing.
+ The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces, doing a much better job of load balancing.
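  
  As a rough illustration of the rule of thumb above, here is a minimal Java sketch that computes a reduce count and hands it to the job via JobConf.setNumReduceTasks. The node count and slots-per-node values (mapred.tasktracker.tasks.maximum) are assumptions the job writer would supply for their own cluster, and the helper name suggestedReduces is hypothetical:
  
  {{{
import org.apache.hadoop.mapred.JobConf;

public class ReduceCountSketch {
    // Hypothetical helper: applies the 0.95 / 1.75 rules of thumb.
    // nodes and slotsPerNode (mapred.tasktracker.tasks.maximum) are
    // assumed to be known for the cluster at hand.
    static int suggestedReduces(int nodes, int slotsPerNode, boolean secondWave) {
        double factor = secondWave ? 1.75 : 0.95;
        return (int) (factor * nodes * slotsPerNode);
    }

    public static void main(String[] args) {
        JobConf conf = new JobConf(ReduceCountSketch.class);
        // e.g. 10 nodes with 2 task slots each: 0.95 gives 19 reduces,
        // so every reduce can run in a single wave.
        conf.setNumReduceTasks(suggestedReduces(10, 2, false));
    }
}
  }}}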
  
  Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is, it provides a pretty firm upper bound.
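  
  To make the arithmetic behind that bound concrete, the fragment below plugs illustrative values into the io.buffer.size * 2 * numReduces << heapSize inequality. The 64 KB buffer and 200 MB task heap are assumptions chosen for the example, not Hadoop defaults:
  
  {{{
public class ReduceBufferCheck {
    public static void main(String[] args) {
        // Illustrative assumptions, not Hadoop defaults:
        long ioBufferSize = 64L * 1024;       // io.buffer.size, in bytes
        long heapSize = 200L * 1024 * 1024;   // reduce task JVM heap, in bytes
        int numReduces = 1000;

        // The output buffers need io.buffer.size * 2 * numReduces bytes,
        // which must stay well below the heap size.
        long bufferDemand = ioBufferSize * 2 * numReduces; // ~131 MB here

        // Reading "<<" as "at most half the heap": with these numbers,
        // 1000 reduces already fails the check, showing why ~1000 is a
        // firm upper bound for this configuration.
        System.out.println("buffer demand: " + bufferDemand + " bytes");
        System.out.println("comfortably below heap: " + (bufferDemand * 2 < heapSize));
    }
}
  }}}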