Posted to common-user@hadoop.apache.org by Ajay Srivastava <Aj...@guavus.com> on 2013/01/23 11:34:44 UTC

Spilled records

Hi,

I was tuning a MapReduce job to reduce the number of spills and reached a stage where the following counters are all equal -

Spilled Records (map) = Spilled Records (reduce) = Combine Output Records = Reduce Input Records
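
In case it is relevant, this is roughly how I am pulling those numbers (a minimal sketch against the classic mapred API; the counter group string below is the Hadoop 1.x one and may differ in other versions):

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class SpillCounters {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // args[0] is the job id, e.g. job_201301230001_0042 (made-up id)
        RunningJob job = client.getJob(JobID.forName(args[0]));
        Counters counters = job.getCounters();
        // Note: job-level SPILLED_RECORDS is the map + reduce total;
        // the per-phase split I quoted above is from the JobTracker web UI.
        String group = "org.apache.hadoop.mapred.Task$Counter";
        for (String name : new String[] {
            "SPILLED_RECORDS", "COMBINE_OUTPUT_RECORDS", "REDUCE_INPUT_RECORDS" }) {
          System.out.println(name + " = "
              + counters.findCounter(group, name).getCounter());
        }
      }
    }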


I do not see any lines in the mapper logs with the following strings -
1. Spilling map output: record full
2. Spilling map output: buffer full

Only this string -
1. Finished spill 0 (note the 0 at the end)

I am confused; can someone please explain what's going on?

1. Though neither the buffer nor the record limit got full, there are still spills? Is it that the mapper writes its records out at the end to be consumed by the reducers, and that is why I see these spills?
2. Why is the combiner running if there were no spills? If my guess in point 1 is correct, will the combiner not run when the number of spills per mapper < min.num.spills.for.combine? (The settings I have been tuning are sketched after this list.)
3. Why are spills counted in the reducer stats?
4. Is there a way to tell the mapper not to write its final output to disk, so that the reducers fetch the data from the mapper's main memory?
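
For reference, these are the spill-related knobs I have been tuning (a sketch only; the parameter names are the MRv1 ones and the values are illustrative, not recommendations):

    import org.apache.hadoop.mapred.JobConf;

    public class SpillTuning {
      // Build a JobConf with the spill-related settings made explicit.
      public static JobConf withSpillSettings() {
        JobConf conf = new JobConf();
        conf.setInt("io.sort.mb", 200);                 // in-memory sort buffer size, in MB
        conf.setFloat("io.sort.record.percent", 0.05f); // share of io.sort.mb kept for record metadata
        conf.setFloat("io.sort.spill.percent", 0.80f);  // fill fraction that triggers a mid-task spill
        conf.setInt("min.num.spills.for.combine", 3);   // combiner re-runs at merge time only with at least this many spills
        return conf;
      }
    }

I set these explicitly rather than relying on the cluster defaults so that the spill behaviour is reproducible across runs.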



Regards,
Ajay Srivastava