Posted to common-user@hadoop.apache.org by Raymond Jennings III <ra...@yahoo.com> on 2010/06/02 21:52:31 UTC

Getting zero length files on the reduce output.

I have a cluster of 12 slave nodes.  For some jobs, half of the part-r-00000-style output files are zero bytes in size after the job completes.  Does this mean the hash function that distributes keys to the reducer nodes is not working all that well?  On other jobs the output is pretty even across all reducers, but on certain jobs only half of the reducers produce files bigger than 0, and the behavior is reproducible.  Can I change this hash function in any way?  Thanks.



Re: Getting zero length files on the reduce output.

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
The default partitioner assigns each key to a reducer by hashCode(key) modulo the number of reducers, so an uneven spread is entirely possible when the key space is small or skewed.
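To make that concrete, here is a small standalone sketch of the default partitioning arithmetic (it mirrors Hadoop's HashPartitioner in o.a.h.mapreduce.lib.partition, which masks the hash to non-negative before taking the modulus; the class and key names below are illustrative):

```java
// Sketch of the default HashPartitioner logic: the key's hashCode,
// masked to non-negative, modulo the number of reduce tasks.
public class PartitionDemo {
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 12;
        // If your job's keys happen to cluster on a few residues mod 12,
        // the other partitions receive nothing, and those reducers write
        // empty part-r-NNNNN files.
        String[] keys = {"alpha", "beta", "gamma", "delta"};
        for (String k : keys) {
            System.out.println(k + " -> reducer " + partitionFor(k, reducers));
        }
    }
}
```

Since the mapping depends only on the keys' hash codes, the same job on the same data will always hit the same subset of reducers, which matches the reproducibility you are seeing.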

>>Can I change this hash function in any way?
Sure, any custom partitioner can be plugged in. Check o.a.h.mapreduce.lib.partition or the secondary sort example in the MapReduce tutorial for more.
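For illustration, here is the core routing logic a custom partitioner might use. In a real job you would put this logic in a subclass of org.apache.hadoop.mapreduce.Partitioner&lt;KEY, VALUE&gt; and override getPartition(); the class name, key format, and bucketing policy below are assumptions for the sketch:

```java
// Hedged sketch of custom partition routing. The contract (same as
// Hadoop's Partitioner.getPartition) is: return a value in
// [0, numPartitions) for every key.
public class KeyRangePartitioner {
    // Illustrative policy: bucket keys by their first letter so that
    // lexicographic ranges land on distinct reducers.
    public int getPartition(String key, String value, int numPartitions) {
        if (key == null || key.isEmpty()) {
            return 0; // route empty keys to the first reducer
        }
        int bucket = Character.toLowerCase(key.charAt(0)) - 'a';
        if (bucket < 0) bucket = 0;    // clamp non-alphabetic keys
        if (bucket > 25) bucket = 25;
        return bucket % numPartitions;
    }
}
```

You would then register it in the job driver with job.setPartitionerClass(...). Note that a partitioner like this can itself skew the load if the key distribution is uneven across first letters.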

On a side note, if you don't want the zero-byte output files to appear at all, use LazyOutputFormat instead.
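The wiring for that is a one-line change in the job driver. LazyOutputFormat wraps the real output format and only creates a part-r-NNNNN file when the reducer actually emits a record, so reducers that receive no keys leave nothing behind (a sketch, assuming a TextOutputFormat job):

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Driver {
    static void configure(Job job) {
        // Instead of job.setOutputFormatClass(TextOutputFormat.class),
        // wrap it so empty partitions produce no output file at all:
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    }
}
```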

Amogh

On 6/3/10 1:22 AM, "Raymond Jennings III" <ra...@yahoo.com> wrote:

I have a cluster of 12 slave nodes.  For some jobs, half of the part-r-00000-style output files are zero bytes in size after the job completes.  Does this mean the hash function that distributes keys to the reducer nodes is not working all that well?  On other jobs the output is pretty even across all reducers, but on certain jobs only half of the reducers produce files bigger than 0, and the behavior is reproducible.  Can I change this hash function in any way?  Thanks.