You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-user@hadoop.apache.org by Joel Welling <we...@psc.edu> on 2008/09/10 19:32:12 UTC

Ordering of records in output files?

Hi folks;
  I have a simple Streaming job where the mapper produces output records
beginning with a 16 character ascii string and passes them to
IdentityReducer.  When I run it, I get the same number of output files
as I have mapred.reduce.tasks .  Each one contains some of the strings,
and within each file the strings are in sorted order.
  But there is no obvious ordering *across* the files.  For example, I
can see where the first few strings in the output went to files 0,1,3,4,
and then back to 0, but none of them ended up in file 2.
  What's the algorithm that determines which strings end up in which
files?  Is there a way I can change it so that sequentially ordered
strings end up in the same file rather than spraying off across all the
files?

Thanks,
-Joel
 welling@psc.edu