Posted to common-user@hadoop.apache.org by Nick Jones <ni...@amd.com> on 2010/03/10 15:07:21 UTC

Uneven DBInputFormat Splits

Hi all,
I've set up a job that pulls, say, 250 records from MySQL and splits them 
across several mappers.  Each mapper (with the exception of attempt_*_0) 
gets roughly 250/(n mappers) records.  However, attempt 0 always ends up 
with ~5x the workload of the others.  Is there something I'm missing or 
is this normal?

Thanks

Nick Jones