Posted to common-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2010/12/29 19:09:25 UTC

SkipBadRecords confusion

Some of my inputs fail deterministically and I would like to avoid retrying them four times.  There seem to be two approaches, setMaxMapAttempts() and SkipBadRecords, and I'm trying to figure both of them out.  Currently, mapred.map.max.attempts is marked final on our cluster, so I can't override it...so I'm trying to get SkipBadRecords to work instead.  I currently have this:

	SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
	SkipBadRecords.setAttemptsToStartSkipping(conf, 1);
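
For context, the relevant part of my driver looks roughly like this (the class names, job name, and paths are just placeholders; I'm on the old mapred API):

	import org.apache.hadoop.fs.Path;
	import org.apache.hadoop.mapred.FileInputFormat;
	import org.apache.hadoop.mapred.FileOutputFormat;
	import org.apache.hadoop.mapred.JobClient;
	import org.apache.hadoop.mapred.JobConf;
	import org.apache.hadoop.mapred.SkipBadRecords;

	public class MyDriver {                              // placeholder class name
	    public static void main(String[] args) throws Exception {
	        JobConf conf = new JobConf(MyDriver.class);
	        conf.setJobName("single-record-maps");
	        conf.setMapperClass(MyMapper.class);         // placeholder mapper; one record per task

	        // Allow at most one record to be skipped per map task, and start
	        // skipping after the first failed attempt rather than waiting
	        // for later attempts.
	        SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
	        SkipBadRecords.setAttemptsToStartSkipping(conf, 1);

	        FileInputFormat.setInputPaths(conf, new Path(args[0]));
	        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

	        JobClient.runJob(conf);
	    }
	}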

Note that I have set up my Hadoop job such that each input gets its own mapper; put differently, each map task has only one input record to process, so there is only one call to the map() method.  I would therefore expect the SkipBadRecords configuration above to make Hadoop attempt each input only once, since there is no "range" of records to narrow in on (there being only a single input record per task)...but it seems to have no effect whatsoever.  Each map task is still attempted the default four times.  Hadoop never seems to detect the one bad record, exclude it, and bail on the rest of the task attempt (since there are no other records left to process).

Any ideas why this is happening?  How can I get Hadoop to try each input only once and then give up?  These repeated attempts hold up the reducer and therefore the entire job.
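
For what it's worth, if mapred.map.max.attempts weren't marked final, I assume I could just cap the attempts directly in the driver, something like:

	// Attempt each map task only once; any failure fails the task immediately.
	conf.setMaxMapAttempts(1);
	// ...or equivalently:
	conf.setInt("mapred.map.max.attempts", 1);

But since the cluster marks that property final, the job-level setting is ignored, hence my interest in SkipBadRecords.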

Thanks.

________________________________________________________________________________
Keith Wiley               kwiley@keithwiley.com               www.keithwiley.com

"Luminous beings are we, not this crude matter."
  -- Yoda
________________________________________________________________________________