Posted to common-dev@hadoop.apache.org by Michael Stack <st...@archive.org> on 2006/04/12 03:51:42 UTC
'Corrupt GZIP trailer' and other irrecoverable IOEs.
I had a job fail in SequenceFile$Reader#next because a record was
corrupt. Every attempt at reading failed with 'Corrupt GZIP trailer'
(see stack trace at the end). Looks like IOEs of this type -- i.e.
irrecoverable IOEs -- need to have their maps rescheduled somehow. One
mechanism is to have this type of error report as an FSError. Then the
tasktracker will shut itself down and take itself out of the pool of
machines. This is a little radical but should let the job complete.
But I suppose it's kind of tough distinguishing the 'irrecoverables'
from the 'recoverables' (e.g. timeouts).
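To make the distinction concrete, here's a rough sketch of one way such a classification could look. This is purely illustrative -- IOEClassifier and its method are made-up names, not Hadoop code -- and it leans on the fact that EOF-mid-record and zip corruption point at bad on-disk data rather than a transient condition:

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.ZipException;

// Hypothetical sketch (not Hadoop API): split 'irrecoverable' IOEs
// (corrupt data) from 'recoverable' ones (e.g. timeouts).
public class IOEClassifier {
    public static boolean isIrrecoverable(IOException e) {
        // EOF mid-record and zip corruption indicate bad data, not a
        // transient condition worth retrying on the same node.
        if (e instanceof EOFException || e instanceof ZipException) {
            return true;
        }
        String msg = e.getMessage();
        return msg != null && msg.contains("Corrupt GZIP trailer");
    }
}
```

The message check is the fragile part: timeouts and corruption both surface as plain IOException, so any scheme like this is a heuristic at best.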
I've started trying to keep tabs.
Another 'irrecoverable'-looking IOE that I've seen is the following:
...
060404 220614 task_r_3dd1gh IOException null at 1294336. Skipping entries.
060404 220614 task_r_3dd1gh java.io.EOFException
060404 220614 task_r_3dd1gh at
java.io.DataInputStream.readFully(DataInputStream.java:178)
060404 220614 task_r_3dd1gh at
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
060404 220614 task_r_3dd1gh at
org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:94)
060404 220614 task_r_3dd1gh at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:394)
060404 220614 task_r_3dd1gh at
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:209)
060404 220614 task_r_3dd1gh at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)
I've gotten this a good few times.
I recently added 'keep-going' code, similar to the handleChecksumError
code in SequenceFile, that tries to just skip over problematic IOE
records/files. It's made it more likely jobs will complete on our
hardware, but this is less than optimal for a couple of reasons: 1) it
will skip records w/ recoverable IOEs, and 2) it breaks MapReduce's
'deterministic' results property.
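In spirit, the 'keep-going' code looks something like the sketch below. All names here (SkippingReader, RecordSource) are made up for illustration; this is not the actual patch, just the shape of the idea: catch the IOE, advance past the bad entry, and carry on.

```java
import java.io.IOException;

// Hypothetical sketch of the 'keep-going' idea, in the spirit of
// SequenceFile's handleChecksumError. Names are illustrative only.
public class SkippingReader {
    interface RecordSource {
        /** Returns the next record, or null at end of input. */
        String next() throws IOException;
        /** Advance past the entry that just failed to read. */
        void skipCurrent();
    }

    /** Read the next record, skipping entries whose reads throw IOEs. */
    public static String nextSkippingErrors(RecordSource src) {
        while (true) {
            try {
                return src.next();
            } catch (IOException e) {
                // Caveats noted above: recoverable IOEs get skipped
                // too, and results are no longer the same run-to-run.
                src.skipCurrent();
            }
        }
    }
}
```

Note the sketch has the same two problems as the real code: it can't tell a transient IOE from a corrupt record, and which records survive depends on which reads happened to fail.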
Thanks,
St.Ack
Here's the GZIP exception I mentioned up top:
060323 180210 task_r_ack5c7 0.853471% reduce > reduce
060323 180210 task_r_2u5g4x Error running child
060323 180210 task_r_g6d5ir 0.78022665% reduce > reduce
060323 180210 task_r_2u5g4x java.lang.RuntimeException:
java.io.IOException: Corrupt GZIP trailer
060323 180210 task_r_2u5g4x at
org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:132)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:283)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:703)
060323 180210 task_r_2u5g4x Caused by: java.io.IOException: Corrupt GZIP
trailer
060323 180210 task_r_2u5g4x at
java.util.zip.GZIPInputStream.readTrailer(GZIPInputStream.java:175)
060323 180210 task_r_2u5g4x at
java.util.zip.GZIPInputStream.read(GZIPInputStream.java:89)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.io.WritableUtils.readCompressedByteArray(WritableUtils.java:35)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.io.WritableUtils.readCompressedString(WritableUtils.java:70)
060323 180210 task_r_2u5g4x at
org.apache.nutch.parse.ParseText.readFields(ParseText.java:44)
060323 180210 task_r_2u5g4x at
org.apache.nutch.parse.ParseImpl.readFields(ParseImpl.java:59)
060323 180210 task_r_2u5g4x at
org.apache.nutch.parse.ParseImpl.read(ParseImpl.java:69)
060323 180210 task_r_2u5g4x at
org.apache.nutch.fetcher.FetcherOutput.readFields(FetcherOutput.java:47)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:344)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:163)
060323 180210 task_r_2u5g4x at
org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:129)
060323 180210 task_r_2u5g4x ... 3 more