Posted to common-user@hadoop.apache.org by 柳松 <la...@126.com> on 2009/03/12 10:15:52 UTC
How to skip bad records in 0.19.1
Dear all:
I have called SkipBadRecords.setMapperMaxSkipRecords(conf, 1)
and also SkipBadRecords.setAttemptsToStartSkipping(conf, 2) on my job configuration.
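For reference, those two calls would sit in a job driver roughly like this. It is only a minimal sketch and not the actual job: the class name SkipConfigSketch is hypothetical, the rest of the job setup (mapper, input and output formats, paths) is omitted, and it needs the Hadoop 0.19 jars on the classpath to compile.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

// Hypothetical driver class, used only to show where the two settings go.
public class SkipConfigSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Accept losing at most 1 record around each bad record in the mapper.
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1);

        // Number of task attempts after which skipping mode is kicked off.
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);

        // Read the values back to confirm what the job will actually use.
        System.out.println(SkipBadRecords.getMapperMaxSkipRecords(conf));
        System.out.println(SkipBadRecords.getAttemptsToStartSkipping(conf));
    }
}
```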
However, after 3 failed attempts, it gave me this exception message:
java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:306)
at org.apache.hadoop.mapred.MapTask$SkippingRecordReader.writeSkippedRec(MapTask.java:265)
at org.apache.hadoop.mapred.MapTask$SkippingRecordReader.next(MapTask.java:237)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
The last line of syslog shows:
2009-03-12 16:44:11,218 WARN org.apache.hadoop.mapred.SortedRanges: Skipping index 1-2
I have two questions:
1. Shouldn't it start skipping the bad record automatically after 2 attempts? Why does it start only after 3?
2. Why does the skip fail?
Regards
Song Liu from Suzhou University