Posted to common-user@hadoop.apache.org by 柳松 <la...@126.com> on 2009/03/12 10:15:52 UTC

How to skip bad records in 0.19.1


Dear all:
    I have called "SkipBadRecords.setMapperMaxSkipRecords(conf, 1)"
and also "SkipBadRecords.setAttemptsToStartSkipping(conf, 2)".
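
    For reference, here is a minimal sketch of the job properties those two calls are assumed to set. The property names are taken from memory of the 0.19-line mapred-default.xml and should be checked against the installed version. SkipSettings and build are hypothetical names used only for illustration; the sketch records the values in a plain java.util.Properties object and does not touch Hadoop.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hadoop: it only collects the job
// properties that the two SkipBadRecords setters are assumed to write.
class SkipSettings {
    // Assumed property names (0.19 line); verify against mapred-default.xml.
    static final String ATTEMPTS_TO_START_SKIPPING =
            "mapred.skip.attempts.to.start.skipping";
    static final String MAPPER_MAX_SKIP_RECORDS =
            "mapred.skip.map.max.skip.records";

    // Builds the property set corresponding to the two setter calls.
    static Properties build(int attemptsToStartSkipping, long mapperMaxSkipRecords) {
        Properties props = new Properties();
        props.setProperty(ATTEMPTS_TO_START_SKIPPING,
                Integer.toString(attemptsToStartSkipping));
        props.setProperty(MAPPER_MAX_SKIP_RECORDS,
                Long.toString(mapperMaxSkipRecords));
        return props;
    }

    public static void main(String[] args) {
        // Same values as in this message: start skipping after 2 attempts,
        // skip at most 1 record around a bad one.
        Properties props = build(2, 1L);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

    If the names are right, the same values could also be supplied through the job configuration file instead of the setter calls.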
 
    However, after 3 failed attempts, the task failed with this exception:
 
   java.lang.NullPointerException
 at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
 at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
 at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1198)
 at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:401)
 at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:306)
 at org.apache.hadoop.mapred.MapTask$SkippingRecordReader.writeSkippedRec(MapTask.java:265)
 at org.apache.hadoop.mapred.MapTask$SkippingRecordReader.next(MapTask.java:237)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
 at org.apache.hadoop.mapred.Child.main(Child.java:158)
 
   The last line of the syslog shows:
   2009-03-12 16:44:11,218 WARN org.apache.hadoop.mapred.SortedRanges: Skipping index 1-2
 
   I have two questions: 
   1. Shouldn't it start skipping the bad record automatically after 2 attempts? Why does it start only after 3?
 
   2. Why does the skip fail?
 
Regards
Song Liu from Suzhou University