Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/18 22:31:10 UTC
[jira] [Resolved] (MAPREDUCE-255) avoid bzip2 decompressor throwing
exception on corrupted (prematurely truncated) file
[ https://issues.apache.org/jira/browse/MAPREDUCE-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved MAPREDUCE-255.
----------------------------------------
Resolution: Incomplete
Closing this as stale.
> avoid bzip2 decompressor throwing exception on corrupted (prematurely truncated) file
> -------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-255
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-255
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Suhas Gogate
>
> When running a map-reduce streaming job that reads bzip2-compressed input, the job fails with one of the two Java exceptions shown below.
> This seems to happen when one of the bz2 input files is corrupted (probably when the file is prematurely truncated).
> Can we fix the bzip2 decompressor so that it does not throw these two exceptions?
> 2008-07-16 07:23:39,605 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: mark/reset not supported
> at java.io.InputStream.reset(InputStream.java:334)
> at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.readLine(Bzip2TextInputFormat.java:117)
> at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:140)
> at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:34)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> or
> 2008-07-16 20:49:28,020 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: CRC error
> at org.apache.tools.bzip2r.CBZip2InputStream.cadvise(CBZip2InputStream.java:74)
> at org.apache.tools.bzip2r.CBZip2InputStream.crcError(CBZip2InputStream.java:378)
> at org.apache.tools.bzip2r.CBZip2InputStream.endBlock(CBZip2InputStream.java:351)
> at org.apache.tools.bzip2r.CBZip2InputStream.setupNoRandPartA(CBZip2InputStream.java:851)
> at org.apache.tools.bzip2r.CBZip2InputStream.setupNoRandPartB(CBZip2InputStream.java:903)
> at org.apache.tools.bzip2r.CBZip2InputStream.read(CBZip2InputStream.java:240)
> at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.readLine(Bzip2TextInputFormat.java:102)
> at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:140)
> at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:34)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> Example:
> $HADOOP_HOME/bin/hadoop jar -libjars $<path>/jars/bzip2.jar \
> $HADOOP_HOME/hadoop-streaming.jar \
> -inputformat org.apache.hadoop.mapred.Bzip2TextInputFormat \
> -mapper "cat" \
> -reducer "cat" \
> -numReduceTasks 20 \
> -input '<path>/corrupt-data.bz2' \
> -output bzip2_bug_example \
> -jobconf stream.num.map.output.key.fields=1 \
> -jobconf stream.num.reduce.output.fields=1 \
> -jobconf num.key.fields.for.partition=1
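
The behaviour requested in the issue (reading as much as possible from a truncated bz2 file instead of failing the task) could be approximated by wrapping the decompressed stream so that an IOException is reported as end of input. The following is only a minimal sketch under that assumption, not the fix adopted by Hadoop. LenientInputStream and FailingStream are hypothetical names that exist neither in Hadoop nor in the bzip2r library, and FailingStream merely simulates a decompressor that throws "CRC error" after a few bytes.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: reports an IOException from the wrapped stream as end of stream. */
class LenientInputStream extends FilterInputStream {
    private IOException failure; // first error seen, kept so callers can log it

    LenientInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (failure != null) {
            return -1; // stay at EOF once the underlying stream has failed
        }
        try {
            return super.read();
        } catch (IOException e) {
            failure = e;
            return -1;
        }
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (failure != null) {
            return -1;
        }
        try {
            return super.read(buf, off, len);
        } catch (IOException e) {
            // Bytes copied into buf before the error are discarded here.
            failure = e;
            return -1;
        }
    }

    /** True if the wrapped stream threw and the input was cut short. */
    boolean hasFailed() {
        return failure != null;
    }

    /** The swallowed exception, or null if none occurred. */
    IOException getFailure() {
        return failure;
    }
}

/** Hypothetical stand-in for a decompressor reading a truncated file. */
class FailingStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    FailingStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() throws IOException {
        if (pos < data.length) {
            return data[pos++] & 0xff;
        }
        throw new IOException("CRC error"); // simulated corruption
    }
}
```

A record reader built on such a wrapper would see a normal end of file after the last readable byte. Because this hides data loss, a caller would probably want to check hasFailed() and log or count the event rather than ignore it silently.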