Posted to common-dev@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2008/08/04 22:56:44 UTC
[jira] Commented: (HADOOP-3898) avoid bzip2 decompressor throwing exception on corrupted (prematurely truncated) file
[ https://issues.apache.org/jira/browse/HADOOP-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619701#action_12619701 ]
Chris Douglas commented on HADOOP-3898:
---------------------------------------
The bzip2 compression codec is slated for 0.19 (current trunk), not 0.17. Does this problem exist in trunk, or only in the 0.17.1+ version you're running?
> avoid bzip2 decompressor throwing exception on corrupted (prematurely truncated) file
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-3898
> URL: https://issues.apache.org/jira/browse/HADOOP-3898
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.17.1
> Reporter: Suhas Gogate
>
> When running a map-reduce streaming job on bzip2-compressed input (Bzip2TextInputFormat), the job fails with one of the following two Java exceptions.
> This seems to happen when one of the bz2 input files is corrupted (probably when the file is prematurely truncated); see the example command at the end.
> Can we fix the bzip2 decompressor so that it does not throw these two exceptions?
> 2008-07-16 07:23:39,605 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: mark/reset not supported
>     at java.io.InputStream.reset(InputStream.java:334)
>     at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.readLine(Bzip2TextInputFormat.java:117)
>     at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:140)
>     at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:34)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> or
> 2008-07-16 20:49:28,020 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: CRC error
>     at org.apache.tools.bzip2r.CBZip2InputStream.cadvise(CBZip2InputStream.java:74)
>     at org.apache.tools.bzip2r.CBZip2InputStream.crcError(CBZip2InputStream.java:378)
>     at org.apache.tools.bzip2r.CBZip2InputStream.endBlock(CBZip2InputStream.java:351)
>     at org.apache.tools.bzip2r.CBZip2InputStream.setupNoRandPartA(CBZip2InputStream.java:851)
>     at org.apache.tools.bzip2r.CBZip2InputStream.setupNoRandPartB(CBZip2InputStream.java:903)
>     at org.apache.tools.bzip2r.CBZip2InputStream.read(CBZip2InputStream.java:240)
>     at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.readLine(Bzip2TextInputFormat.java:102)
>     at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:140)
>     at org.apache.hadoop.mapred.Bzip2TextInputFormat$BZip2LineRecordReader.next(Bzip2TextInputFormat.java:34)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:158)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> Example:
> $HADOOP_HOME/bin/hadoop jar -libjars $<path>/jars/bzip2.jar \
> $HADOOP_HOME/hadoop-streaming.jar \
> -inputformat org.apache.hadoop.mapred.Bzip2TextInputFormat \
> -mapper "cat" \
> -reducer "cat" \
> -numReduceTasks 20 \
> -input '<path>/corrupt-data.bz2' \
> -output bzip2_bug_example \
> -jobconf stream.num.map.output.key.fields=1 \
> -jobconf stream.num.reduce.output.fields=1 \
> -jobconf num.key.fields.for.partition=1
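The first trace bottoms out in java.io.InputStream.reset. That means the stream being reset inherits the default implementation, which always throws IOException("mark/reset not supported"). Such a stream has to be wrapped, for example in a java.io.BufferedInputStream, before mark()/reset() can work. The second trace is the truncation surfacing as a decoding error. Below is a minimal, JDK-only sketch of both points. It rests on some assumptions. The JDK has no bzip2 codec, so java.util.zip gzip stands in for it. The names TruncatedInputSketch, LenientResult, readLeniently and isPrefix are hypothetical and are not Hadoop's Bzip2TextInputFormat or CBZip2InputStream API. Treating a premature end of input as end of data is only one possible reading of "does not throw".

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; gzip is a stand-in for bzip2.
public class TruncatedInputSketch {

    /** Bytes recovered by a lenient read, plus whether the input ended early. */
    static final class LenientResult {
        final byte[] data;
        final boolean truncated;

        LenientResult(byte[] data, boolean truncated) {
            this.data = data;
            this.truncated = truncated;
        }
    }

    /** True if reset() throws, as the inherited java.io.InputStream.reset() always does. */
    static boolean resetFails(InputStream in) {
        try {
            in.mark(16); // a no-op on streams that do not support marking
            in.reset();
            return false;
        } catch (IOException e) {
            return true; // default message: "mark/reset not supported"
        }
    }

    /**
     * Decompresses gzip data, treating a premature end of input (EOFException)
     * as end of data. Any other I/O error, such as a checksum mismatch,
     * is still reported (rethrown unchecked).
     */
    static LenientResult readLeniently(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (EOFException e) {
            // Truncated input: keep whatever was decompressed before the cut.
            return new LenientResult(out.toByteArray(), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LenientResult(out.toByteArray(), false);
    }

    /** Compresses raw bytes with gzip (helper for the demo). */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** True if {@code prefix} is a leading slice of {@code full}. */
    static boolean isPrefix(byte[] prefix, byte[] full) {
        return prefix.length <= full.length
                && Arrays.equals(prefix, Arrays.copyOf(full, prefix.length));
    }

    public static void main(String[] args) {
        // A stream class that does not override mark()/reset().
        InputStream plain = new InputStream() {
            @Override public int read() { return -1; }
        };
        System.out.println("plain reset fails: " + resetFails(plain));
        System.out.println("buffered reset fails: " + resetFails(new BufferedInputStream(plain)));

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append("record ").append(i).append('\n');
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip(raw);
        byte[] cut = Arrays.copyOf(gz, gz.length / 2); // simulate a truncated file

        LenientResult whole = readLeniently(gz);
        LenientResult partial = readLeniently(cut);
        System.out.println("whole: truncated=" + whole.truncated + ", bytes=" + whole.data.length);
        System.out.println("cut: truncated=" + partial.truncated + ", prefix=" + isPrefix(partial.data, raw));
    }
}
```

Catching only EOFException is deliberately narrower than catching every IOException. A file cut short still yields a clean prefix of the original data. A checksum failure (like the "CRC error" above) may instead mean the last decoded block is damaged. A record reader built on this idea might therefore prefer to flag or skip that block rather than silently emit it.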