Posted to common-issues@hadoop.apache.org by "Guo Ruijing (JIRA)" <ji...@apache.org> on 2013/12/31 07:07:50 UTC

[jira] [Created] (HADOOP-10196) Bzip2Codec Uncompress cannot work

Guo Ruijing created HADOOP-10196:
------------------------------------

             Summary: Bzip2Codec Uncompress cannot work
                 Key: HADOOP-10196
                 URL: https://issues.apache.org/jira/browse/HADOOP-10196
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 2.2.0
            Reporter: Guo Ruijing


BZip2Codec cannot compress correctly: the stream it writes is rejected by bzcat as ending unexpectedly. Decompression works.

1. Compress Sample file:

[hadoop@localhost ~]$ cat StreamCompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {

    public static void main(String[] args) throws Exception {
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        // Compress stdin to stdout.
        CompressionOutputStream out = codec.createOutputStream(System.out);
        IOUtils.copyBytes(System.in, out, 4096, false);
        // finish() ends the compressed stream without closing System.out.
        out.finish();
    }

}
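
One possibility worth ruling out, offered as an assumption rather than a diagnosis: StreamCompressor calls out.finish() but never flushes or closes System.out. The sketch below uses only the JDK (no Hadoop) and a hypothetical class name, FlushDemo. It mimics a writer that emits a 2-byte header as an array and the rest one byte at a time, into an autoflushing PrintStream over a BufferedOutputStream, which is assumed here to resemble how System.out is set up. Whether BZip2Codec in 2.2.0 writes this way has not been checked here.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class FlushDemo {

    /** Returns the bytes visible in the sink before and after an explicit flush. */
    static int[] visibleBytes() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Autoflushing PrintStream over a small buffer (assumed stand-in for System.out).
        PrintStream ps = new PrintStream(new BufferedOutputStream(sink, 128), true);

        byte[] header = {'B', 'Z'};
        // Array write: with autoflush enabled, PrintStream flushes after it.
        ps.write(header, 0, header.length);
        // Single-byte writes: flushed only on '\n', so these stay in the buffer.
        for (int i = 0; i < 40; i++) {
            ps.write(0x55);
        }
        int before = sink.size();
        ps.flush();
        int after = sink.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] v = visibleBytes();
        // Prints "before flush: 2, after flush: 42"
        System.out.println("before flush: " + v[0] + ", after flush: " + v[1]);
    }
}
```

If the same effect applies to the reproduction, adding out.flush() or out.close() after out.finish() in StreamCompressor would be the first thing to try.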

2. Uncompress Sample file:

[hadoop@localhost ~]$ cat StreamUncompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamUncompressor {

    public static void main(String[] args) throws Exception {
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        // Decompress stdin to stdout.
        CompressionInputStream in = codec.createInputStream(System.in);
        IOUtils.copyBytes(in, System.out, 4096, false);
        in.close();
    }

}

3. How to compile/run

1) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamCompressor.java

2) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamUncompressor.java

3) jar -cvf Stream.jar StreamCompressor.class StreamUncompressor.class

4) rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec

5) echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat

4. Test Result
In this test, Hadoop fails to load the native bzip2 library and falls back to the pure-Java implementation, which decompresses correctly but does not compress correctly.

1) Hadoop bzip2 decompression works:

rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 03:58:20 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
abc <<< expected

2) bzip2 compression does not work, as follows:

a) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 04:00:59 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
BZ <<<<< not expected

b) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat
13/12/17 04:01:31 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version

bzcat: Compressed file ends unexpectedly;
perhaps it is corrupted? Possible reason follows.
bzcat: Invalid argument
Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
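
To see how much of the stream the compressor actually emitted, the first bytes of its output can be checked against the bzip2 layout: the stream header is "BZh" followed by a level digit '1' to '9', and the first block starts with the 48-bit magic 0x314159265359 (or, for empty input, the end-of-stream magic 0x177245385090). The following is a minimal JDK-only sketch. The class BzHeaderCheck and its describe method are hypothetical helpers, not part of Hadoop, and the check covers only the start of the stream, not its completeness.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BzHeaderCheck {

    // 48-bit magic that opens a compressed block.
    private static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
    // 48-bit end-of-stream magic; follows the header directly only for empty input.
    private static final byte[] EOS_MAGIC = {0x17, 0x72, 0x45, 0x38, 0x50, (byte) 0x90};

    /** Describes how far the leading bytes of a bzip2 stream are well-formed. */
    static String describe(byte[] b) {
        if (b.length < 2 || b[0] != 'B' || b[1] != 'Z') {
            return "not bzip2: missing 'BZ' magic";
        }
        if (b.length < 4) {
            return "truncated: stream header incomplete (" + b.length + " bytes)";
        }
        if (b[2] != 'h' || b[3] < '1' || b[3] > '9') {
            return "not bzip2: bad version/level bytes";
        }
        if (b.length < 10) {
            return "truncated: no block header after stream header";
        }
        if (startsWith(b, 4, BLOCK_MAGIC)) {
            return "header ok: level " + (char) b[3] + ", first block present";
        }
        if (startsWith(b, 4, EOS_MAGIC)) {
            return "header ok: level " + (char) b[3] + ", empty stream";
        }
        return "corrupt: unexpected bytes after stream header";
    }

    // True if data contains magic at the given offset (caller ensures enough bytes).
    private static boolean startsWith(byte[] data, int offset, byte[] magic) {
        for (int i = 0; i < magic.length; i++) {
            if (data[offset + i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Read all of stdin, then report on its leading bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        InputStream in = System.in;
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        System.out.println(describe(buf.toByteArray()));
    }
}
```

It could be run in place of bzcat in step 5, for example: echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | java BzHeaderCheck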




