Posted to mapreduce-user@hadoop.apache.org by Don Wallwork <do...@yahoo.com> on 2014/10/07 21:52:15 UTC

bzip2 input decompression not using native library

Can someone tell me why, in Hadoop 2.4.1, native bzip2 compression and 
decompression are used for map output, but the Java bzip2 implementation is 
used to decompress input files?  Is this expected?

While profiling some Hadoop wordcount jobs that read a bzip2-compressed input file, I 
found that input file decompression appears to use the Java implementation rather than 
the native library.  Output from the Linux perf tool shows methods of the Java class 
CBZip2InputStream in the profile:

     1.83%           java  perf-12473.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     1.42%           java  perf-11567.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     1.16%           java  perf-12473.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
     1.05%           java  perf-12174.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.99%           java  perf-11770.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.98%           java  perf-12826.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.89%           java  perf-12174.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
     0.79%           java  perf-12739.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.79%           java  perf-12544.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
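
The rows above can be tallied per method with a short script. This is a minimal sketch in Python, assuming the perf report has been saved as plain text in the layout shown; the function name overhead_by_symbol and the regular expression are illustrative and are not part of perf or Hadoop.

```python
import re
from collections import defaultdict

# Matches perf report rows such as:
#   1.83%  java  perf-12473.map  [.] Lorg/.../CBZip2InputStream;.read0()I
# Columns assumed: overhead, command, shared object, "[.]", symbol.
ROW = re.compile(r"^\s*([\d.]+)%\s+\S+\s+\S+\s+\[\.\]\s+(\S+)")


def overhead_by_symbol(report_text, needle="CBZip2InputStream"):
    """Sum overhead percentages of rows whose symbol contains `needle`.

    Returns a dict mapping the method part of the symbol to its total
    overhead, rounded to two decimal places.
    """
    totals = defaultdict(float)
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m and needle in m.group(2):
            # Keep only the method name after the JIT class descriptor.
            method = m.group(2).split(";.")[-1]
            totals[method] += float(m.group(1))
    return {name: round(total, 2) for name, total in totals.items()}
```

Running it over the map-output profile with the same needle should return an empty dict if only the native library is involved there.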

When I use perf to check map output compression, it shows that the native library 
is used, as expected.

This cluster is running Apache Hadoop 2.4.1, compiled from source on 64-bit Ubuntu 12.04 
to include the native compression libraries for bzip2 et al.  checknative reports that 
the native compression libraries are loaded and should be used:

hadoop checknative -a
14/10/07 15:15:57 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/10/07 15:15:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/local/hadoop-local-build/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0
zlib:   true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4:    true revision:99
bzip2:  true /lib/x86_64-linux-gnu/libbz2.so.1
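
To compare this output across nodes, the status lines can be reduced to a simple mapping. This is a minimal sketch in Python, assuming the output of hadoop checknative -a has been captured as text in the format above; parse_checknative is a hypothetical helper name, not a Hadoop tool.

```python
def parse_checknative(output):
    """Map each library name to True/False from `hadoop checknative -a` text.

    Only lines of the form "<name>: true|false ..." are kept; log lines
    and the "Native library checking:" header are skipped.
    """
    status = {}
    for line in output.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0] in ("true", "false"):
            status[name.strip()] = fields[0] == "true"
    return status
```

Note that a value of true here only says the library could be loaded; it does not show which code path a given job actually takes, which is what the perf profile is for.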

I have verified that the io.compression.codec.bzip2.library property is set to its 
default value, system-native.

Thanks,

Don