Posted to mapreduce-user@hadoop.apache.org by Don Wallwork <do...@yahoo.com> on 2014/10/07 21:52:15 UTC
bzip2 input decompression not using native library
Can someone tell me why native bzip2 compression/decompression works in Hadoop 2.4.1 for
map output compression, but the Java bzip2 implementation is used for input file
decompression? Is this expected?
While profiling some Hadoop wordcount jobs with a bzip2-compressed input file, I found
that input file decompression appears to use the Java implementation rather than the
native library. Output from the Linux perf tool (see below) shows that the Java bzip2
implementation (CBZip2InputStream) is used.
1.83% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.42% java perf-11567.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.16% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
1.05% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.99% java perf-11770.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.98% java perf-12826.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.89% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
0.79% java perf-12739.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.79% java perf-12544.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
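For anyone wanting to total these samples per method, here is a minimal sketch in Python.
It assumes the perf report has been saved as text in the same column layout as the lines
above (overhead, command, shared object, symbol). The names overhead_by_symbol and needle
are hypothetical helpers for illustration, not part of perf or Hadoop.

```python
import re
from collections import defaultdict

# Matches perf report lines in the layout shown above, e.g.
#   1.83% java perf-12473.map [.] Lorg/.../CBZip2InputStream;.read0()I
# Groups: overhead percent, command, shared object, symbol type, symbol name.
PERF_LINE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(.+?)\s*$"
)


def overhead_by_symbol(report_text, needle="CBZip2InputStream"):
    """Sum the overhead percentages of symbols whose name contains `needle`.

    Lines that do not look like perf report rows are skipped.
    Returns a dict mapping symbol name to summed overhead percent.
    """
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = PERF_LINE.match(line)
        if match is None:
            continue
        symbol = match.group(5)
        if needle in symbol:
            totals[symbol] += float(match.group(1))
    return dict(totals)


# Three rows copied from the perf output above.
SAMPLE = """\
1.83% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.42% java perf-11567.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.16% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
"""

if __name__ == "__main__":
    for symbol, percent in sorted(overhead_by_symbol(SAMPLE).items()):
        print(f"{percent:6.2f}%  {symbol}")
```

Any non-zero total for CBZip2InputStream symbols means the Java decompressor is on the hot
path; passing a different needle would let the same check be applied to other classes.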
When I use the perf tool to check map output compression, it shows that the native
library is used, as expected.
This cluster is running Apache Hadoop 2.4.1, which was compiled from source on 64-bit
Ubuntu 12.04 to include the native compression libraries for bzip2 et al. The checknative
output indicates that the native compression libraries should be used:
hadoop checknative -a
14/10/07 15:15:57 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/10/07 15:15:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/local/hadoop-local-build/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0
zlib: true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib/x86_64-linux-gnu/libbz2.so.1
I have verified that the io.compression.codec.bzip2.library configuration property is set
to its default value, system-native.
Thanks,
Don