Posted to common-dev@hadoop.apache.org by "Luke Lu (JIRA)" <ji...@apache.org> on 2013/08/05 23:16:49 UTC

[jira] [Resolved] (HADOOP-9785) LZ4 code may need upgrade (lz4.c embedded in libHadoop is r43 18 months ago, while latest version is r98)

     [ https://issues.apache.org/jira/browse/HADOOP-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Lu resolved HADOOP-9785.
-----------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 2.0.4-alpha)
                       (was: 3.0.0)
                   2.3.0
    
> LZ4 code may need upgrade (lz4.c embedded in libHadoop is r43 18 months ago, while latest version is r98)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9785
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io, native
>    Affects Versions: 3.0.0, 2.0.4-alpha
>         Environment: [german@localhost lz4-read-only]$ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    1
> Core(s) per socket:    4
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 23
> Stepping:              10
> CPU MHz:               2667.000
> BogoMIPS:              5319.82
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              2048K
> NUMA node0 CPU(s):     0-3
> [german@localhost lz4-read-only]$ uname -r
> 2.6.32-358.14.1.el6.x86_64
>            Reporter: German Florez-Larrahondo
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> While analyzing the compression performance of different Hadoop codecs, I noticed that the LZ4 code was taken from revision 43 of https://code.google.com/p/lz4/. The latest version is r98, and upgrading to it may bring additional performance benefits.
> We may want to involve the original LZ4 author, Yann Collet, in these discussions, as the current LZ4 code includes additional algorithms and parameters.
> To start the investigation, I ran preliminary experiments with the Silesia corpus. Compared with r43, the latest release appears to improve throughput for both compression and decompression (I haven't done enough analysis to conclude anything statistically, but it looks good).
> Here is the raw output using LZ4 from r43 with a SUBSET of the Silesia corpus (http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia):
> File: silesia/dickens
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 10192446 bytes into 6433123 bytes ==> 63.12%
> Done in 0.07 s ==> 138.86 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 10192446 bytes
> Done in 0.02 s ==> 486.01 MB/s
> File: silesia/mozilla
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 51220480 bytes into 26379814 bytes ==> 51.50%
> Done in 0.25 s ==> 195.39 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 51220480 bytes
> Done in 0.12 s ==> 407.06 MB/s
> File: silesia/mr
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 9970564 bytes into 5669268 bytes ==> 56.86%
> Done in 0.04 s ==> 237.72 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 9970564 bytes
> Done in 0.02 s ==> 475.43 MB/s
> File: silesia/nci
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 33553445 bytes into 5880292 bytes ==> 17.53%
> Done in 0.08 s ==> 399.99 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 33553445 bytes
> Done in 0.06 s ==> 533.32 MB/s
> And here is the raw output of LZ4 from the latest release, r98:
> File: silesia/dickens
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/dickens...
> 1-LZ4_compress        :  10192446 ->   6434313 (63.13%),  172.3 MB/s
> LZ4_decompress_fast   :  10192446 ->   676.0 MB/s
> File: silesia/mozilla
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mozilla...
> 1-LZ4_compress        :  51220480 ->  26382113 (51.51%),  281.7 MB/s
> LZ4_decompress_fast   :  51220480 ->  1003.1 MB/s
> File: silesia/mr
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mr...
> 1-LZ4_compress        :   9970564 ->   5669255 (56.86%),  268.3 MB/s
> LZ4_decompress_fast   :   9970564 ->   788.7 MB/s
> File: silesia/nci
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/nci...
> 1-LZ4_compress        :  33553445 ->   5883923 (17.54%),  584.9 MB/s
> LZ4_decompress_fast   :  33553445 ->  1208.3 MB/s
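
To make the comparison above easier to read, here is a minimal Python sketch that turns the quoted throughput figures into per-file speedup ratios (r98 over r43). The numbers are copied from the two raw outputs; the names R43, R98 and speedups are made up for this illustration. Note that the two runs were produced by different tools (the r43 banner says "Compression CLI", the r98 banner says "Full LZ4 speed analyzer"), so the ratios are only a rough indication, in line with the reporter's own caveat about statistical significance.

```python
# Throughput in MB/s, copied from the raw output quoted in the issue.
# Each entry maps a Silesia file to (compression, decompression).
# R43, R98 and speedups() are illustrative names, not part of Hadoop or LZ4.
R43 = {
    "dickens": (138.86, 486.01),
    "mozilla": (195.39, 407.06),
    "mr": (237.72, 475.43),
    "nci": (399.99, 533.32),
}
R98 = {
    "dickens": (172.3, 676.0),
    "mozilla": (281.7, 1003.1),
    "mr": (268.3, 788.7),
    "nci": (584.9, 1208.3),
}


def speedups(old, new):
    """Return file -> (compression ratio, decompression ratio) of new over old."""
    return {
        name: (new[name][0] / old[name][0], new[name][1] / old[name][1])
        for name in old
    }


if __name__ == "__main__":
    for name, (comp, decomp) in speedups(R43, R98).items():
        print(f"{name:8s} compress x{comp:.2f}  decompress x{decomp:.2f}")
```

Every ratio comes out above 1, which matches the reporter's impression that r98 is faster on this subset; a proper comparison would rerun both revisions with the same benchmark tool and several repetitions.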

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira