Posted to common-issues@hadoop.apache.org by "Tatu Saloranta (JIRA)" <ji...@apache.org> on 2009/11/24 08:36:39 UTC

[jira] Commented: (HADOOP-4874) Remove bindings to lzo

    [ https://issues.apache.org/jira/browse/HADOOP-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781810#action_12781810 ] 

Tatu Saloranta commented on HADOOP-4874:
----------------------------------------

Actually, I only now had time to spend on this, and ended up testing LZF (http://oldhome.schmorp.de/marc/liblzf.html), as ported by the H2 team (http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/).
It turns out LZF is pretty good on speed, although one has to be careful to choose good buffer sizes and hash table size, and ideally to reuse buffers where possible. With that done, it can be a bit faster on decompression, and a lot faster on compression.
The numbers I saw (this is just initial testing) indicated compression up to twice as fast, and decompression maybe 30% faster.
The compression ratio is not as good: whereas gzip gave ratios of 81/93/97% (for content sizes of 2k/20k/200k), LZF gave 66/72/72% (i.e. it compresses down to 34/28/28% of the original). That is still pretty good, of course.
These numbers are for JSON data.
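
To make the two ways of quoting the ratio concrete, here is a minimal Java sketch of the arithmetic. It is an illustration only, not the test code behind the numbers above: the class name CompressionRatio and the byte counts are hypothetical, chosen so that they reproduce the 66% / 34% pair.

```java
// Sketch only: converts between "space saved" and "size remaining" percentages.
// The class name and the sizes in main() are hypothetical, not measured values.
final class CompressionRatio {
    private CompressionRatio() {}

    /** Percentage of the original size removed by compression. */
    static double savedPercent(long originalBytes, long compressedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("originalBytes must be positive");
        }
        return 100.0 * (originalBytes - compressedBytes) / originalBytes;
    }

    /** Percentage of the original size that remains after compression. */
    static double remainingPercent(double savedPercent) {
        return 100.0 - savedPercent;
    }

    public static void main(String[] args) {
        // Hypothetical sizes picked to match the 66% figure for small content.
        long original = 2000;
        long compressed = 680;
        double saved = savedPercent(original, compressed);
        System.out.printf("saved %.0f%%, remaining %.0f%%%n", saved, remainingPercent(saved));
    }
}
```

In these terms a higher "saved" figure is better, so gzip's 97% on the 200k input means the output is about 3% of the original, against 28% for LZF.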

LZF is a block-based algorithm, just like all the others, including gzip, and is about as easy to wrap in input/output streams.
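
A rough Java sketch of what such a wrapper could look like follows. Several things here are assumptions rather than anything from the H2 code: the JDK's Deflater/Inflater stand in for the codec (this is not LZF), the framing of [raw length][packed length][packed bytes] per block is made up for the example, and the names BlockStreams, BlockOutputStream, decode and roundTrip are hypothetical. Only the output side is a real stream; reading back is a plain helper to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical framing per block: [int rawLength][int packedLength][packed bytes].
// Deflater/Inflater are a stand-in codec so the sketch runs on a plain JDK; NOT LZF.
final class BlockStreams {
    private BlockStreams() {}

    static final class BlockOutputStream extends OutputStream {
        private final DataOutputStream out;
        private final Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        private final byte[] raw;     // input buffer, reused for every block
        private final byte[] packed;  // output buffer, reused for every block
        private int filled;

        BlockOutputStream(OutputStream out, int blockSize) {
            if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
            this.out = new DataOutputStream(out);
            this.raw = new byte[blockSize];
            // Headroom, because incompressible input can grow slightly.
            this.packed = new byte[blockSize + blockSize / 4 + 64];
        }

        @Override public void write(int b) throws IOException {
            raw[filled++] = (byte) b;
            if (filled == raw.length) flushBlock();
        }

        // Compresses whatever is buffered as one independent block.
        private void flushBlock() throws IOException {
            if (filled == 0) return;
            deflater.reset();
            deflater.setInput(raw, 0, filled);
            deflater.finish();
            int n = deflater.deflate(packed);
            if (!deflater.finished()) throw new IOException("output buffer too small");
            out.writeInt(filled);
            out.writeInt(n);
            out.write(packed, 0, n);
            filled = 0;
        }

        @Override public void close() throws IOException {
            flushBlock();
            deflater.end();
            out.close();
        }
    }

    // Reads every block back. A plain helper that allocates per block,
    // unlike the writer above, which reuses its buffers.
    static byte[] decode(byte[] framed) throws IOException, DataFormatException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        Inflater inflater = new Inflater();
        while (in.available() > 0) {
            byte[] raw = new byte[in.readInt()];
            byte[] packed = new byte[in.readInt()];
            in.readFully(packed);
            inflater.reset();
            inflater.setInput(packed);
            int n = 0;
            while (n < raw.length) {
                int k = inflater.inflate(raw, n, raw.length - n);
                if (k == 0) throw new IOException("truncated block");
                n += k;
            }
            result.write(raw, 0, n);
        }
        inflater.end();
        return result.toByteArray();
    }

    // Convenience for trying it out: compress in blocks, then decode again.
    static byte[] roundTrip(byte[] data, int blockSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            try (OutputStream out = new BlockOutputStream(sink, blockSize)) {
                out.write(data);
            }
            return decode(sink.toByteArray());
        } catch (IOException | DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Swapping in a different block codec would only touch flushBlock and the inner loop of decode; the buffering and framing stay the same. A production version would also override write(byte[], int, int) instead of relying on the byte-at-a-time default.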

I hope to find time to actually wrap the existing code in a bit better packaging (with respect to buffer reuse and other optimizations). If so, it could become a reusable component. That may take some time, but in the meantime the source link above allows others to try out the code as well if they want to.


> Remove bindings to lzo
> ----------------------
>
>                 Key: HADOOP-4874
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4874
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: h4874.patch
>
>
> It looks like the lzo bindings are infected by lzo's GPL and must be removed from Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.