Posted to user@hbase.apache.org by Eran Kutner <er...@gigya.com> on 2010/08/02 12:25:29 UTC

Which LZO library to use?

Hi,
I want to enable LZO compression on my cluster, but there are a few
alternatives and the wiki page itself is very confusing, so it's not clear
which is the right choice. I was looking at this page:
http://wiki.apache.org/hadoop/UsingLzoCompression
At the top it recommends using Kevin Weil's version (which seems to be the
same one released by Twitter) but warns that it doesn't contain all fixes,
and lower in the article it refers to the original Google Code repository (
http://code.google.com/p/hadoop-gpl-compression/).
What concerns me most is future compatibility: whichever library I pick now,
I want to be certain that my compressed data will still be readable when I
upgrade to the next major version of Hadoop and HBase. It seems that only
the Google Code project has newer releases compatible with future versions
of Hadoop.

So I'm looking for recommendations on which library to use.


Thanks,
Eran

Re: Which LZO library to use?

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Aug 2, 2010 at 11:12 AM, Alex Kozlov <al...@cloudera.com> wrote:

> The code is currently maintained by Kevin Weil and Todd Lipcon.  For
> completeness, there is one more distribution at
> http://github.com/toddlipcon/hadoop-lzo.  AFAIK, Todd Lipcon's and
> Kevin Weil's distributions are synced.
>

Yep, Kevin and I pull from each other whenever we make any changes. So
sometimes a bug fix will hit one of our repos a day or two before the other,
but we keep in touch and should never diverge.

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Which LZO library to use?

Posted by Eran Kutner <er...@gigya.com>.
Thanks. Now I feel much more comfortable using Kevin's code.

Re: Which LZO library to use?

Posted by Alex Kozlov <al...@cloudera.com>.
The code is currently maintained by Kevin Weil and Todd Lipcon.  For
completeness, there is one more distribution at
http://github.com/toddlipcon/hadoop-lzo.  AFAIK, Todd Lipcon's and Kevin
Weil's distributions are synced.

Most of the differences from Google's code are bug fixes: the LZO file
format itself has not changed, and you can actually read files created
with lzop (the LZO command-line tool).  There are currently no version
compatibility issues.

Alex K
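
To illustrate the point about lzop compatibility, here is a minimal sketch
(not part of any of the libraries discussed) that checks whether a file
starts with the lzop magic bytes. The function name looks_like_lzop is
hypothetical. It only applies to standalone .lzo files such as MapReduce
input; as far as I know, HBase store files embed LZO-compressed blocks in
their own format and would not pass this check.

```python
# Magic bytes at the start of a file in the lzop container format.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def looks_like_lzop(path):
    """Return True if the file at `path` begins with the lzop magic bytes.

    Hypothetical helper for illustration only: it inspects the header and
    does not decompress or validate the rest of the file.
    """
    with open(path, "rb") as f:
        return f.read(len(LZOP_MAGIC)) == LZOP_MAGIC
```

A file that passes this check should, in principle, be readable both by the
lzop command-line tool and by the Hadoop LZO libraries mentioned above.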

Re: Which LZO library to use?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I think that the person who wrote the header of that page meant that
the hadoop-gpl-compression project lacks fixes included in Kevin's
repo. AFAIK you can hit the unfixed bugs if you use LZO-compressed files
as input for MapReduce, but I've been using the second one for more than
a year without any issue (in HBase).

J-D